Language

The Free and Open Productivity Suite
Released: Apache OpenOffice 4.1.15

How To Generate Profiling Data

Andre Fischer and Carsten Driesner, July 11th , 2001. Last change on August 2nd , 2001.

Introduction

The task that is to be solved is the generation of profiling data for a manually selected set of points in programs of the OpenOffice/Sun One Webtop family. Moreover this should be accomplished as automatically as possible, so that timings of different program versions can be compared to each other automatically.

The main reason for not using an existing tool like Truetime or Quantify is the huge amount of data generated by these programs. In general the developers of the different modules of the office know about the time critical parts of their code. Therefore only profiling informations about these code parts are of interest. Everything else only obscures the view to the relevant data.

With this in mind we use a different approach. Instead of blindly instrumenting---i.e. adding code that generates profiling data---all files, we concentrate of the areas of interest. The developers have to add once manually commands into their code that emits time stamps together with context information. From then on everything else goes automatically. If the those files are compiled with a certain preprocessor symbol defined and the office is run with a certain environment variable set, then profiling information is written to a log file (or, to be more precise, to one log file per process). These log files are then processed by a Perl script and are transfomed into a Calc document.

The details of this process are explained below. Note that this is a work in progress. Especially the transformation of the log files into a Calc document will have to be adapted to the needs of those who use those documents.

Instrumenting the source files

The header file <rtl/logfile.hxx> (guarded by the preprocessor symbol _RTL_LOGFILE_HXX_) contains a set of macros that can be used to emit time stamps at certain points in the code. The reason for using macros instead of a more decent C++ construct is to provide a way of removing the code for writing profiling data completely from the compiled executable and it's libraries. The macros wrap calls to the class ::rtl::Logfile (declared in <rtl/logfile.hxx>) and the function rtl_logfile_trace (declared in <rtl/logfile.h>). They generate actual code only when at compile time the preprocessor symbol TIMELOG is defined. This is either accomplished by building a special office version (which will be the normal Pro-version) or by setting an environment variable ENVCDEFS to include -DTIMELOG. Depending on the shell you use

set ENVCDEFS=-DTIMELOG

will do that (4NT).

The macros can be used in two ways:

  1. RTL_LOGFILE_CONTEXT_AUTHOR(instance,project,author,name) creates a context instance that can be referenced by instance. The three remaining parameters contain the name of the project in which the macro is used, the Sun id of the author beeing responsible for the code in which the macro is used (for example af119097 for one of the authors) and the name of the function or other scope that this context refers to. The context instance emits one time stamp when it is created and one when it is destroyed. Therefore, if placed as first statement of a function definition, a time stamp is written to the log file when the function is entered and one when it is exited even when there is more then one (implicit) return statement.

    The name passed to the context instantiation is written with every time stamp that originates from such a context. This is also true for the following macros.

  2. RTL_LOGFILE_CONTEXT_TRACE(instance,message),

    RTL_LOGFILE_CONTEXT_TRACE1(instance,message,arg1),

    RTL_LOGFILE_CONTEXT_TRACE2(instance,message,arg1,arg2), and

    RTL_LOGFILE_CONTEXT_TRACE3(instance,message,arg1,arg2,arg3) write time stamps with arbitrary messages to the log file. The actual message is given by the string message. Together with zero to three arguments it is passed to a printf style output function. That means that for every argument there has to be a suitable % format string in message. In order to allow the log file be parsed by the existing Perl scripts message may contain newlines. Each message is prefixed with the context name.

    These macros exist also in a context free version.

    RTL_LOGFILE_TRACE_AUTHOR(project,author,message),

    RTL_LOGFILE_TRACE_AUTHOR1(project,author,message,arg1),

    RTL_LOGFILE_TRACE_AUTHOR2(project,author,message,arg1,arg2), and

    RTL_LOGFILE_TRACE_AUTHOR3(project,author,message,arg1,arg2,arg3) do the same thing as their context twins except that the messages are not prefixed with a context name. As you can see, the project and author have to given to each trace macro instead of just to the one creating a context. They are defined in <rtl/logfile.h> which is included from <rtl/logfile.hxx>. Therefore for only using the context free macros it is sufficient to include <rtl/logfile.h>.

For every introduced above there is an analogon without the _AUTHOR suffix which do not accept the project and author arguments. Because both of these are used later in the post processing stage we discourage you to use these other macro versions.

Depending on which of these two ways is used to generate a time stamp, they are named function/scope time stamps respectively message time stamps.

Creating profiling information

If you have instrumented your code like described above and have compiled and installed it you can create profiling information by starting the office with the bootsrap variable RTL_LOGFILE set to a file name prefix. This prefix is completed by appending an underscore, the process id and a “.log” suffix. Note that backslashes have to be escaped by another backslash. The variable RTL_LOGFILE can be set in one of the following ways:

Transforming the log files into Calc documents

Once you have created a log file you may want to convert it into a more readable form. There is a set of Perl scripts that create a Calc document from a log file that contains for every thread a set of pages with different views of the profiling data.

The first page shows a pretty printed version of the list of time stamps. They are indented according to the calling hierarchy. For each time stamp you can see the time it has been written (or to be more precise the time on which it's writing has been requested), if applicable the time the function/scope took to compute and the function/scope name and message.

The second page shows a list of all functions/scopes for which timing informations exist. Every list entry shows the function/scope name, total, minimal, maximal, and average time and the number of calls.

Filtering

The list of time stamps can be filtered in order to reduce a large amount of data to a managable size and to exclude profiling information from projects you are not interested in. There are two filters. One is for explicit inclusion of time stamps, the other for explicit exclusion. Each is initialized from a file containing regular expressions (Perl style) given on the command line. The file after the -i switch defines those time stamps that are to be included in the reports and the file after the -e switch defines those time stamps that are to be excluded. The files may contain empty lines or comment lines whose first character is an '#'. An empty inclusion filter does have no effect. If both filters are specified, then a time stamp is written to the report if it matches at least one of the regular expression of the inclusion filter and none of the exclusion filter.

The idea is to use the inclusion filter if you have just a small number of functions or messages you are interested in and to use the exclusion filter if there is only a small number you would not like in your reports.

An inclusion file might look like this.

# Regular expression matching all function names that shall show up

# in the reports.

# Show calls to BuildChart in the chart project.

BuildChart

Outlook

The form and number of reports will of course have to be changed and extended. We are looking forward to your feedback.

Further Documentation

Furthor documentation can be found in the C++ header files <rtl/logfile.h> and <rtl/logfile.hxx> and the Perl scripts in project tools/contrib/profiling. You can use for instance pod2html to extract and transform the Perl scripts documentation in HTML documents.

Documentation of the time stamp format can be found here.

Contact

You can contact the authors under andre.w.fischer@sun.com or carsten.driesner@sun.com.



Apache Software Foundation

Copyright & License | Privacy | Contact Us | Donate | Thanks

Apache, OpenOffice, OpenOffice.org and the seagull logo are registered trademarks of The Apache Software Foundation. The Apache feather logo is a trademark of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.