How To Generate Profiling Data
Andre Fischer and Carsten Driesner, July 11th, 2001. Last change on August 2nd, 2001.
The task to be solved is the generation of profiling data for a manually selected set of points in programs of the OpenOffice/Sun One Webtop family. Moreover, this should be accomplished as automatically as possible, so that timings of different program versions can be compared to each other automatically.
The main reason for not using an existing tool like Truetime or Quantify is the huge amount of data generated by these programs. In general the developers of the different modules of the office know the time critical parts of their code. Therefore only profiling information about these code parts is of interest. Everything else only obscures the view of the relevant data.
With this in mind we use a different approach. Instead of blindly instrumenting all files---i.e. adding code that generates profiling data---we concentrate on the areas of interest. The developers have to manually add commands to their code once; these commands emit time stamps together with context information. From then on everything else happens automatically. If those files are compiled with a certain preprocessor symbol defined and the office is run with a certain environment variable set, then profiling information is written to a log file (or, to be more precise, to one log file per process). These log files are then processed by a Perl script and transformed into a Calc document.
The details of this process are explained below. Note that this is work in progress. In particular, the transformation of the log files into a Calc document will have to be adapted to the needs of those who use these documents.
Instrumenting the source files
The header file <rtl/logfile.hxx> (guarded by the include guard _RTL_LOGFILE_HXX_) contains a set of macros that can be used to emit time stamps at certain points in the code. The reason for using macros instead of a more elegant C++ construct is to provide a way of removing the code for writing profiling data completely from the compiled executable and its libraries. The macros wrap calls to the class ::rtl::Logfile (declared in <rtl/logfile.hxx>) and the function rtl_logfile_trace (declared in <rtl/logfile.h>). They generate actual code only when the preprocessor symbol TIMELOG is defined at compile time. This is accomplished either by building a special office version (which will be the normal Pro-version) or by setting ENVCDEFS to include the definition of TIMELOG; depending on the shell you use, a command like set ENVCDEFS=-DTIMELOG will do that (4NT).
The macros can be used in two ways:
RTL_LOGFILE_CONTEXT_AUTHOR(instance,project,author,name) creates a context instance that can be referenced by instance. The three remaining parameters contain the name of the project in which the macro is used, the Sun id of the author responsible for the code in which the macro is used (for example af119097 for one of the authors), and the name of the function or other scope that this context refers to. The context instance emits one time stamp when it is created and one when it is destroyed. Therefore, if placed as the first statement of a function definition, a time stamp is written to the log file when the function is entered and one when it is exited, even when there is more than one (implicit) return statement.
The name passed to the context instantiation is written with every time stamp that originates from that context. This is also true for the following macros.
RTL_LOGFILE_CONTEXT_TRACE3(instance,message,arg1,arg2,arg3) writes time stamps with arbitrary messages to the log file. The actual message is given by the string message. Together with zero to three arguments it is passed to a printf style output function. That means that for every argument there has to be a suitable % format specifier in message. In order to allow the log file to be parsed by the existing Perl scripts, message may not contain newlines. Each message is prefixed with the context name.
These macros also exist in a context free version. RTL_LOGFILE_TRACE_AUTHOR3(project,author,message,arg1,arg2,arg3) does the same thing as its context twin except that the messages are not prefixed with a context name. As you can see, the project and author have to be given to each trace macro instead of just to the one creating a context. These macros are defined in <rtl/logfile.h>, which is included from <rtl/logfile.hxx>. Therefore, if you only use the context free macros, it is sufficient to include <rtl/logfile.h>.
For every macro introduced above there is an analogon without the _AUTHOR suffix which does not accept the author argument. Because this information is used later in the post-processing stage, we discourage you from using these other macro versions.
Depending on which of these two ways is used to generate a time stamp, they are called function/scope time stamps and message time stamps, respectively.
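To make the enter/exit behavior of the context macros concrete, here is a minimal, self-contained sketch of the underlying RAII idea. This is not the actual ::rtl::Logfile implementation; the class name, output format, and the example function are illustrative assumptions.

```cpp
// Illustrative stand-in for the RTL_LOGFILE context macros: an object that
// emits one time stamp on construction (scope entry) and one on destruction
// (scope exit).  The real class is ::rtl::Logfile from <rtl/logfile.hxx>;
// everything here is a simplified sketch.
#include <chrono>
#include <cstdio>

class ScopeLog {
public:
    ScopeLog(const char* project, const char* author, const char* name)
        : m_name(name), m_start(std::chrono::steady_clock::now())
    {
        // Entry stamp, written when the enclosing scope is entered.
        std::printf("{ %s : %s : %s\n", project, author, name);
    }
    ~ScopeLog()
    {
        // Exit stamp, written on every path out of the scope.
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - m_start).count();
        std::printf("} %s : %lld us\n", m_name, static_cast<long long>(us));
    }
private:
    const char* m_name;
    std::chrono::steady_clock::time_point m_start;
};

int compute(int x)
{
    // Comparable in spirit to
    // RTL_LOGFILE_CONTEXT_AUTHOR(aLog, "sc", "af119097", "compute");
    ScopeLog aLog("sc", "af119097", "compute");
    if (x < 0)
        return 0;   // the exit stamp is written here, too
    return x * 2;
}
```

Because the exit stamp is emitted from the destructor, every return path is covered automatically, which is exactly why the macro is meant to be placed as the first statement of a function.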
Creating profiling information
If you have instrumented your code as described above and have compiled and installed it, you can create profiling information by starting the office with the bootstrap variable RTL_LOGFILE set to a file name prefix. This prefix is completed by appending an underscore, the process id, and a .log suffix. Note that backslashes have to be escaped by another backslash. The variable RTL_LOGFILE can be set in one of the following ways:
Setting the environment variable RTL_LOGFILE. This can for instance be done with a command like set RTL_LOGFILE=<prefix> from a 4NT shell.
Passing the argument -env:RTL_LOGFILE=<prefix> on the command line to the office.
Putting an entry into <executable>.ini or <executable>rc (the first with, the second without a dot), where <executable> is the name of the executable. For more information on this method please refer to http://udk.openoffice.org/common/man/concept/uno_default_bootstrapping.html
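As an illustration, the three ways might look as follows. The prefix d:\\tmp\\office is an arbitrary example, and the exact command syntax is an assumption; note the escaped backslashes in the prefix.

```
rem 1. Environment variable, e.g. from a 4NT shell:
set RTL_LOGFILE=d:\\tmp\\office

rem 2. Command line argument:
soffice -env:RTL_LOGFILE=d:\\tmp\\office

rem 3. Entry in <executable>.ini or <executable>rc:
RTL_LOGFILE=d:\\tmp\\office
```

With this prefix, a process with id 1234 would write its time stamps to a file named like d:\tmp\office_1234.log.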
Transforming the log files into Calc documents
Once you have created a log file you may want to convert it into a more readable form. There is a set of Perl scripts that create a Calc document from a log file that contains for every thread a set of pages with different views of the profiling data.
The first page shows a pretty printed version of the list of time stamps. They are indented according to the calling hierarchy. For each time stamp you can see the time at which it was written (or, to be more precise, the time at which its writing was requested), if applicable the time the function/scope took to execute, and the function/scope name and message.
The second page shows a list of all functions/scopes for which timing information exists. Every list entry shows the function/scope name, the total, minimal, maximal, and average time, and the number of calls.
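The statistics on the second page can be sketched as follows. This is a simplified model of what the Perl scripts compute, not their actual code; the structure and function names are assumptions.

```cpp
// Sketch of the per-function statistics shown on the second report page:
// collapse (function name, duration) samples into total/min/max/count,
// from which the average follows.
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Stats {
    long total = 0;     // sum of all measured times (e.g. microseconds)
    long minimum = 0;   // shortest single call
    long maximum = 0;   // longest single call
    long calls = 0;     // number of calls
    double average() const { return calls ? double(total) / calls : 0.0; }
};

std::map<std::string, Stats> aggregate(
    const std::vector<std::pair<std::string, long>>& samples)
{
    std::map<std::string, Stats> result;
    for (const auto& s : samples) {
        Stats& st = result[s.first];
        if (st.calls == 0) {
            st.minimum = st.maximum = s.second;   // first sample seeds min/max
        } else {
            st.minimum = std::min(st.minimum, s.second);
            st.maximum = std::max(st.maximum, s.second);
        }
        st.total += s.second;
        ++st.calls;
    }
    return result;
}
```

For example, two calls to the same function taking 100 and 300 units would yield a total of 400, a minimum of 100, a maximum of 300, and an average of 200.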
The list of time stamps can be filtered in order to reduce a large amount of data to a manageable size and to exclude profiling information from projects you are not interested in. There are two filters, one for explicit inclusion of time stamps and one for explicit exclusion. Each is initialized from a file, given on the command line, that contains regular expressions (Perl style). The file given after the -i switch defines those time stamps that are to be included in the reports, and the file given after the exclusion switch defines those time stamps that are to be excluded. The files may contain empty lines or comment lines whose first character is a '#'. An empty inclusion filter has no effect. If both filters are specified, then a time stamp is written to the report if it matches at least one of the regular expressions of the inclusion filter and none of the exclusion filter.
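The filter logic just described can be sketched like this. It is a simplified model of what the Perl scripts do, not their actual code; the function name is an assumption.

```cpp
// Decide whether one time stamp line goes into the report.  A line passes
// if it matches at least one inclusion pattern (an empty inclusion filter
// has no effect, i.e. everything is included) and no exclusion pattern.
#include <regex>
#include <string>
#include <vector>

bool passesFilters(const std::string& line,
                   const std::vector<std::regex>& include,
                   const std::vector<std::regex>& exclude)
{
    bool included = include.empty();
    for (const auto& re : include)
        if (std::regex_search(line, re)) { included = true; break; }
    if (!included)
        return false;
    for (const auto& re : exclude)
        if (std::regex_search(line, re))
            return false;
    return true;
}
```

Note the asymmetry: an empty inclusion filter lets everything through, while every exclusion pattern can still veto a line.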
The idea is to use the inclusion filter if you have just a small number of functions or messages you are interested in and to use the exclusion filter if there is only a small number you would not like in your reports.
An inclusion file might look like this:
# Regular expressions matching all function names that shall appear
# in the reports.
# Show calls to BuildChart in the chart project.
The form and number of reports will of course have to be changed and extended. We are looking forward to your feedback.
Further documentation can be found in the C++ header files and in the Perl scripts. You can for instance use pod2html to extract the Perl scripts' documentation and transform it into HTML documents.
Documentation of the time stamp format can be found here.