last change: $Date: 2006/10/06 10:39:56 $ $Revision: 1.7 $

ConvWatch

ConvWatch is a test tool that compares documents not by their content but by their graphical representation. It is based on Caolán McNamara's convwatch tool, which was written in Python. Because of some limitations, ConvWatch has been completely rewritten in Java.

How ConvWatch works

ConvWatch loads a set of documents and

  1. saves them in the OpenOffice.org equivalent formats and compares them to a previously saved set of approved output.
  2. prints the files to PostScript documents and compares them visually to a previously saved set of approved PostScript output (e.g. output from Microsoft Office's print to file).

The save tests ensure that there is no regression against the reference output documents. The print tests determine whether there are differences against the reference printer output. That reference output can be created by OpenOffice.org itself (to test for layout regressions), by Microsoft Office (to compare conversion quality), or by PDF export from OpenOffice.org.

To make the print tests valid, all printing must happen with the same printer driver (and the same set of fonts, but we'll overlook that issue here). Some additional software is required to facilitate the comparison.

ConvWatch is part of the qadevOOo environment. It is written in Java, which works well for communicating with OpenOffice.org. Install it as follows.

Installing

Some software is required to run ConvWatch and has to be installed locally.

  1. For the Windows environment, a standardized printer driver is required
    to give a baseline reference. This is not needed in UNIX environments.
    Install the Adobe Universal PostScript Microsoft Windows Driver from wingsteng.exe, originally downloaded from: http://www.adobe.com/support/downloads Also get the crossoffice.ppd file. It is only needed on Microsoft Windows.

      Select the option local printer, and use FILE: as the port.

      When the installer asks you to select a printer model, choose browse, go to your temp directory, select crossoffice.ppd and then select Microsoft Office Generic printer. Select no when asked to configure the printer at the end of the install process.

      Note: This is the same .ppd as the generic OpenOffice.org driver under Unix, with a small modification to fix a resolution-related problem. Once it is installed, use this printer when you print to file from Microsoft Office. This allows the same printer driver to be used under Unix and Microsoft Windows, so in theory the Unix and Microsoft Windows versions of OpenOffice.org should create the same output. In practice, different fonts are available on different platforms, and even if the ttf fonts of Microsoft Windows are made available under Unix and Type 1 fonts are disabled, our rendering is still different (I've tried a number of approaches). Making Unix and Microsoft Windows output equivalent is certainly something to look into in the future. Currently there are enough problems elsewhere to be worked out before tackling that.

      Optionally, make this printer your default Microsoft Windows printer via start->settings->printer, or remember its name and set it as a property like PRINTER_NAME=name in a props file.

  2. Ghostscript is required.
    Get it from http://www.cs.wisc.edu/~ghost/ On Microsoft Windows only, install it to c:\gs to avoid problems, and add c:\gs\gs8.00\bin to the PATH environment variable. When done, run gswin32c --help in a fresh command line shell. If you get a usage message, the Ghostscript installation is complete.

  3. ImageMagick is required
    to compare the images generated by Ghostscript from the printer output. Get it from: http://www.imagemagick.org/ ConvWatch only needs composite and identify from this package.

  4. Java is required,
    the communication between ConvWatch and an OpenOffice.org installation runs over Java UNO. So at least a JRE (1.4) is needed. Get it from http://java.sun.com if it is not already installed.

  5. A non-debug version of OpenOffice.org is required.
    No extra features have to be activated. It is possible to use a debug version, but this is not really tested, because a debug version shows many message boxes about assertions and similar issues. These do not close automatically, so a debug version is not really usable for automated tests. WARNING: Don't install OpenOffice.org in directories with white space in the name, e.g. Program Files. This is not supported yet.

  6. In a Microsoft Windows environment, ConvWatch can be told to use Microsoft Office to create the output via the printer driver. For this, a Microsoft Office package is required and must be installed.
    • Perl is required if Microsoft Office is used. ConvWatch uses Perl to communicate with Microsoft Office, so Perl must be installed. Get it from www.activeperl.com.

  7. OOoRunnerLight.jar is required.
    Get it from the qadevOOo project. This jar contains the complex test environment which is needed to run ConvWatch.

Complex test

ConvWatch is implemented as a complex test. The complex test environment is able to start an already installed OpenOffice.org, close it again, and do some more things needed for automated test runs.

Only one extra property file is needed.

Property File

A property file is needed to control the ConvWatch process.

Lines starting with '#' are comments. Empty lines are allowed.

The property file stores where to find the documents, where to find the references and where the results have to go.

It also contains the call to an OpenOffice.org executable. The variables are described below.


DOC_COMPARATOR_INPUT_PATH=/path/to/document-pool

DOC_COMPARATOR_REFERENCE_PATH=/path/to/references

DOC_COMPARATOR_OUTPUT_PATH=/path/to/convwatch_results

AppExecutionCommand=/opt/openoffice2/program/soffice -headless -norestore -nocrashreport -accept=socket,host=localhost,port=8100;urp;
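
The format shown above (key=value lines, '#' comments, empty lines allowed) matches what java.util.Properties reads, so a property file can be sanity-checked before a long run. The following is only a minimal sketch under that assumption; the class name ConvWatchProps is hypothetical and this is not the loader used by the qadevOOo runner.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of ConvWatch: parses property-file text
// and reports whether the three main path variables are present.
final class ConvWatchProps {

    // java.util.Properties skips '#' comment lines and empty lines by itself.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "# where the test documents live",
            "DOC_COMPARATOR_INPUT_PATH=/path/to/document-pool",
            "",
            "DOC_COMPARATOR_REFERENCE_PATH=/path/to/references",
            "DOC_COMPARATOR_OUTPUT_PATH=/path/to/convwatch_results");
        Properties props = parse(text);
        String[] keys = {
            "DOC_COMPARATOR_INPUT_PATH",
            "DOC_COMPARATOR_REFERENCE_PATH",
            "DOC_COMPARATOR_OUTPUT_PATH"
        };
        for (String key : keys) {
            String value = props.getProperty(key);
            System.out.println(key + " -> " + (value == null ? "MISSING" : value));
        }
    }
}
```

Note that java.util.Properties treats a backslash as an escape character, so with this parser Windows paths would have to be written with forward slashes or doubled backslashes.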

Enhancements

ConvWatch works in both Microsoft Windows and Unix environments, but the paths to set differ between them. For this reason a variable name can carry a platform prefix. The prefix is 7 characters long (wntmsci, unxlngi, unxsols or unxsoli) and is optional. So instead of DOC_COMPARATOR_INPUT_PATH you can use unxlngi.DOC_COMPARATOR_INPUT_PATH, wntmsci.DOC_COMPARATOR_INPUT_PATH or unxsols.DOC_COMPARATOR_INPUT_PATH, and a single property file is enough for all environments.
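
The prefix mechanism can be pictured as a two-step lookup. The sketch below is an illustration only: the class name PrefixedProps is hypothetical, and the lookup order (prefixed key first, then the plain key) is an assumption, since the text does not state which one wins when both are present.

```java
import java.util.Properties;

// Hypothetical illustration of platform-prefixed property lookup.
final class PrefixedProps {

    // Assumed order: "<prefix>.<key>" wins, the unprefixed key is the fallback.
    static String resolve(Properties props, String platformPrefix, String key) {
        String specific = props.getProperty(platformPrefix + "." + key);
        return specific != null ? specific : props.getProperty(key);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("unxlngi.DOC_COMPARATOR_INPUT_PATH", "/path/to/document-pool");
        props.setProperty("wntmsci.DOC_COMPARATOR_INPUT_PATH", "c:/path/to/document-pool");
        props.setProperty("DOC_COMPARATOR_INCLUDE_SUBDIRS", "no");

        // Same file, different platform prefix, different path.
        System.out.println(resolve(props, "unxlngi", "DOC_COMPARATOR_INPUT_PATH"));
        System.out.println(resolve(props, "wntmsci", "DOC_COMPARATOR_INPUT_PATH"));
        // No prefixed entry exists, so the plain key is used.
        System.out.println(resolve(props, "unxlngi", "DOC_COMPARATOR_INCLUDE_SUBDIRS"));
    }
}
```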

Property Variables

The following describes which properties exist and how they work in the ConvWatch environment.

At the moment the variables must exist in the property file read by the Java program. There are other ways to set such variables, like shell parameters or environment variables, but the method described here is preferred.

The first three variables are the most important and must be set. The other variables are optional.

DOC_COMPARATOR_INPUT_PATH

Set DOC_COMPARATOR_INPUT_PATH to the location of the original document files. ConvWatch does not really compare documents, but their output. So for each original document there must also exist a reference document, which must have the same name but with the *.prn extension. If no reference exists, the test for this document fails.

DOC_COMPARATOR_OUTPUT_PATH
Set DOC_COMPARATOR_OUTPUT_PATH to the location where the results will be created. Warning: there must be enough disk space for this.

NEVER set DOC_COMPARATOR_INPUT_PATH to DOC_COMPARATOR_OUTPUT_PATH or a sub directory of it, or you will lose samples.
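
A wrapper script or test harness could enforce this rule before starting a run. The following is a minimal sketch, assuming the rule means that the input path must be neither the output path itself nor a directory below it; the class name PathGuard is hypothetical and ConvWatch itself is not known to perform this check.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the input/output path rule.
final class PathGuard {

    // Returns false if the input path equals the output path or lies below it.
    static boolean inputIsSafe(String inputPath, String outputPath) {
        Path in = Paths.get(inputPath).toAbsolutePath().normalize();
        Path out = Paths.get(outputPath).toAbsolutePath().normalize();
        // Path.startsWith compares whole name elements, so /data/output
        // is not treated as lying below /data/out.
        return !in.startsWith(out);
    }

    public static void main(String[] args) {
        System.out.println(inputIsSafe("/path/to/document-pool", "/path/to/convwatch_results"));
        System.out.println(inputIsSafe("/path/to/convwatch_results/docs", "/path/to/convwatch_results"));
    }
}
```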

AppExecutionCommand
Set AppExecutionCommand to the OpenOffice.org executable which will be used by this test. The parameters are needed to put OpenOffice.org into its listening mode. Find more about the OpenOffice listening mode in the OpenOffice Developers Guide, in the chapter UNO Concepts. e.g.:
AppExecutionCommand=/opt/openoffice2/program/soffice -headless -accept=socket,host=localhost,port=8100;urp;

The parameter -headless starts the office in the background.

DOC_COMPARATOR_DIFF_PATH
Because no program is perfect, there are always differences we have to live with. An automated process cannot really decide whether such a difference is acceptable or not. Therefore it is possible to say where the differences from an old run are stored, so ConvWatch can check whether the new test is at least as good as the old one.

DOC_COMPARATOR_GFX_OUTPUT_DPI_RESOLUTION
Set DOC_COMPARATOR_GFX_OUTPUT_DPI_RESOLUTION to a value in dots per inch to tell Ghostscript which resolution to use for the JPEG output. Greater values consume much more memory and need much more time for the comparison, but the JPEG picture shows much more detail. A smaller value is much faster because fewer pixels have to be compared. For a first quick overview a value of 75 seems to be enough.

The default value is 212. With it, a DIN A4 document results in a picture of 1752x2478 pixels, which consumes roughly 17MB of memory.
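
The figures above can be reproduced with a little arithmetic, which also helps to estimate other resolutions. The sketch below assumes the DIN A4 size of 210 x 297 mm, rounding down to whole pixels, and 4 bytes per pixel in memory; the 4 bytes are an assumption that happens to reproduce the roughly 17MB mentioned above. The class name DpiMath is hypothetical.

```java
// Hypothetical helper to estimate picture size and memory use for a DPI value.
final class DpiMath {

    static final double MM_PER_INCH = 25.4;

    // Number of whole pixels for a paper edge given in millimetres.
    static int pixels(double millimetres, int dpi) {
        return (int) Math.floor(millimetres / MM_PER_INCH * dpi);
    }

    // Assumption: 4 bytes per pixel for the decoded picture in memory.
    static long bytesInMemory(int width, int height) {
        return 4L * width * height;
    }

    public static void main(String[] args) {
        int[] resolutions = {75, 212};
        for (int dpi : resolutions) {
            int width = pixels(210.0, dpi);   // DIN A4 width
            int height = pixels(297.0, dpi);  // DIN A4 height
            System.out.println(dpi + " dpi: " + width + "x" + height
                + " pixels, " + bytesInMemory(width, height) + " bytes");
        }
    }
}
```

Under these assumptions the default of 212 dpi gives 1752x2478 pixels and 17,365,824 bytes, while 75 dpi gives 620x876 pixels, about one eighth of the memory.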

IMPORTANT: If Java fails with memory problems, a special Java parameter may have to be set. Search for the line where Java is called and insert -Xmx128m, then start again. The value 128 may still not be high enough. By default the parameter is not used.

DOC_COMPARATOR_INCLUDE_SUBDIRS
Set DOC_COMPARATOR_INCLUDE_SUBDIRS=no if ConvWatch should not scan the given DOC_COMPARATOR_INPUT_PATH directory recursively. The default is to descend into all sub directories.

DOC_COMPARATOR_PRINTER_NAME
This parameter sets the printer which will be used instead of the default printer. It is only needed and supported in the Microsoft Windows environment. The default is to use the standard printer.

DOC_COMPARATOR_PRINT_MAX_PAGE
Set DOC_COMPARATOR_PRINT_MAX_PAGE to tell the office the maximum number of pages to export. The default is to print all pages.

DOC_COMPARATOR_PRINT_ONLY_PAGE
Set DOC_COMPARATOR_PRINT_ONLY_PAGE to tell the office which pages should be printed. This is a string value, e.g. set it to "1-4;24" to print pages 1 to 4 and page 24. The default for this value is an empty string.
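
To make the page string concrete, here is a minimal sketch that expands such a value into page numbers. The syntax is inferred only from the single example "1-4;24" (parts separated by ';', ranges written with '-'); the class name PageRanges is hypothetical, and the string is presumably handed to the office as it is, so this is not how ConvWatch processes it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expansion of a page string such as "1-4;24".
final class PageRanges {

    static List<Integer> parse(String spec) {
        List<Integer> pages = new ArrayList<>();
        if (spec == null || spec.trim().isEmpty()) {
            return pages; // empty string is the default: no page restriction
        }
        for (String part : spec.split(";")) {
            String item = part.trim();
            if (item.isEmpty()) {
                continue;
            }
            int dash = item.indexOf('-');
            if (dash < 0) {
                pages.add(Integer.parseInt(item));       // single page
            } else {
                int first = Integer.parseInt(item.substring(0, dash).trim());
                int last = Integer.parseInt(item.substring(dash + 1).trim());
                for (int page = first; page <= last; page++) {
                    pages.add(page);                     // inclusive range
                }
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(parse("1-4;24"));
    }
}
```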

DOC_COMPARATOR_REFERENCE_PATH

Set DOC_COMPARATOR_REFERENCE_PATH to the location of the reference documents. If this variable isn't set, ConvWatch assumes the references are located next to the original document files. For this behaviour the DOC_COMPARATOR_INPUT_PATH must be writable.

DOC_COMPARATOR_REFERENCE_CREATOR_TYPE
Normally ConvWatch uses the OpenOffice.org printer driver to create the PostScript output.

Set DOC_COMPARATOR_REFERENCE_CREATOR_TYPE=pdf to use the internal PDF creator of OpenOffice.org instead.
To test the internal PDF creator against the normal printer driver, create the references without this parameter and the tests with DOC_COMPARATOR_REFERENCE_CREATOR_TYPE=pdf, or vice versa.

Set DOC_COMPARATOR_REFERENCE_CREATOR_TYPE=msoffice to let Microsoft Office create the output instead of OpenOffice.org. A runnable Microsoft Office must exist on this PC, and this works only in the Microsoft Windows environment.
To test OpenOffice.org output against Microsoft Office output, create the references without this parameter and the tests with DOC_COMPARATOR_REFERENCE_CREATOR_TYPE=msoffice, or vice versa.

The default, if not set, is OOo for OpenOffice.org output.

ThreadTimeOut
The qadevOOo runner is made for automated tests, so it can kill endlessly running tests after a timeout. This value is given in milliseconds. ConvWatch tests consume a lot of run time, so set the timeout higher, e.g. ThreadTimeOut=3600000 for one hour.

DOC_COMPARATOR_OVERWRITE_REFERENCE
Most of the time, references only need to be created once. So the normal behaviour is not to overwrite already existing reference files. Set this parameter to true and existing references will be overwritten by new ones. This makes it possible to run the creation of references again and again.

DOC_COMPARATOR_GFXCMP_WITH_BORDERMOVE
With this parameter it is possible to run a second difference check with the borders removed. Normally all documents have a print border, because the hardware isn't able to print on the whole paper, and it is also not a good idea to print as much as possible on one page. Most of the time the borders are equal in both documents, but sometimes there are differences. To see the differences in the content of the document more clearly, and not only that the content has moved, it is possible to remove the border of every document. Set this to 'yes' or 'true' and the borders will be removed by a simple border removal algorithm.
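
The text does not say how the border removal works. Purely as an illustration of what a simple approach could look like, the sketch below crops a picture to the bounding box of all pixels that differ from the top-left pixel, which is assumed to be the paper colour. The class name BorderCrop is hypothetical and this is not the ConvWatch implementation.

```java
import java.awt.image.BufferedImage;

// Hypothetical border removal: crop to the bounding box of the content.
final class BorderCrop {

    static BufferedImage removeBorder(BufferedImage img) {
        // Assumption: the top-left pixel shows the paper (background) colour.
        int background = img.getRGB(0, 0);
        int minX = img.getWidth();
        int minY = img.getHeight();
        int maxX = -1;
        int maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (img.getRGB(x, y) != background) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return img; // nothing but background, leave the picture unchanged
        }
        return img.getSubimage(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(100, 140, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 140; y++) {
            for (int x = 0; x < 100; x++) {
                // White paper with a black block as "content".
                boolean content = x >= 10 && x < 90 && y >= 20 && y < 120;
                page.setRGB(x, y, content ? 0x000000 : 0xFFFFFF);
            }
        }
        BufferedImage cropped = removeBorder(page);
        System.out.println(cropped.getWidth() + "x" + cropped.getHeight());
    }
}
```

An exact colour comparison like this is fragile on real JPEG pictures, where compression noise changes the border pixels slightly; a tolerance around the background colour would be needed in practice.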

Start ConvWatch

Here is a small tcsh helper script. Copy OOoRunnerLight.jar out of the qadevOOo project, and get the other jar files from the program/classes directory of a current OpenOffice.org installation.

#!/bin/tcsh 

setenv PTO /path/to/an/office/program

# path to open office classes
setenv PTOC ${PTO}/classes

# path to OOoRunnerLight.jar
setenv OOORUNNER /path/to/OOoRunnerLight

setenv JARFILES ${PTOC}/ridl.jar:${PTOC}/unoil.jar:${PTOC}/jurt.jar:${PTOC}/juh.jar:\
                ${PTOC}/jut.jar:${PTOC}/java_uno.jar:${OOORUNNER}/OOoRunnerLight.jar

# start reference build
java -cp ${JARFILES} org.openoffice.Runner -tb java_complex -ini propertyfile.ini -o convwatch.ReferenceBuilder

# start the graphical document compare
java -cp ${JARFILES} org.openoffice.Runner -tb java_complex -ini propertyfile.ini -o convwatch.ConvWatchStarter

Start the test by simply calling this script.

You only need to set the paths to the right places. With this script, the references are built first.

In the second step, the same office is used to check the just created references against the same office and the same documents.

This run should never fail, of course. It only demonstrates how ConvWatch works.

Now consider creating the references with an old OpenOffice.org 1.0.x and running a graphical comparison against a new OpenOffice.org 2.0, with more than one document in the pool. But be aware that such a run can take hours.

Results

Use an image viewer to examine the results, which can be found in the DOC_COMPARATOR_OUTPUT_PATH given in the property file. There are at least 3 JPEG pictures and one PostScript *.ps file. All are based on the document name plus a suffix, as described below.

Name | Creator | Description
NAME.ps | file printer | Created from the original document found in DOC_COMPARATOR_INPUT_PATH with the print to file method.
NAME.ps0001.jpg | gs (Ghostscript) | Created by Ghostscript from the PostScript file generated in the previous step.
NAME.prn0001.jpg | gs (Ghostscript) | Created by Ghostscript from the reference file found in DOC_COMPARATOR_REFERENCE_PATH.
NAME.prn.diff0001.jpg | composite | Created from the two JPEG pictures above with composite from ImageMagick. It shows how the two files differ.

There may be many more files, because at least these 3 JPEG pictures are created for every document page.
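
The file names above suggest a two-stage pipeline: Ghostscript renders each PostScript or reference file to one JPEG per page, and composite builds a difference picture from each pair. The sketch below only assembles plausible command lines for these two stages. The options used (-dNOPAUSE, -dBATCH, -sDEVICE=jpeg, -r, -sOutputFile for Ghostscript and -compose difference for composite) exist in those tools, but whether ConvWatch calls them exactly like this is an assumption. The class name RenderCommands is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the external commands behind the result files.
final class RenderCommands {

    // Renders every page of a PostScript file to NAME0001.jpg, NAME0002.jpg, ...
    // gsExecutable is "gs" on Unix or "gswin32c" on Microsoft Windows.
    static List<String> ghostscript(String gsExecutable, String inputFile, int dpi) {
        return Arrays.asList(
            gsExecutable,
            "-dNOPAUSE", "-dBATCH",                    // no prompts, exit when done
            "-sDEVICE=jpeg",                           // JPEG output device
            "-r" + dpi,                                // resolution in dots per inch
            "-sOutputFile=" + inputFile + "%04d.jpg",  // %04d is the page number
            inputFile);
    }

    // Builds the difference picture of two rendered pages with ImageMagick.
    static List<String> composite(String newPage, String referencePage, String diffPage) {
        return Arrays.asList(
            "composite", "-compose", "difference",
            newPage, referencePage, diffPage);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", ghostscript("gs", "NAME.ps", 212)));
        System.out.println(String.join(" ", ghostscript("gs", "NAME.prn", 212)));
        System.out.println(String.join(" ",
            composite("NAME.ps0001.jpg", "NAME.prn0001.jpg", "NAME.prn.diff0001.jpg")));
    }
}
```

The lists can be passed to java.lang.ProcessBuilder to actually run the tools. Building the commands separately from running them keeps the sketch usable on machines where Ghostscript and ImageMagick are not installed.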

In short, the ini result file contains at least two kinds of sections: the global section and the page sections. There is a separate section for every page.

Tips

If the PostScript file is only created in black and white in the Unix environment, create a new printer which produces color output and set it as the default printer. The output should then also be in color.

Create References

It is possible to create the *.prn reference files automatically.

The references have to be built only once, and rebuilt only if there are changes such as a new major version of the office.

Java API

It is also possible to use ConvWatch directly from a Java environment.

But be aware that this is absolutely alpha state, not properly tested and very likely to change. To take a look, get the qadevOOo project, go to the directory runner/convwatch, open the Java file GraphicalDifferenceTest.java and read its comments. Javadoc documentation for the API will be created in the near future.

If you have problems with or ideas for running ConvWatch, don't hesitate to contact the current game keeper.
