Issue 22253 - FreeBSD startup problem: GetStorage and non-existent soffice.cfg
Summary: FreeBSD startup problem: GetStorage and non-existent soffice.cfg
Status: CLOSED NOT_AN_OOO_ISSUE
Alias: None
Product: porting
Classification: Code
Component: code
Version: OOo 3.1
Hardware: PC FreeBSD
Importance: P3 Trivial
Target Milestone: ---
Assignee: pavel
QA Contact: issues@porting
URL: http://tmp.janik.cz/freebsd-getstorag...
Keywords:
Duplicates: 82690 98781
Depends on:
Blocks: 18060
Reported: 2003-11-07 18:34 UTC by pavel
Modified: 2009-06-16 13:14 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Workaround, not for integration! (509 bytes, patch)
2003-11-08 08:48 UTC, pavel
better addsym.awk for FreeBSD (3.16 KB, text/plain)
2003-11-13 16:29 UTC, Daniel Boelzle [:dbo]
use RTLD_DEFAULT (693 bytes, patch)
2003-11-30 09:16 UTC, mrauch-openoffice
same patch for FreeBSD (1.55 KB, patch)
2004-01-10 14:52 UTC, maho.nakata

Description pavel 2003-11-07 18:34:54 UTC
When OOo fix1 starts on FreeBSD, an error message is shown and OOo exits.

The error message is at http://tmp.janik.cz/freebsd-getstorage-error.png
Comment 1 pavel 2003-11-07 18:36:21 UTC
I will work on it.
Comment 2 pavel 2003-11-08 08:47:20 UTC
The attached patch is a simple workaround (warning: this is not meant
as a final solution!) for this problem.
Comment 3 pavel 2003-11-08 08:48:04 UTC
Created attachment 11054 [details]
Workaround, not for integration!
Comment 4 pavel 2003-11-09 16:24:11 UTC
Daniel, do you have an idea what I should try next?
I added you to CC:.
Comment 5 pavel 2003-11-11 20:06:22 UTC
We have found out that the exception itself is thrown perfectly, but
there seems to be an issue with linking across module boundaries :-(

This is something I cannot solve myself :-(
Comment 6 pavel 2003-11-11 21:38:11 UTC
I'm going to test the whole thing on an older FreeBSD system to see
whether the problem is in FreeBSD 4.9 too (I use 5.1 right now).
Comment 7 Daniel Boelzle [:dbo] 2003-11-12 09:22:23 UTC
@Pavel: I assume this is a linking problem in which two different
addresses are used for the same symbol at runtime: the one that the
catch handler tests against and the one the libgcc3_uno.so bridge
obtains by calling dlsym( app_handle, "<RTTI-name>" ).  I assume those
two are different, so it is up to you to test whether different
addresses are resolved at runtime.  Dump the dlsym'ed one as well as
the one that is resolved by ld.so, using the LD_DEBUG environment
variable.  Run "setenv LD_DEBUG help" to get all options.

Comment 8 pavel 2003-11-13 10:59:52 UTC
ld.so on FreeBSD supports only LD_LIBRARY_PATH, LD_PRELOAD,
LD_BIND_NOW and several others; LD_DEBUG is not supported. So I added
some debugging output manually to
bridges/source/cpp_uno/gcc3_freebsd_intel/except.cxx near the dlsym call:

        OString symName( buf.makeStringAndClear() );
        rtti = (type_info *)dlsym( m_hApp, symName.getStr() );   

        fprintf(stderr, "PJ: %s %p (%s)\n", symName.getStr(), rtti, dlerror());

When I ran the unchanged OOo from fix1, I got:

PJ: _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE 0x0
(Undefined symbol
"_ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE")
PJ: _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE 0x28998334 ((null))
pavel@irtos:~/OpenOffice.org1.1.0> 

I.e. dlsym cannot find the symbol
_ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE

But:

libsfx645fi.so:
003860d8 V _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
--
libucpfile1.so:
00050f2c V _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE

On GNU/Linux, the same debug fprintf prints:

PJ: _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
0x42c1a314 ((null))

i.e. it found the symbol.
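
A minimal sketch of the same kind of probe as a reusable helper. The name probe_symbol is hypothetical and is not part of the bridge sources. It clears any stale dlerror() state before the lookup and reads the error string exactly once afterwards, so the printed message belongs to this lookup, and it avoids passing a null pointer to %s (which is what produced "((null))" above):

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical debugging helper: look up `name` through `handle`,
 * print the resulting address and the dlerror() text, and return
 * the address (NULL if the symbol was not found). */
void *probe_symbol(void *handle, const char *name)
{
    const char *err;
    void *addr;

    (void)dlerror();              /* discard any earlier, unrelated error */
    addr = dlsym(handle, name);
    err = dlerror();              /* read once, right after the lookup */

    fprintf(stderr, "probe: %s %p (%s)\n",
            name, addr, err != NULL ? err : "no error");
    return addr;
}
```

Calling it as probe_symbol(dlopen(NULL, RTLD_LAZY), "<RTTI-name>") mirrors the lookup on the application handle that the bridge performs.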
Comment 9 Daniel Boelzle [:dbo] 2003-11-13 11:19:06 UTC
@Pavel: Good work!  This makes sense.  Nevertheless it is somewhat
strange that some symbols are found and others are not.

Pavel, please try to get the application handle another way,

    : m_hApp( dlopen( 0, RTLD_LAZY ) )

change to

    : m_hApp( dlopen( 0, RTLD_NOW | RTLD_GLOBAL ) )

A second thing to try is to look for differences between the
InteractiveIOException (which works) and the
InteractiveAugmentedIOException (which does not work) symbols.  Maybe
nm -D *.so shows differences.  You can also use the GNU tool objdump
-T *.so, which includes versioning info.
Comment 10 pavel 2003-11-13 15:19:04 UTC
I did the first change:

PJ: _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE 0x0
(Invalid shared object handle 0x660061)
PJ: _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE 0x0 (Invalid
shared object handle 0x660061)
PJ: _ZTIN3com3sun4star4task28ClassifiedInteractionRequestE 0x0
(Invalid shared object handle 0x660061)
PJ: _ZTIN3com3sun4star3uno9ExceptionE 0x0 (Invalid shared object
handle 0x660061)
crash_report: not found


Fatal exception: Signal 6
Stack:
Abort trap (core dumped)

nm output for both symbols:

pavel@irtos:~/OpenOffice.org1.1.0> nm -D program/* 2>/dev/null|egrep "_ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE|:"|grep -B1 _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
program/libsfx645fi.so:
003860d8 V _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
--
program/libucpfile1.so:
00050f2c V _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
pavel@irtos:~/OpenOffice.org1.1.0> nm -D program/* 2>/dev/null|egrep "_ZTIN3com3sun4star3ucb22InteractiveIOExceptionE|:"|grep -B1 _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
program/libfileacc.so:
0000ba28 V _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
--
program/libsfx645fi.so:
00376854 V _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
--
program/libsot645fi.so:
00044240 V _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
--
program/libucpfile1.so:
00050f50 V _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
--
program/libutl645fi.so:
0007b334 V _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
pavel@irtos:~/OpenOffice.org1.1.0> 

objdump output:
pavel@irtos:~/OpenOffice.org1.1.0> objdump -T program/* 2>/dev/null|grep _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
0000ba28  w   DO .data  0000000c  UDK_3_0_0   _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
00376854  w   DO .data  0000000c  Base        _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
00044240  w   DO .data  0000000c  Base        _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
00050f50  w   DO .data  0000000c  UDK_3_0_0   _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
0007b334  w   DO .data  0000000c  Base        _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE
pavel@irtos:~/OpenOffice.org1.1.0> objdump -T program/* 2>/dev/null|grep _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
003860d8  w   DO .data  0000000c  Base        _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
00050f2c  w   DO .data  0000000c  UDK_3_0_0   _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE
pavel@irtos:~/OpenOffice.org1.1.0> 

On much deeper inspection, I found a small difference in the _end
symbols, and this rang a bell for me: we have already met a (probably)
similar issue, a missing _end in the map file.

These files do not contain _end on GNU/Linux, but do contain it on FreeBSD:

libcppuhelper3gcc3.so
libcppuhelpergcc3.so
libcppuhelpergcc3.so.3
libcppuhelpergcc3.so.3.1.0

This is because of our patch
ftp://ftp.linux.cz/pub/localization/OpenOffice.org/devel/build/Patches/OOo_1.1.0_source-FreeBSD-temp-add_end.diff

This file does not contain the _end symbol on GNU/Linux but contains
it on FreeBSD:

libucpfile1.so

Maybe the last file?

I put both objdump dumps, from FreeBSD and from GNU/Linux, at

http://tmp.janik.cz/objdump/objdump-FreeBSD.log.gz
http://tmp.janik.cz/objdump/objdump-GNU_Linux.log.gz
(quite long files)

Grep for those exceptions is in

http://tmp.janik.cz/objdump/objdump.exceptions

Looks similar ;-)
Comment 11 Daniel Boelzle [:dbo] 2003-11-13 16:29:19 UTC
Created attachment 11240 [details]
better addsym.awk for FreeBSD
Comment 12 Daniel Boelzle [:dbo] 2003-11-13 16:30:30 UTC
@Pavel: I added a better addsym.awk version as an attachment.  Please
relink your libs with this one.  Just a try...
It seems to be a strange problem, but maybe it's the _end problem.
Comment 13 pavel 2003-11-14 08:19:26 UTC
I did a full clean rebuild of the tree with the new script and:

pavel@irtos:~/OpenOffice.org1.1.0> ./soffice 
PJ: _ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE 0x0
(Undefined symbol
"_ZTIN3com3sun4star3ucb31InteractiveAugmentedIOExceptionE")
PJ: _ZTIN3com3sun4star3ucb22InteractiveIOExceptionE 0x28998334 ((null))
pavel@irtos:~/OpenOffice.org1.1.0> 

Ie. the same :-)

I have started the same build on an older FreeBSD system (4.9, I use
5.1 now) so we can compare those two versions of FreeBSD. Maybe this
will bring some new info.
Comment 14 mrauch-openoffice 2003-11-17 22:42:45 UTC
I think I've found the problem:
It seems to be a bug/feature in the *BSDs' dynamic linker ld.elf_so:
(The code isn't exactly the same on FreeBSD and NetBSD, but both are
derived from the same original implementation, and most improvements
in one project have also been added in the other one.)
When a dlsym() search on the main program is performed (as the bridge
does), only the shared libraries loaded at start time are searched,
but not the ones opened via dlopen().
The symbol ...InteractiveAugmentedIOException is defined only in 
libsfx645bi.so and libucpfile1.so, which are both loaded via dlopen()
calls, so the symbol won't be found. The symbol
...InteractiveIOException however is also defined in libutl645bi.so,
which soffice.bin is linked against, so this one will be found.
Comment 15 pavel 2003-11-18 07:34:41 UTC
Great investigation!

What will we do about this? Should I ask FreeBSD specialists?

BTW - the same error is present on FreeBSD 4.9.
Comment 16 pavel 2003-11-18 07:42:34 UTC
BTW - I just tried to check your statement and tested calling dlsym on
a dlopen()ed library, using the example from GNU/Linux's dlopen manual
page:

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(int argc, char **argv) {
  void *handle;
  double (*cosine)(double);
  char *error;

  handle = dlopen ("libm.so", RTLD_LAZY);
  if (!handle) {
    fprintf (stderr, "%s\n", dlerror());
    exit(1);
  }

  cosine = dlsym(handle, "cos");
  if ((error = dlerror()) != NULL)  {
    fprintf (stderr, "%s\n", error);
    exit(1);
  }

  printf ("%f\n", (*cosine)(2.0));
  dlclose(handle);
  return 0;
}

And it works (FreeBSD 4.9):

pavel@leda:~> ./a.out 
-0.416147

So?
Comment 17 mrauch-openoffice 2003-11-18 09:41:28 UTC
You missed a small, but very important detail of my statement:
"dlsym() search _on_the_main_program_" i.e. the handle is not the
dlopen()ed shared library itself, but the main program (which you get
back with dlopen(NULL, RTLD_LAZY) )
I've modified your test program a bit:

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(int argc, char **argv) {
  void *handle, *handlemain;
  double (*cosine)(double);
  char *error;

  handle = dlopen ("libm.so", RTLD_LAZY|RTLD_GLOBAL);
  if (!handle) {
    fprintf (stderr, "%s\n", dlerror());
    exit(1);
  }

  handlemain = dlopen(NULL, RTLD_LAZY);
  cosine = dlsym(handlemain, "cos");
  if ((error = dlerror()) != NULL)  {
    fprintf (stderr, "%s\n", error);
    exit(1);
  }

  printf ("%f\n", (*cosine)(2.0));
  dlclose(handle);
  return 0;
}

On Linux:
bash-2.05b$ ./a.out 
-0.416147

while on NetBSD:
-bash-2.05b$ ./a.out 
Undefined symbol "cos"

I've just carefully reread the NetBSD dlsym manpage and it states that
this kind of usage is currently not supported. Ultimately that feature
should be added to ld.elf_so, but as it currently isn't, I guess we
have to find a workaround. Maybe we can simulate the feature by hand
by remembering all dlopen()ed libraries and searching them as well if
the symbol can't be found. If we are lucky, there is only one dlopen()
in the whole of OOo, namely the one in sal/osl/unx/module.c .
Comment 18 Daniel Boelzle [:dbo] 2003-11-18 10:49:43 UTC
@mrauch: well done!
We can work around this, because the UNO shared lib component loader
loads libraries using osl_loadModule(), so this can be worked around
as you suggested:
- use osl_loadModule() and osl_getSymbol() within the bridge code
- in sal osl_loadModule(): (#if defined BSD) tag the application
handle (oslModule) and add all opened handles to a static list
- in sal osl_getSymbol(): (#if defined BSD) extend the search on the
application handle to all dlopen()ed modules
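
A rough sketch of that idea in plain C, using dlopen()/dlsym() directly. The names tracked_dlopen, tracked_dlsym and MAX_TRACKED are hypothetical; this is not the sal code and it ignores oslModule, dlclose() and thread safety. It only illustrates remembering every opened handle and extending a failed lookup on the application handle to those handles:

```c
#include <dlfcn.h>
#include <stddef.h>

#define MAX_TRACKED 256   /* arbitrary limit for this sketch */

/* Handles of every library opened through tracked_dlopen(). */
static void *tracked[MAX_TRACKED];
static size_t tracked_count;

/* Wrapper around dlopen() that records each successfully opened handle. */
void *tracked_dlopen(const char *path, int mode)
{
    void *h = dlopen(path, mode);
    if (h != NULL && tracked_count < MAX_TRACKED)
        tracked[tracked_count++] = h;
    return h;
}

/* Look the symbol up on the application handle first; if that fails,
 * search the recorded handles in the order they were opened. */
void *tracked_dlsym(void *app_handle, const char *name)
{
    size_t i;
    void *sym = dlsym(app_handle, name);

    for (i = 0; sym == NULL && i < tracked_count; i++)
        sym = dlsym(tracked[i], name);
    return sym;
}
```

Note that the fallback returns the first match in load order, so when two libraries define the same symbol the earlier one wins.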

One concern I still have:
componentA.uno.so: V _ZTINMyException
componentB.uno.so: V _ZTINMyException

Both are loaded, first A then B:
When componentB.uno.so has a catch handler (e.g. catch (MyException
&)), which symbol does it resolve?  From A or hopefully from B?
The problem then occurs when the bridge throws an exception using the
symbol from componentA.uno.so, because it is loaded first, but
componentB.uno.so's catch handler expects its own.
Comment 19 pavel 2003-11-19 18:30:20 UTC
A friend of mine, Rudolf Cejka, gave me a hint to test:

FreeBSD:

pavel@leda:~> gcc -o dltest dltest.c
pavel@leda:~> gcc -o dltestnew dltestnew.c
pavel@leda:~> ./dltest
Undefined symbol "cos"
pavel@leda:~> ./dltestnew 
-0.416147
pavel@leda:~> diff -u dltest.c dltestnew.c
--- dltest.c    Wed Nov 19 18:57:08 2003
+++ dltestnew.c Wed Nov 19 18:57:37 2003
@@ -14,7 +14,7 @@
   }
 
   handlemain = dlopen(NULL, RTLD_LAZY);
-  cosine = dlsym(handlemain, "cos");
+  cosine = dlsym(RTLD_DEFAULT, "cos");
   if ((error = dlerror()) != NULL)  {
     fprintf (stderr, "%s\n", error);
     exit(1);
pavel@leda:~> 

RTLD_DEFAULT is described as:

     If dlsym() is called with the special handle RTLD_DEFAULT, the search for
     the symbol follows the algorithm used for resolving undefined symbols
     when objects are loaded.  The objects searched are as follows, in the
     given order:

     1.   The referencing object itself (or the object from which the call to
          dlsym() is made), if that object was linked using the -Wsymbolic
          option to ld(1).

     2.   All objects loaded at program start-up.

     3.   All objects loaded via dlopen() which are in needed-object DAGs that
          also contain the referencing object.

     4.   All objects loaded via dlopen() with the RTLD_GLOBAL flag set in the
          mode argument.

Can we use it on *BSD only?

@mrauch: could you please test it, I just managed to remove the whole
build tree and started from scratch to test something else.

RTLD_DEFAULT is defined on Linux only with __USE_GNU.

On Linux:

pavel@pavel:/tmp> gcc -o dltest dltest.c -ldl
pavel@pavel:/tmp> gcc -o dltestnew dltestnew.c -ldl
pavel@pavel:/tmp> ./dltest
-0.416147
pavel@pavel:/tmp> ./dltestnew 
-0.416147
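
A minimal sketch of a lookup that compiles on both kinds of system, assuming glibc on Linux. The helper name lookup_global is hypothetical. On glibc, RTLD_DEFAULT is only declared when _GNU_SOURCE is defined before the first system header is included (__USE_GNU is the internal macro that this sets); the fallback definition is a glibc-specific assumption and only matters if _GNU_SOURCE was defined too late:

```c
#define _GNU_SOURCE           /* must come before any system header */
#include <dlfcn.h>
#include <stddef.h>

/* Assumption: glibc defines RTLD_DEFAULT as a null handle. This guard
 * is only reached if _GNU_SOURCE did not take effect; the BSDs declare
 * RTLD_DEFAULT unconditionally with their own value. */
#if !defined(RTLD_DEFAULT) && defined(__GLIBC__)
#define RTLD_DEFAULT ((void *)0)
#endif

/* Hypothetical helper: resolve `name` using the default search order
 * instead of a handle obtained from dlopen(NULL, ...). */
void *lookup_global(const char *name)
{
    (void)dlerror();          /* clear stale error state */
    return dlsym(RTLD_DEFAULT, name);
}
```

As in dltestnew.c, a library opened at runtime is only visible to this lookup if it was opened with RTLD_GLOBAL or is otherwise part of the default search order.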
Comment 20 mrauch-openoffice 2003-11-23 14:53:33 UTC
RTLD_DEFAULT unfortunately solves the problem only partly:
The bridge is then able to find the symbol, but OOo nevertheless
crashes with the same error box.
This seems to be because of the concern Daniel already mentioned:
The dlsym() happens to find the symbol in libA.so first, but the catch
handler is in libB.so, so the exception still propagates further up
the stack.
I've then tried to only search the relevant library
(i.e. put "libsfx645bi.so" instead of the 0 in dlopen in except.cxx,
this works because this is the only place where the bridge is needed
during startup), and then OOo starts without further problems.
Does anyone know how the linker manages to find the right symbol in
Linux? By merging the different vtables into one?
I'll have a closer look at ld.elf_so to see whether I can convince it
to treat dlopen()ed libraries more like ones loaded as dependencies.
Comment 21 mrauch-openoffice 2003-11-29 21:47:10 UTC
After more debugging:
On FreeBSD just using RTLD_DEFAULT should suffice.
(@pjanik: Are you able to test this?)
On NetBSD there is some slightly different behaviour in the dynamic
runtime linker ld.elf_so. As this also makes exception handling in
regcomp fail in some cases where the bridge isn't involved (it's pure
C++ code), I suspect it's a bug, but I have to see what the NetBSD
toolchain experts say. Anyway, changing the behaviour to the FreeBSD
one makes the problems disappear for me.
I'll now run a full build from scratch over night to check.
Comment 22 mrauch-openoffice 2003-11-30 09:16:24 UTC
Created attachment 11641 [details]
use RTLD_DEFAULT
Comment 23 mrauch-openoffice 2003-11-30 09:22:00 UTC
A full build confirmed that it works for NetBSD.
I've attached the appropriate patch for FreeBSD to this issue.
Comment 24 foskey 2003-11-30 09:54:52 UTC
This is FreeBSD/Intel specific.  It will not affect other platforms.

From that point of view approved.
Comment 25 pavel 2003-11-30 11:12:41 UTC
I'm going to compile with this patch and will report results.
Comment 26 pavel 2003-11-30 19:36:46 UTC
After applying the attached patch (freebsd_bridges.patch):

pavel@leda:~/OpenOffice.org1.1.1> ./soffice 
crash_report: not found


Fatal exception: Signal 6
Stack:
Abort trap (core dumped)

This is the first start of OOo after ./install --single

Thus this patch can not be used :-(

BTW  really interesting:

pavel@leda:~/OpenOffice.org1.1.1> gdb program/soffice.bin soffice.bin.core

...

Segmentation fault (core dumped)

Everything on FreeBSD 4.9-RELEASE
Comment 27 mrauch-openoffice 2003-12-29 17:27:25 UTC
The bug is now fixed in NetBSD's dynamic linker (actually already since Dec 7).
For FreeBSD I currently have no clue what else could be going wrong, sorry.
Comment 29 maho.nakata 2004-01-10 14:52:57 UTC
Created attachment 12404 [details]
same patch for FreeBSD
Comment 30 maho.nakata 2004-01-10 15:17:33 UTC
sorry, ignore my patch.
Comment 31 maho.nakata 2004-01-20 17:33:29 UTC
How to activate LD_DEBUG:
add the following line to /etc/make.conf
CFLAGS+=   -DDEBUG
and recompile rtld.
# cd /usr/src/libexec/rtld-elf/ ; make clean ; make depend ; make ; make install

(Martin told me)
Comment 32 maho.nakata 2004-02-08 12:43:02 UTC
Hi,
I searched Google for this issue and found a similar one:
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=5890

both NetBSD and FreeBSD should have this issue, since:
/usr/src/libexec/rtld-elf/rtld.c in FreeBSD
(http://www.freebsd.org/cgi/cvsweb.cgi/src/libexec/rtld-elf/rtld.c)
            /*
             * XXX - This isn't correct.  The search should include the whole
             * DAG rooted at the given object.
             */
          def = symlook_obj(name, hash, obj, false);
          defobj = obj;
        }
    }
And NetBSD also has this kind of implementation:
http://cvsweb.netbsd.org/bsdweb.cgi/src/libexec/ld.elf_so/rtld.c

According to PR #5890 it seems to be difficult to implement correctly...
--
     /*
      * FIXME - This isn't correct.  The search should include the whole
      * DAG rooted at the given object.
      */

 
 which indicates that the correct implementation for ELF is to
 search both the shared-object specified by the handle passed to
 dlsym() and any shared objects loaded as a result of loading the
 object specified by the handle.  (the latter is what the Solaris
 manpage says and what the Solaris implementation does.)

 
 Applying the patch in 5890 could cause programs that
 rely on the Solaris-style behaviour -- which currently work,
 due to the too-liberal search -- to fail.

 
 Michael Hitch notes that
     "The current way ld.elf_so tracks shared objects would
      make this type of search somewhat difficult".
--
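
To illustrate what "search the whole DAG rooted at the given object" means, here is a toy model in C. Everything in it (struct toy_obj, toy_symlook_dag, the sample objects) is hypothetical and unrelated to the real rtld data structures; it only shows a breadth-first search that starts at the object behind the handle and continues through its needed objects, instead of stopping at the first object:

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX   4    /* max symbols / dependencies per object */
#define TOY_QUEUE 16   /* max objects visited in one search */

struct toy_obj {
    const char *name;
    const char *symbols[TOY_MAX];     /* NULL-terminated unless full */
    struct toy_obj *needed[TOY_MAX];  /* NULL-terminated unless full */
};

/* Sample graph: libfoo needs libm; the application needs neither. */
struct toy_obj toy_libm   = { "libm",   { "cos", "sin" }, { NULL } };
struct toy_obj toy_libfoo = { "libfoo", { "foo" },        { &toy_libm } };
struct toy_obj toy_app    = { "app",    { "main" },       { NULL } };

/* Does this single object define the symbol? */
static int toy_defines(const struct toy_obj *o, const char *sym)
{
    size_t i;
    for (i = 0; i < TOY_MAX && o->symbols[i] != NULL; i++)
        if (strcmp(o->symbols[i], sym) == 0)
            return 1;
    return 0;
}

/* Breadth-first search over the DAG rooted at `root`; returns the
 * first object that defines `sym`, or NULL if none does. */
struct toy_obj *toy_symlook_dag(struct toy_obj *root, const char *sym)
{
    struct toy_obj *queue[TOY_QUEUE];
    size_t head = 0, tail = 0, i, j;

    queue[tail++] = root;
    while (head < tail) {
        struct toy_obj *o = queue[head++];
        if (toy_defines(o, sym))
            return o;
        for (i = 0; i < TOY_MAX && o->needed[i] != NULL; i++) {
            int seen = 0;
            for (j = 0; j < tail; j++)      /* skip objects already queued */
                if (queue[j] == o->needed[i])
                    seen = 1;
            if (!seen && tail < TOY_QUEUE)
                queue[tail++] = o->needed[i];
        }
    }
    return NULL;
}
```

In this model a search rooted at toy_libfoo finds "cos" in toy_libm, while a search rooted at toy_app does not, because libm is not in the application's DAG.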
Comment 33 pavel 2004-06-04 20:39:48 UTC
marking as solved, proper fix is unknown now.
Comment 34 maho.nakata 2009-06-16 13:03:56 UTC
*** Issue 98781 has been marked as a duplicate of this issue. ***
Comment 35 maho.nakata 2009-06-16 13:06:52 UTC
*** Issue 82690 has been marked as a duplicate of this issue. ***
Comment 36 maho.nakata 2009-06-16 13:09:10 UTC
reassign to maho as this is a FreeBSD issue.
Comment 37 maho.nakata 2009-06-16 13:10:34 UTC
.
Comment 38 maho.nakata 2009-06-16 13:14:22 UTC
see discussion
http://docs.FreeBSD.org/cgi/mid.cgi?20090614.081457.193757375.chat95

Konstantin replies: http://docs.FreeBSD.org/cgi/mid.cgi?20090614094141.GF23592

maho checked his patch and it works: http://docs.FreeBSD.org/cgi/mid.cgi?20090615.054654.71139727.chat95
Konstantin and Alexander again posted a patch to rtld-elf.
maho checked their patch and verified that it works.

so this is a FreeBSD userland issue.