Apache OpenOffice (AOO) Bugzilla – Issue 25416
Bug in Encoding of Hebrew letters in filename
Last modified: 2005-08-24 07:05:33 UTC
In TextEdit, if we save a file with Hebrew letters in the name, the Hebrew letters are encoded as two-byte characters. The first byte is hex D7, and the second differs for each character.
In OOo, things are different. OOo has its own file URLs, and in such a file URL the encoding of the Hebrew letters is the same as in TextEdit, namely two-byte characters whose first byte is D7. So far so good. However, before the file actually gets written, OOo converts this URL to a pathname for the file system and, in the process, changes the encoding. The Hebrew letters now become one-byte characters, and that byte is not identical to the second byte of the other encoding. As a result, I'm unable to save a file with a Hebrew name. I get an error message that the path is not found.
Example: The file URL
    file:///Users/oleg/Documents/%D7%90%D7%9C%D7%9F.sxw
gets converted to
    /Users/oleg/Documents/\0xd0\0xdc\0xdf.sxw
where \0xd0\0xdc\0xdf are three characters with the hex values d0, dc, df.
For now, we put in a kludge. Namely, in osl_openFile (src/sal/osl/unx/file.c), before calling open(buffer, flags, mode), we check buffer for Hebrew letters and convert them back to the two-byte encoding. This allows the file to be saved. However, we still get the error message "<filename> not found", probably because this conversion has to take place in other parts of the code as well. How do we really fix this bug?
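To make the byte values in the report concrete, here is a minimal sketch in C that percent-decodes the file name part of the example URL. The helper names `hex_value` and `percent_decode` are hypothetical and are not part of sal. The decoded bytes D7 90, D7 9C, D7 9F are the UTF-8 encodings of U+05D0, U+05DC and U+05DF, and the single bytes d0, dc, df seen in the converted path equal the low bytes of those code points, which is consistent with a conversion that does not use UTF-8 (an observation, not a confirmed diagnosis).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode %XX escapes in src into raw bytes in out.
 * Returns the number of bytes written (not counting the terminating NUL),
 * or (size_t)-1 on a malformed escape or if out is too small. */
static size_t percent_decode(const char *src, unsigned char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return (size_t)-1;

    while (*src != '\0') {
        int byte;

        if (*src == '%') {
            int hi = hex_value((unsigned char)src[1]);
            /* Only look at src[2] if src[1] was a valid hex digit. */
            int lo = (hi < 0) ? -1 : hex_value((unsigned char)src[2]);
            if (hi < 0 || lo < 0)
                return (size_t)-1;
            byte = hi * 16 + lo;
            src += 3;
        } else {
            byte = (unsigned char)*src++;
        }

        /* Always leave room for the terminating NUL. */
        if (n + 1 >= outlen)
            return (size_t)-1;
        out[n++] = (unsigned char)byte;
    }
    out[n] = '\0';
    return n;
}
```

Calling `percent_decode("%D7%90%D7%9C%D7%9F.sxw", buf, sizeof buf)` yields ten bytes: the six UTF-8 bytes of the three Hebrew letters followed by `.sxw`.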
mh->ayaniger: Is this a Mac OS X-only problem? Reassigned.
See also discussion (Alan Yaniger) on openoffice.porting.dev from 02/17/2004 for additional information
Attached is a file with patches for *.c and *.cxx files in sal/osl/unx. There is also a patch for sal/inc/osl/thread.h
Created attachment 13553: Patches for sal/inc/osl/thread.h, sal/osl/unx/*.c, and sal/osl/unx/*.cxx
Interestingly, I got a separate email today from Boris Reznik in Israel claiming that while my "Start OpenOffice.org" launcher did not support opening files with Hebrew letters in the name, the "CoooL" launcher did work. However, Boris didn't state which version of OOo he was using; I suspect OOo 1.0.3GM. If OOo requires source changes to support Hebrew filenames, then I can't see how "CoooL" would be working. That said, this bug report does seem to be discussing SAVE rather than OPEN.
Because of limited resources for OOo 1.1.2, we decided to shift this task to OOo 2.0.
Please have a look at #i28928# Kind Regards, Tino
Hi Tino, I looked at the issue 28928, and it wasn't obvious to me how it was relevant to this issue. Could you explain more fully? Thanks, Alan
Hi Alan, sal converts the file names it gets from the system to UTF8. Because no encoding is attached to such a system file name, sal uses the current thread text encoding for the conversion. But some systems always use one specific encoding for file names (UTF8, for instance, as in the current case), and in that case the sal conversion fails, as we saw. On the other hand, we cannot patch osl_getThreadTextEncoding to always return the encoding used at the file system interface, because this function is also used in cases that have nothing to do with these issues and where we do want the current thread text encoding. Hence the proposal to introduce a pair of new functions to set and get the encoding used for file name to file URL conversion. In the concrete case the getter would return UTF8, for instance. HTH, Tino
Tino, you probably wanted to mention issue 28982 instead of 28928 :-)
*** Issue 16281 has been marked as a duplicate of this issue. ***
Well, if a Linux bug was marked as a dup of a Mac bug, then the PLATFORM and OS need to be changed to ALL
*** Issue 29224 has been marked as a duplicate of this issue. ***
I agree with Tino's point that changing how osl_getThreadTextEncoding() works will cause other things to break. The better solution (and one that I have been using in released versions of NeoOffice/J) is to #define osl_getThreadTextEncoding() RTL_TEXTENCODING_UTF8 when MACOSX is defined in the following sal/osl/unx files: file.c, module.c, pipe.c, process.c, process_impl.cxx, profile.c, security.c, tempfile.c, uunxapi.cxx
Hi *, a more concrete proposal: we would like to introduce two new functions in sal, osl_setFileSystemEncoding and osl_getFileSystemEncoding. These functions deliver the encoding that should be used for encoding/decoding system paths to or from file URLs. On platforms that use a fixed encoding, these functions could deliver the required encoding directly, while on other systems they could just call osl_getThreadTextEncoding to get an encoding. The desktop project has some code that detects specific desktop environments such as Gnome; this would be a good place to set the file system encoding if necessary. I hope to fix this bug before OOo 2.0 beta. I will propose this sal extension on openoffice.interface-discuss too.
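The proposed setter/getter pair with a fallback to the thread text encoding could look roughly like the sketch below. Everything here is a stand-in chosen for illustration: the `sketch_encoding` enum, the `sketch_*` function names, and the hard-coded return value of the thread-encoding stub are assumptions, not the real sal types or the signatures that were eventually integrated. The sketch is also not thread-safe.

```c
/* Stand-in for sal's text encoding identifiers; values are illustrative only. */
typedef enum {
    ENC_UNSET = 0,   /* no file system encoding has been set explicitly */
    ENC_ASCII,
    ENC_UTF8
} sketch_encoding;

/* Stand-in for osl_getThreadTextEncoding(). A real implementation would
 * derive this from the process locale; the fixed value is an assumption. */
static sketch_encoding sketch_get_thread_text_encoding(void)
{
    return ENC_ASCII;
}

/* Explicitly configured file system encoding, if any. */
static sketch_encoding g_fs_encoding = ENC_UNSET;

/* Set the encoding used for system path <-> file URL conversion.
 * Returns the previously configured value (ENC_UNSET if there was none). */
static sketch_encoding sketch_set_file_system_encoding(sketch_encoding enc)
{
    sketch_encoding old = g_fs_encoding;
    g_fs_encoding = enc;
    return old;
}

/* Return the explicitly configured encoding, or fall back to the
 * thread text encoding when nothing has been set. */
static sketch_encoding sketch_get_file_system_encoding(void)
{
    if (g_fs_encoding != ENC_UNSET)
        return g_fs_encoding;
    return sketch_get_thread_text_encoding();
}
```

With this shape, platform start-up code for a system with a fixed file name encoding would call the setter once with the UTF-8 value, and the path conversion code would only ever call the getter, leaving other callers of the thread text encoding untouched.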
Hi Alan, I played a little bit with a Mac (though my Mac knowledge is very limited) in order to investigate the problem behind this bug. To me it seems that the problem has something to do with a "misconfigured" system. It would be nice if some Mac gurus could verify my findings and maybe suggest fixes that are more appropriate than the suggested one of overriding osl_getThreadTextEncoding in the osl file system interface.
It is known that osl uses osl_getThreadTextEncoding to get the encoding used for converting system paths to file URLs and vice versa. osl_getThreadTextEncoding is initialized by a function osl_getProcessLocale, which calls a function _imp_getProcessLocale (see osl/unx/nlsupport.c). This function basically looks as follows:
    void _imp_getProcessLocale(...)
    {
        /* set the locale defined by the env vars */
        char* locale = setlocale( LC_CTYPE, "" );
        /* fallback to the current locale */
        if( NULL == locale )
            locale = setlocale( LC_CTYPE, NULL );
        /* return the LC_CTYPE locale */
        *ppLocale = _parse_locale( locale );
    }
If the function fails to provide a valid locale, the "C" locale will be used by sal/osl (see _parse_locale in the same file). It seems that under MacOS X the "C" locale is always active no matter which language is configured, which would be a reasonable explanation for the problems on Hebrew systems. Does MacOS X have a means to query the currently configured locale, and wouldn't it be more useful to implement osl_getProcessLocale in a Mac-specific way? I'm happy to accept and integrate patches into sal. If there is no better patch than the currently suggested one, we can also take this one. Kind Regards, Tino
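The locale query described in this comment can be tried outside of sal with a small C sketch. `query_ctype_locale` repeats the two-step `setlocale` pattern quoted above; `locale_codeset` is a hypothetical helper (it is not sal's `_parse_locale`) that extracts the codeset part of a POSIX-style locale name, assuming the usual `language_TERRITORY.codeset@modifier` layout. A name such as "C" carries no codeset at all, so a caller has nothing to base a file name encoding on.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Same two-step query as in the comment: first the locale defined by the
 * environment variables, then fall back to the currently active locale. */
static const char *query_ctype_locale(void)
{
    const char *locale = setlocale(LC_CTYPE, "");
    if (locale == NULL)
        locale = setlocale(LC_CTYPE, NULL);
    return locale;
}

/* Hypothetical helper: copy the codeset part of a locale name such as
 * "en_US.UTF-8" or "de_DE.ISO-8859-15@euro" into out.
 * Returns 1 if a codeset was found and fits, 0 otherwise. */
static int locale_codeset(const char *locale, char *out, size_t outlen)
{
    const char *codeset;
    size_t len;

    if (locale == NULL || outlen == 0)
        return 0;

    codeset = strchr(locale, '.');
    if (codeset == NULL)
        return 0;               /* e.g. "C" or "en_US": no codeset given */
    codeset++;                  /* skip the '.' */

    len = strcspn(codeset, "@"); /* stop before an optional "@modifier" */
    if (len == 0 || len >= outlen)
        return 0;

    memcpy(out, codeset, len);
    out[len] = '\0';
    return 1;
}
```

Passing the result of `query_ctype_locale()` to `locale_codeset()` on a given machine shows whether the environment supplies any codeset; what the comment reports for MacOS X corresponds to the case where it returns 0.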
Because of limited resources, deferred to OOo Later.
Meanwhile I've got a Mac of my own and can pick up the problem. As already described, the problem is that a Mac-specific way of detecting the system locale is necessary. The Mac has its own API for this. A '.UTF-8' suffix must be appended to each returned locale, e.g. 'en_US.UTF-8', because this part is used to determine which encoding is used for encoding/decoding file names.
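The suffix step can be illustrated in isolation with a short C sketch. `with_utf8_codeset` is a hypothetical helper, not the code from the attached patch; the Mac-specific locale lookup itself is left out. As a design choice of this sketch only, an existing codeset or modifier on the input is dropped before '.UTF-8' is appended, so the result never carries two codesets.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the locale name with a ".UTF-8" codeset into out.
 * Any existing ".codeset" or "@modifier" part of the input is dropped first.
 * Returns 0 on success, -1 on empty input or if the result does not fit. */
static int with_utf8_codeset(const char *locale, char *out, size_t outlen)
{
    size_t base_len;
    int written;

    if (locale == NULL || locale[0] == '\0')
        return -1;

    /* Length of the "language_TERRITORY" part, up to the first '.' or '@'. */
    base_len = strcspn(locale, ".@");

    written = snprintf(out, outlen, "%.*s.UTF-8", (int)base_len, locale);
    if (written < 0 || (size_t)written >= outlen)
        return -1;              /* output error or truncated result */
    return 0;
}
```

For an input of `en_US` this produces `en_US.UTF-8`, the form given as an example in the comment.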
*** Issue 46963 has been marked as a duplicate of this issue. ***
Created attachment 27507: Patch just tested under Panther!
Created attachment 27508: Hack in file.cxx no longer needed with osxlocale patch
Platform -> 'Macintosh' OS -> 'Mac OS X'
Fixed on cws macosx10
Verified with m112 / Mac OSX Tiger
Patches for 27 June will only work with OOo 1.9_m series, not with SRX645. Appropriate patches for SRX645 are in macxjoin1153.
*** Issue 50503 has been marked as a duplicate of this issue. ***
Thanks, I can input Japanese with this patch. Verified with 1.9m119/kinput2.macim.
Compile fails for me, saying:
    =============
    Building project udkapi
    =============
    /sw/src/fink.build/openoffice.org-ja-1.9m121-50/udkapi/com/sun/star
    mkout -- version: 1.4
    idlc @/tmp/mkDmCDfr
    Could not get Canonical Locale Identifier from AppleLanguages value!
    Bus error
    dmake: Error code 138, while making '../../../unxmacxp.pro/misc/urd_css.don'
    '---* tg_merge.mk *---'
    ERROR: Error 65280 occurred while making /sw/src/fink.build/openoffice.org-ja-1.9m121-50/udkapi/com/sun/star
    dmake: Error code 1, while making 'build_all'
In my environment (Tiger),
    $ defaults read 'Apple Global Domain' AppleLanguages
returns:
    The domain/default pair of (kCFPreferencesAnyApplication, AppleLanguages) does not exist
Any help?
TRA: Verified on master -> ok. Closing issue.