Apache OpenOffice (AOO) Bugzilla – Issue 73908
in-line rtl[u]String definitions ...
Last modified: 2017-05-20 11:31:16 UTC
So, it would be altogether more pleasant if instead of having to construct strings from Ascii (with RTL_CONSTASCII_USTRINGPARAM) we could create an in-line, (and static) rtl_[u]String object, that is in .rodata - hence shared across all processes, and tagged as 'static' thus, not requiring ref-counting (cf. #73907#) This can be done (for gcc at least) nice & conditionally & source compatibly [ Kay's neat idea ]: /* force expansion */ #define RTL_CONSTASCII_USTRINGPARAM_STR( constAsciiStr ) L##constAsciiStr /* emit in-line static .rodata rtl_uString */ #define RTL_CONSTASCII_USTRINGPARAM( constAsciiStr ) \ ((rtl_uString *)(&((const struct { sal_Int32 r; sal_Int32 l; const wchar_t b[sizeof(constAsciiStr)]; } ) { (1<<30)+1, sizeof(constAsciiStr)-1, RTL_CONSTASCII_USTRINGPARAM_STR(constAsciiStr) }))), SAL_NO_ACQUIRE
committed to salstrintern, ommitting the 'string.hxx' variant, since it causes grief with (eg.) OStringBuffer.append(), and/or rtl_str_compareWithLength (etc. etc.) where people used the relevant CONSTASCII macro in the past. Though changes may be somewhat minimal to clean all callers, at least some work would be needed.
Of course, this helps find some wonderful pieces of code around the place: eg. the mercifully unused ;-) --- rdbmaker/inc/codemaker/global.hxx +++ rdbmaker/inc/codemaker/global.hxx @@ -94,10 +94,10 @@ sal_Bool fileExists(const ::rtl::OString sal_Bool checkFileContent(const ::rtl::OString& targetFileName, const ::rtl::OString& tmpFileName); const ::rtl::OString inGlobalSet(const ::rtl::OUString & r); -inline const ::rtl::OString inGlobalSet(sal_Char* p) -{ - return inGlobalSet( ::rtl::OUString( RTL_CONSTASCII_USTRINGPARAM(p) ) ); -} ;-)
But wchar_t is often 32 bit, not 16 bit as we would need it. What am I missing?
sb: you're missing the solenv/inc/ patch: CFLAGS+=-fshort-wchar -DRTL_INLINE_STRINGS CFLAGSCXX+=-fshort-wchar -DRTL_INLINE_STRINGS :-) and the fact that this is a source-transparent optimisation we turn on conditionally only where it works.
Strictly speaking, this breaks a stable API, as RTL_CONSTASCII_USTRINGPARAM is documented to "Supply an ASCII string literal together with its length and text encoding." ;)
Alternative would be, to introduce a new macro with the desired spec. and to adapt all (possible) locations of RTL_CONSTASCII_USTRINGPARAM .... IMHO, the risk is relatively low, so we might want to take it ... ;-)
1 -fshort-wchar: Somewhat thin ice. At least libstd++ and libstlport shipped with OOo export wchar_t-infected C++ functionality. You can only hope that nobody uses it. And libstlport is a public part of URE, so it should not change incompatibly, so we should not build it with -fshort-wchar. 2 "IMHO, the risk is relatively low" I think so too (hence the smiley). However, the documentation of RTL_CONSTASCII_USTRINGPARAM should be changed to state that it expands to an argument list suitable for a call to exactly one of the rtl::OUString constructors.
2. wrt. the risk, I'm building up a patch of the changes necessary around the place - and it seems, they are becoming more substantial, which is a problem. Particularly in places where an attempt has been made to optimise string usage already - eg. 'xmloff' constructs are used that make this harder. What might be better would be to use a macro that had control over the OUString construction too [ and resign ourselves to more 'global' OUStrings (?) ], ie. RTL_USTRING_LITERAL("foo") -> either OUString("foo", sizeof("foo")-1, ASCII), or OUString(mangled_macro("foo"), SAL_NO_ACQUIRE) That way perhaps we would have more joy, and it'd be more compatible; it'd also have the merit of being rather shorter than the existing OUString( RTL_VERY_LONG_NAME_INDEED ("foo")) - what do you think ?
Hokay - I'm backing this out of the CWS - for several reasons: * it doesn't work as is: + We really need a macro (eg.) RTL_USTRING_LITERAL(a) that will include the OUString() creation, then use that in hot-spots. * it doesn't save that much memory: + much of the string duplication I've measured is removed by using 'intern' anyway, worse using it will increase the size of the binaries by more than the heap we will save, since at a given time most strings are not instantiated on the heap. * shlib unloading (normal lifecycle problem) + if we pass references to .rodata static strings around the app, then we can never unload that library again, ergo we should do this sparingly where we can prove there is a useful win I think. And of course, I don't want the problems to block the 'intern' feature getting integrated :-)
Created attachment 42822 [details] patch for posterity
As an aside new C++ standard will have new string literals of... u8"UTF-8 string." u"UTF-16 string." U"UTF-32 string."
Reset assigne to the default "issues@openoffice.apache.org".