Bug 38834 - 16-bit unicode string literals
Summary: 16-bit unicode string literals
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
(earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
Keywords: easyHack
Depends on:
Reported: 2011-06-30 09:14 UTC by Björn Michaelsen
Modified: 2015-12-18 10:02 UTC
CC List: 2 users

See Also:
Crash report or crash signature:
Regression By:


Description Björn Michaelsen 2011-06-30 09:14:38 UTC
16-bit unicode string literals

Background: Our ASCII string handling is slow and inefficient. C++0x introduces UTF-16 string literals (u"..."; see Wikipedia); for the hackery needed on other platforms, see icu/source/common/unicode/unistr.h and icu/source/common/unicode/platform.h. Mozilla even uses -fshort-wchar with L"string literals", though the standard approach is better. To implement this, we should add a SAL_STRING_STATIC_FLAG to create rtl_uStrings with, and instrument rtl_uString_assign to deep-copy these when necessary.
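For reference, a minimal sketch of what C++0x (C++11) UTF-16 literals give us: the literal is a const char16_t array laid out at compile time, so no ASCII-to-UTF-16 conversion has to happen at runtime. literalLength is a hypothetical helper for illustration, not part of sal.

```cpp
#include <cstddef>

// Hypothetical helper: number of UTF-16 code units in a u"" literal,
// excluding the terminating NUL.
template <std::size_t N>
constexpr std::size_t literalLength(const char16_t (&)[N]) {
    return N - 1;
}

// A UTF-16 literal is a const char16_t[N] fixed at compile time.
constexpr char16_t kHello[] = u"Hello";
static_assert(sizeof(kHello) == 6 * sizeof(char16_t), "5 code units + NUL");

// Concatenating u"" with a plain literal yields a UTF-16 literal, so an
// ASCII literal passed as "abc" can become char16_t data without conversion.
constexpr char16_t kAbc[] = u"" "abc";
static_assert(literalLength(kAbc) == 3, "3 code units");
```

The u"" concatenation trick only maps ASCII literals cleanly; for non-ASCII source text the byte count of the narrow literal and the UTF-16 code unit count differ.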

Skills: build, C++
Comment 1 Allan Jacobs 2011-11-20 15:11:52 UTC
Some of the work is done.
SAL_STRING_STATIC_FLAG is defined in core/sal/rtl/source/strimp.hxx
    #define SAL_STRING_STATIC_FLAG 0x40000000
SAL_STRING_STATIC_FLAG is used in core/sal/rtl/source/strimp.hxx
    #define SAL_STRING_IS_STATIC(a) ((a)->refCount & SAL_STRING_STATIC_FLAG)
SAL_STRING_STATIC_FLAG is used in core/sal/rtl/source/ustring.cxx in the initializer for static rtl_uString.
SAL_STRING_STATIC_FLAG is used in core/sal/rtl/source/string.cxx in the initializer for static rtl_String.

SAL_STRING_IS_STATIC is used directly and indirectly in many of the methods defined in core/sal/rtl/source/strtmpl.cxx.

SAL_STRING_IS_STATIC is also used (trivially) in core/sal/rtl/source/hash.cxx.

I think rtl_uString_assign is defined in strtmpl.cxx in code for
                                             IMPL_RTL_STRINGDATA* pStr )
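A minimal sketch of how the static flag and an instrumented assign could fit together. UStr, isStatic, acquire, release, heapCopy and assign are hypothetical names, not the real rtl API; the real rtl_uString stores its characters inline after the header, whereas this sketch uses a pointer for brevity. The description's "deep copy when necessary" is assumed here to mean "whenever a static string is assigned".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Flag value as defined in sal/rtl/source/strimp.hxx (see above).
constexpr std::int32_t kStaticFlag = 0x40000000;

// Hypothetical, simplified stand-in for rtl_uString (pointer instead of
// inline character storage).
struct UStr {
    std::int32_t refCount;
    std::int32_t length;
    const char16_t* data;  // NUL-terminated UTF-16
};

// Analogue of SAL_STRING_IS_STATIC.
inline bool isStatic(const UStr* s) {
    return (s->refCount & kStaticFlag) != 0;
}

inline void acquire(UStr* s) {
    if (!isStatic(s)) ++s->refCount;  // static strings are never counted
}

inline void release(UStr* s) {
    if (isStatic(s)) return;          // ...and never freed
    assert(s->refCount > 0);
    if (--s->refCount == 0) {
        delete[] s->data;
        delete s;
    }
}

// Copy a string's characters (plus NUL) into a fresh heap string, refCount 1.
inline UStr* heapCopy(const UStr* src) {
    char16_t* buf = new char16_t[src->length + 1];
    std::memcpy(buf, src->data, (src->length + 1) * sizeof(char16_t));
    return new UStr{1, src->length, buf};
}

// Hypothetical instrumented assign: a static source is deep-copied, a heap
// source is shared by bumping its refcount. The new value is obtained before
// the old one is released, so self-assignment is safe.
inline void assign(UStr** target, UStr* src) {
    UStr* next = isStatic(src) ? heapCopy(src) : (acquire(src), src);
    if (*target) release(*target);
    *target = next;
}
```

Whether copying on every assign is the right policy is exactly the open question; sharing the static string (acquire and release being no-ops for it) would avoid the allocation entirely.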
Comment 2 Florian Reisinger 2012-05-18 08:58:21 UTC
Deleted "Easyhack" from summary
Comment 3 Luboš Luňák 2012-12-08 13:18:00 UTC
This is most probably not doable without a macro. Since rtl_uString stores the string data as part of itself, each literal would need its own statically allocated rtl_uString with the string data inside it, and not even inline functions, templates and the like seem able to do that.

With a macro it's doable with something along the lines of

#define OUStringLiteral( str ) \
    ( \
    ([]() -> OUString { static const rtl_uString_sized< sizeof( str ) > data = \
        { SAL_STRING_STATIC_FLAG|1, sizeof( str ) - 1, u"" str }; return OUString( &data ); })() \
    )

but that pretty much means putting RTL_CONSTASCII_USTRINGPARAM back everywhere :(. That's kinda lame, after all the work to remove it, and it would be good to first check if the uglification is actually worth the gain.
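The layout idea from the macro above can be tried outside rtl. This is a sketch under assumptions: SizedUStr, STATIC_USTR and toU16 are hypothetical names, the literal is assumed to be ASCII-only (for UTF-8 input, sizeof( str ) - 1 counts bytes, not UTF-16 code units), and the macro yields a plain pointer rather than an OUString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::int32_t kStaticFlag = 0x40000000;

// Hypothetical analogue of rtl_uString_sized<N>: header fields followed by
// inline storage for N UTF-16 code units (including the NUL).
template <std::size_t N>
struct SizedUStr {
    std::int32_t refCount;
    std::int32_t length;
    char16_t buffer[N];
};

// Each expansion creates its own lambda, so each literal gets its own
// statically allocated SizedUStr; u"" str turns an ASCII literal into UTF-16.
#define STATIC_USTR(str)                                   \
    ([]() {                                                \
        static const SizedUStr<sizeof(str)> data =         \
            { kStaticFlag | 1, sizeof(str) - 1, u"" str }; \
        return &data;                                      \
    })()

// Hypothetical helper: copy the characters out, e.g. for comparison.
template <std::size_t N>
std::u16string toU16(const SizedUStr<N>* s) {
    assert((s->refCount & kStaticFlag) != 0);  // this sketch only handles static strings
    return std::u16string(s->buffer, s->length);
}
```

Evaluating the same expansion repeatedly returns the same pointer and never allocates, which is the gain being weighed against wrapping every literal in a macro.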
Comment 4 Caolán McNamara 2012-12-10 10:44:27 UTC
Yeah, I'd forgotten about this easy hack. The intent was indeed to try adapting RTL_CONSTASCII_USTRINGPARAM. Let's drop this easy hack after all.
Comment 5 Robinson Tryon (qubit) 2015-12-18 10:02:29 UTC
Migrating Whiteboard tags to Keywords: (EasyHack)