Bug 158067 - Replace O(U)StringLiterals with custom O(U)String literals in code
Summary: Replace O(U)StringLiterals with custom O(U)String literals in code
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:24.2.0 target:24.8.0
Keywords: difficultyBeginner, easyHack, skillCpp, topicCleanup
Depends on:
Blocks: Dev-related
Reported: 2023-11-05 07:48 UTC by Mike Kaganski
Modified: 2024-04-13 18:31 UTC (History)
Description Mike Kaganski 2023-11-05 07:48:04 UTC
Throughout the codebase, there are hundreds of initializations of O(U)Strings with string literals, like

    OUString foo = "abc";
    OString bar("def");
    std::vector<OUString> baz = {"xyz1", "xyz2", "xyz3"};

Every time such an initialization is executed, a string constructor is called at runtime, which allocates memory and copies the string. This is because O(U)String is not a plain character array, but a class holding a pointer to an rtl_(u)String structure, which stores the reference count, the length, and the actual character array.
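The cost can be illustrated with a simplified model. This is only a sketch: ModelStringData, ModelString, g_allocations and allocationsForLoop are hypothetical names made up for the illustration, not the real rtl_String / OString code, and the real implementation differs in many details.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for rtl_String: a header followed by a buffer that is
// declared with one element but allocated as large as the text requires.
struct ModelStringData
{
    std::int32_t refCount;
    std::int32_t length;
    char buffer[1];
};

// Counts heap allocations so that the runtime cost becomes visible.
inline int g_allocations = 0;

// Hypothetical stand-in for OString: just a pointer to the shared data.
class ModelString
{
public:
    ModelString(const char* literal)
    {
        const std::size_t n = std::strlen(literal);
        // The one-element buffer in the header already covers the final NUL.
        void* raw = std::malloc(sizeof(ModelStringData) + n);
        if (!raw)
            std::abort();
        ++g_allocations;
        m_data = static_cast<ModelStringData*>(raw);
        m_data->refCount = 1;
        m_data->length = static_cast<std::int32_t>(n);
        // Copy the characters, including the terminator, into the tail.
        std::memcpy(static_cast<char*>(raw) + offsetof(ModelStringData, buffer),
                    literal, n + 1);
    }
    ModelString(const ModelString& other) : m_data(other.m_data) { ++m_data->refCount; }
    ModelString& operator=(const ModelString&) = delete;
    ~ModelString()
    {
        if (--m_data->refCount == 0)
            std::free(m_data);
    }
    std::int32_t getLength() const { return m_data->length; }

private:
    ModelStringData* m_data;
};

// Returns how many allocations the loop caused.
int allocationsForLoop(int iterations)
{
    const int before = g_allocations;
    for (int i = 0; i < iterations; ++i)
    {
        ModelString foo = "abc"; // allocates and copies on every pass
        (void)foo.getLength();
    }
    return g_allocations - before;
}
```

In this model, allocationsForLoop(n) reports one allocation per iteration, which is the overhead described above.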

To avoid the overhead of such construction, several techniques were introduced over time. With C++17 adoption, we used an updated O(U)StringLiteral: a templated structure with a layout compatible with rtl_(u)String. Such O(U)StringLiterals are created at compile time (typically as static inline constexpr objects), and creating OUStrings from them became a trivial operation. But that required a proliferation of such helper objects, which is inconvenient. Instead of a clear

    for (;;)
    {
        // ...
        OUString foo = "abc"; // Allocating memory in a loop
        // doing something with the string...
    }

we had 

    static constexpr OUStringLiteral a_abc("abc"); // compile-time constant
    // ...
    for (;;)
    {
        // ...
        OUString foo(a_abc); // Trivial construction without allocation
        // doing something with the string...
    }

This does the trick, but it is not clean.

Introduction of C++20 support in the codebase (commit 1eef07805021b7ca26a1a8894809b6d995747ba1 Bump baseline to C++20, 2023-09-22) paved the way for custom literals; and Stephan came up with a solution that allows us to (mostly) get rid of the intermediate O(U)StringLiteral objects while keeping the benefit of compile-time creation of the string objects: commit 27d1f3ac016d77d3c907cebedca558308f366855 (O[U]String literals (unusable for now, C++20 only), 2023-07-14) introduced the operator ""_ostr() for both OString and OUString; and commit e83e62fe376a91f7270435e06ee7f6864c48fb4b (Work around MSVC bug with "..."_ostr vs. u"..."_ostr, 2023-07-19) renamed the OUString one to operator ""_ustr(). These operators still use O(U)StringLiterals internally, but hide that from the programmer.

Now it is possible to write

    for (;;)
    {
        // ...
        OUString foo = u"abc"_ustr; // trivial initialization using a compile-time object
        // doing something with the string...
    }

While the u"abc"_ustr or "abc"_ostr syntax is a bit bulkier than a simple "abc", it is inline, it is unambiguous, and it provides the wanted optimization with a much better developer experience than the previous O(U)StringLiteral solution.

The easy hack is to replace uses of O(U)StringLiteral in the codebase with static constexpr O(U)String objects initialized using the user-defined literals. This is possible in most cases; an exception is where the string content is then used at compile time to construct other constexpr/consteval objects, as in basic/source/classes/sb.cxx, where pCountStr and friends are used later to calculate constexpr hashes. rtl_(u)String declares its buffer as a single-character array, although the buffer is actually as large as needed to hold the whole string; because of this trick, compile-time use of the characters of a compile-time-constructed O(U)String is UB.
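The exception can be sketched as follows. The types and the hash below are hypothetical and are not the code from sb.cxx: ModelLiteral stands in for O(U)StringLiteral (buffer size part of the type), ModelStringData stands in for rtl_(u)String (buffer declared with one element), and hashOf is a simple polynomial hash chosen only for the illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical literal type: the buffer size is part of the type, so every
// character is a real array element that constant evaluation may read.
template <std::size_t N> struct ModelLiteral
{
    constexpr ModelLiteral(const char (&text)[N])
    {
        for (std::size_t i = 0; i < N; ++i)
            buffer[i] = text[i];
    }
    char buffer[N] = {};
};

// Illustrative compile-time hash over all characters except the terminator.
template <std::size_t N> constexpr std::uint32_t hashOf(const ModelLiteral<N>& literal)
{
    std::uint32_t h = 0;
    for (std::size_t i = 0; i + 1 < N; ++i)
        h = h * 31u + static_cast<unsigned char>(literal.buffer[i]);
    return h;
}

// Works: the hash is computed entirely at compile time.
constexpr ModelLiteral kCount("Count");
constexpr std::uint32_t kCountHash = hashOf(kCount);
static_assert(kCountHash == hashOf(ModelLiteral("Count")));

// Hypothetical stand-in for rtl_String: the declared buffer has one element.
struct ModelStringData
{
    std::int32_t refCount;
    std::int32_t length;
    char buffer[1];
};

constexpr ModelStringData kShort{ 1, 1, { 'C' } };
static_assert(kShort.buffer[0] == 'C');
// Reading past the declared element is not a constant expression, so a line
// like the following is rejected by the compiler:
// constexpr char second = kShort.buffer[1];
```

This is why, in such places, the object whose characters are read at compile time has to stay an O(U)StringLiteral.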

Of course, the unit tests specifically designed to test O(U)StringLiterals should be kept intact.
Comment 1 Commit Notification 2023-11-05 18:54:53 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/eae5559af6bbf7727431fcb44e79492adbd59c41

tdf#158067: an example of OUStringLiteral -> operator u""_ustr replacement

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 2 Commit Notification 2023-11-05 21:25:10 UTC
Bogdan B committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/1ad81f7e69b545340e340b54f9c9dd387b17cce0

tdf#158067 Replace OUStringLiteral in accdoc

It will be available in 24.2.0.

Comment 3 Stephan Bergmann 2023-11-06 08:20:28 UTC
Cases where an existing O[U]StringLiteral variable was accessed at potentially more than one place in the code have been automatically rewritten to instead use O[U]String variables with <https://git.libreoffice.org/core/+/769899853557831ae53d020497e81c8fe572874b%5E!/> "Extended loplugin:ostr: Automatic rewrite some O[U]StringLiteral -> O[U]String" et al.

Rewriting cases where an existing O[U]StringLiteral variable is accessed at exactly one place in the code (and the variable can thus be elided completely, directly using an O[U]String literal at the place of use) is work in progress by me.  It is for now covered by the

>                         //TODO, left for later:
>                         continue;

in compilerplugins/clang/ostr.cxx introduced with that 769899853557831ae53d020497e81c8fe572874b mentioned above.

So feel free to fix individual cases as an easy hack here, but note that I may eventually rewrite all the remaining cases automatically in one go, to potentially get rid of the O[U]StringLiteral classes in their current form completely.
Comment 4 Devansh Varshney 2023-12-22 08:31:19 UTC
Hi Team,
I am looking for a starting point to contribute to the project.

I followed this blog post from a recent tweet: https://dev.blog.documentfoundation.org/2023/12/21/custom-string-literals-two-easyhacks/

Apologies for assigning this bug to myself without asking; I was going through https://wiki.documentfoundation.org/Development/EasyHacks
Comment 5 Buovjaga 2023-12-22 08:58:54 UTC
(In reply to Devansh Varshney from comment #4)
> Hi Team,
> I am looking for a starting point to contribute to the project.
> 
> I followed this blog from the recent tweet
> -https://dev.blog.documentfoundation.org/2023/12/21/custom-string-literals-two-easyhacks/
> 
> and apologies as I assigned this bug to myself without asking I was going
> through this https://wiki.documentfoundation.org/Development/EasyHacks

This is a task that multiple people can work on at the same time. Therefore there is no need to assign it to a single person.
Comment 6 Commit Notification 2024-02-23 07:05:56 UTC
Luv Sharma committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a6cdc75f98a9449fd796420170d2097e96b6e873

tdf#158067: an example of OUStringLiteral -> operator u""_ustr replacement

It will be available in 24.8.0.

Comment 7 Commit Notification 2024-03-23 19:50:48 UTC
RMZeroFour committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e7eae3aa6e3af0b05c6aa471d2c4892918757e7a

tdf#158067 Replace OUStringLiteral with _ustr

It will be available in 24.8.0.

Comment 8 Commit Notification 2024-04-13 18:31:43 UTC
Aaron Bourdeaux committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/734a84f33d3f08e31086c2dbd629715608a178d5

tdf#158067 Replace OUStringLiteral with _ustr

It will be available in 24.8.0.
