Bug 158108 - Build failure with ICU 74
Summary: Build failure with ICU 74
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 7.6.2.1 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: ICU 160570
Reported: 2023-11-07 21:33 UTC by Natanael Copa
Modified: 2024-04-06 20:09 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:


Description Natanael Copa 2023-11-07 21:33:21 UTC
Build fails with ICU 74 (Alpine Linux edge, with musl libc):


[build BRK] CustomTarget/i18npool/breakiterator/edit_word.brk
S=/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1 && I=$S/instdir && W=$S/workdir &&  /usr/bin/genbrk  -r $W/CustomTarget/i18npool/breakiterator/edit_word.txt -o $W/CustomTarget/i18npool/breakiterator/edit_word.brk > /dev/null
[build CCD] CustomTarget/i18npool/breakiterator/edit_word_brk.c
S=/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1 && I=$S/instdir && W=$S/workdir &&  /usr/sbin/genccode -n OpenOffice -d $W/CustomTarget/i18npool/breakiterator/ $W/CustomTarget/i18npool/breakiterator/edit_word.brk > /dev/null
sed -e "s#\[:LineBreak =  Close_Punctuation:\]#\[& \[:LineBreak = Close_Parenthesis:\]\]#" \
         \
         \
        -e "/Prepend/d" \
        /home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/i18npool/source/breakiterator/data/line.txt > /home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/workdir/CustomTarget/i18npool/breakiterator/line.txt
[build BRK] CustomTarget/i18npool/breakiterator/line.brk
S=/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1 && I=$S/instdir && W=$S/workdir &&  /usr/bin/genbrk  -r $W/CustomTarget/i18npool/breakiterator/line.txt -o $W/CustomTarget/i18npool/breakiterator/line.brk > /dev/null
createRuleBasedBreakIterator: ICU Error "U_BRK_UNRECOGNIZED_OPTION"  at line 17, column 14
make[1]: *** [/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/i18npool/CustomTarget_breakiterator.mk:90: /home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/workdir/CustomTarget/i18npool/breakiterator/line.brk] Error 12


Line 17 in line.txt contains:

!!LBCMNoChain;

https://github.com/LibreOffice/core/blob/ff3fb42b48c70ba5788507a6177bf0a9f3b50fdb/i18npool/source/breakiterator/data/line.txt#L17

This appears to have been removed in ICU in this commit:
https://github.com/unicode-org/icu/commit/84e47620692be90950d090f2f4722494b020ad96

And genbrk fails...

I am not sure what the proper fix here is.
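
For anyone reproducing this, the line and column reported by genbrk can be mapped back to the rule file. Below is a minimal Python sketch of that lookup. The regular expression is inferred only from the single error message quoted above, and `parse_genbrk_error` and `offending_line` are hypothetical helper names, not part of ICU or the LibreOffice build.

```python
import re

# Pattern inferred from the one genbrk message in this report; other ICU
# error messages may be formatted differently (assumption).
ERROR_RE = re.compile(
    r'ICU Error "(?P<code>\w+)"\s+at line (?P<line>\d+), column (?P<col>\d+)'
)


def parse_genbrk_error(message):
    """Return (error_code, line, column) or None if the message does not match."""
    m = ERROR_RE.search(message)
    if m is None:
        return None
    return m.group("code"), int(m.group("line")), int(m.group("col"))


def offending_line(rules_text, line_no):
    """Return the 1-based line `line_no` of the rule file contents."""
    return rules_text.splitlines()[line_no - 1]


if __name__ == "__main__":
    msg = ('createRuleBasedBreakIterator: ICU Error '
           '"U_BRK_UNRECOGNIZED_OPTION"  at line 17, column 14')
    print(parse_genbrk_error(msg))
```

With the message from the log above, this yields the error code together with line 17 and column 14, which is where `!!LBCMNoChain;` sits in line.txt.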
Comment 1 Julien Nabet 2023-11-08 08:35:38 UTC
There's a patch which needs some tweaks here:
https://gerrit.libreoffice.org/c/core/+/158749
Comment 2 Natanael Copa 2023-11-08 11:42:18 UTC
(In reply to Julien Nabet from comment #1)
> There's a patch which needs some tweaks here:
> https://gerrit.libreoffice.org/c/core/+/158749

The comment there shows that they ran into the same genbrk issue as I did when building with system ICU.
Comment 3 Eike Rathke 2023-11-08 13:18:35 UTC
(In reply to Natanael Copa from comment #0)
> This appears to have been removed in ICU in this commit:
> https://github.com/unicode-org/icu/commit/
> 84e47620692be90950d090f2f4722494b020ad96
Oh great, "Within ICU, it is used only with the line break
rules. We hope to replace it with something more general." They hoped. But apparently they didn't?

However, I suggest simply removing that one line, or maybe better, turning it into a comment that points to this bug and the ICU commit.
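
The "turn it into a comment" variant could be sketched roughly as follows. This is a Python illustration only, not the actual patch. It assumes the option sits alone on its line and relies on `#` starting a comment in ICU break rule files. The function name `comment_out_lbcm` and the sample text are made up for the example.

```python
import re

# A line that consists only of the removed option (optionally indented).
LBCM_OPTION = re.compile(r"^(\s*)(!!LBCMNoChain;)\s*$")


def comment_out_lbcm(rules_text, note="rejected by ICU 74 genbrk, see bug 158108"):
    """Turn a standalone !!LBCMNoChain; line into a '#' comment with a note."""
    out = []
    for line in rules_text.splitlines():
        m = LBCM_OPTION.match(line)
        if m:
            # Keep the original indentation and the option text for reference.
            line = f"{m.group(1)}# {m.group(2)}  ({note})"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative snippet, not the real contents of line.txt.
    sample = "!!chain;\n!!LBCMNoChain;\n!!lookAheadHardBreak;\n"
    print(comment_out_lbcm(sample), end="")
```

This only gets genbrk past the unrecognized option. It does not show whether the resulting line breaking behaves the same as before.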
Comment 4 taichi 2023-11-08 13:36:19 UTC
(In reply to Eike Rathke from comment #3)
> (In reply to Natanael Copa from comment #0)
> > This appears to have been removed in ICU in this commit:
> > https://github.com/unicode-org/icu/commit/
> > 84e47620692be90950d090f2f4722494b020ad96
> Oh great, "Within ICU, it is used only with the line break
> rules. We hope to replace it with something more general." They hoped. But
> apparently they didn't?
> 
> However, I suggest to simply remove that one line, or maybe better turn it
> into a comment and point to this bug here / the ICU commit there.


I found a webpage related to this bug.

> The !!LBCMNoChain option (Line Break Combining Marks No Chain) will be
> disappearing. It was a hack used in implementing the Unicode line break rules,
> and with the new !^ pattern character providing finer grained control over
> rule chaining, it is no longer necessary.

"[ICU-12331] Update UserGuide for new BreakIterator behavior and rule syntax - Unicode Consortium"
<https://unicode-org.atlassian.net/browse/ICU-12331>
Comment 5 Eike Rathke 2023-11-09 11:27:20 UTC
Well, fine, but what does that actually _mean_ for our line break rules?

Adding Khaled to Cc, maybe he knows as he touched i18npool/source/breakiterator/data/line.txt recently.

IMHO it's now time to align our rules with ICU upstream's https://github.com/unicode-org/icu/blob/main/icu4c/source/data/brkitr/rules/line.txt and reapply the historically grown changes _if still necessary_.
Comment 6 taichi 2023-11-20 07:21:50 UTC
(In reply to Eike Rathke from comment #5)
> Well, fine, but what does that actually _mean_ for our line break rules?
> 
> Adding Khaled to Cc, maybe he knows as he touched
> i18npool/source/breakiterator/data/line.txt recently.
> 
> IMHO it's now time to align our rules with ICU upstream's
> https://github.com/unicode-org/icu/blob/main/icu4c/source/data/brkitr/rules/
> line.txt and reapply  the historically grown changes _if still necessary_.

The minimum version of ICU required to build LibreOffice is 66.
Would it be a problem to use icu4c/source/data/brkitr/rules/line.txt from 74.1 or the main branch?
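
One way to keep a single rule file usable across that version range would be to drop the option only when building against a newer ICU. Below is a minimal Python sketch of that idea. The cut-off of 74 is an assumption taken from this report (the build fails with ICU 74), and `icu_major` and `rules_for_icu` are hypothetical names, not existing build helpers.

```python
import re

# Assumption from this report: genbrk rejects the option from ICU 74 on.
FIRST_ICU_WITHOUT_LBCM = 74


def icu_major(version):
    """Extract the major version from a string such as '74.1' or '66'."""
    m = re.match(r"\s*(\d+)", version)
    if not m:
        raise ValueError(f"unrecognised ICU version: {version!r}")
    return int(m.group(1))


def rules_for_icu(rules_text, version):
    """Return the rules unchanged for older ICU, without the option for newer ICU."""
    if icu_major(version) < FIRST_ICU_WITHOUT_LBCM:
        return rules_text
    kept = [
        line
        for line in rules_text.splitlines(keepends=True)
        if line.strip() != "!!LBCMNoChain;"
    ]
    return "".join(kept)


if __name__ == "__main__":
    sample = "!!chain;\n!!LBCMNoChain;\n"  # illustrative, not the real file
    print(rules_for_icu(sample, "66.1"), end="")
    print(rules_for_icu(sample, "74.1"), end="")
```

This addresses only the build error. Whether rules written for ICU 74.1 would compile or behave correctly with ICU 66 is a separate question that the sketch does not answer.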
Comment 7 Eike Rathke 2024-02-01 11:54:26 UTC
Over the years (even decades now) we added specific rules for certain languages that were either not handled by ICU or handled differently in a way that didn't suit what users expected. While it is technically possible to simply use the ICU 74.1 rules, we would lose all of those adjustments, and users of the languages we handled differently could experience significantly different layout of their documents. It may be that the current ICU rules now suit our purpose, but that would have to be evaluated, and for each modification that was previously applied (visible in the git log history) it may be necessary to reapply the change to the now current rules.
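
As a starting point for that evaluation, the local rules could be diffed against upstream's to list what would need a decision. Below is a minimal Python sketch using the standard `difflib` module. The two rule snippets are invented for illustration and are not the contents of either line.txt, and `rule_diff` is a hypothetical helper name.

```python
import difflib


def rule_diff(upstream, local):
    """Return a unified diff (list of lines) from upstream rules to local rules."""
    return list(
        difflib.unified_diff(
            upstream.splitlines(),
            local.splitlines(),
            fromfile="icu/line.txt",
            tofile="libreoffice/line.txt",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Invented snippets; the real files would be read from disk.
    upstream = "!!chain;\n$CM = [:LineBreak = Combining_Mark:];\n"
    local = "!!chain;\n!!LBCMNoChain;\n$CM = [:LineBreak = Combining_Mark:];\n"
    for line in rule_diff(upstream, local):
        print(line)
```

Each `+` line in the output is a local change that would have to be matched against the git log history before deciding whether it is still necessary.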