Build fails with ICU 74 (Alpine Linux edge, with musl libc):

[build BRK] CustomTarget/i18npool/breakiterator/edit_word.brk
S=/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1 && I=$S/instdir && W=$S/workdir && /usr/bin/genbrk -r $W/CustomTarget/i18npool/breakiterator/edit_word.txt -o $W/CustomTarget/i18npool/breakiterator/edit_word.brk > /dev/null
[build CCD] CustomTarget/i18npool/breakiterator/edit_word_brk.c
S=/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1 && I=$S/instdir && W=$S/workdir && /usr/sbin/genccode -n OpenOffice -d $W/CustomTarget/i18npool/breakiterator/ $W/CustomTarget/i18npool/breakiterator/edit_word.brk > /dev/null
sed -e "s#\[:LineBreak = Close_Punctuation:\]#\[& \[:LineBreak = Close_Parenthesis:\]\]#" \
\
\
-e "/Prepend/d" \
/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/i18npool/source/breakiterator/data/line.txt > /home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/workdir/CustomTarget/i18npool/breakiterator/line.txt
[build BRK] CustomTarget/i18npool/breakiterator/line.brk
S=/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1 && I=$S/instdir && W=$S/workdir && /usr/bin/genbrk -r $W/CustomTarget/i18npool/breakiterator/line.txt -o $W/CustomTarget/i18npool/breakiterator/line.brk > /dev/null
createRuleBasedBreakIterator: ICU Error "U_BRK_UNRECOGNIZED_OPTION" at line 17, column 14
make[1]: *** [/home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/i18npool/CustomTarget_breakiterator.mk:90: /home/ncopa/aports/community/libreoffice/src/libreoffice-7.6.3.1/workdir/CustomTarget/i18npool/breakiterator/line.brk] Error 12

Line 17 in line.txt contains:

!!LBCMNoChain;

https://github.com/LibreOffice/core/blob/ff3fb42b48c70ba5788507a6177bf0a9f3b50fdb/i18npool/source/breakiterator/data/line.txt#L17

This appears to have been removed in ICU in this commit:
https://github.com/unicode-org/icu/commit/84e47620692be90950d090f2f4722494b020ad96

And genbrk fails...
I am not sure what the proper fix here is.
There's a patch which needs some tweaks here: https://gerrit.libreoffice.org/c/core/+/158749
(In reply to Julien Nabet from comment #1)
> There's a patch which needs some tweaks here:
> https://gerrit.libreoffice.org/c/core/+/158749

The comment there shows that they bumped into the same genbrk issue as I did when building with system ICU.
(In reply to Natanael Copa from comment #0)
> This appears to have been removed in ICU in this commit:
> https://github.com/unicode-org/icu/commit/84e47620692be90950d090f2f4722494b020ad96

Oh great, "Within ICU, it is used only with the line break rules. We hope to replace it with something more general." They hoped. But apparently they didn't?

However, I suggest simply removing that one line, or maybe better, turning it into a comment that points to this bug and the ICU commit above.
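To illustrate the "turn it into a comment" option, here is a minimal sketch in Python. It is not the actual fix and not part of the LibreOffice build: the helper name `comment_out_lbcm_no_chain`, the wording of the note, and the sample rule text are all made up for illustration. The only assumption about the rule format is that `#` starts a comment, as in the existing line.txt.

```python
import re

# Hypothetical helper, not part of the LibreOffice build system.
# Matches a line that consists of the !!LBCMNoChain option (optionally indented).
OPTION = re.compile(r"^([ \t]*)(!!LBCMNoChain[ \t]*;)[ \t]*$", re.MULTILINE)

def comment_out_lbcm_no_chain(rules, note="disabled: rejected by genbrk in ICU 74"):
    """Return the rule text with any !!LBCMNoChain line turned into a '#' comment."""
    return OPTION.sub(lambda m: f"{m.group(1)}# {m.group(2)} ({note})", rules)

# Made-up sample input; the real line.txt is much longer.
sample = "!!chain;\n!!LBCMNoChain;\n# rules follow\n"
patched = comment_out_lbcm_no_chain(sample)
print(patched)
```

All other lines pass through unchanged, so the same idea could be applied as a one-off edit to the source file rather than at build time.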
(In reply to Eike Rathke from comment #3)
> (In reply to Natanael Copa from comment #0)
> > This appears to have been removed in ICU in this commit:
> > https://github.com/unicode-org/icu/commit/84e47620692be90950d090f2f4722494b020ad96
> Oh great, "Within ICU, it is used only with the line break
> rules. We hope to replace it with something more general." They hoped. But
> apparently they didn't?
>
> However, I suggest to simply remove that one line, or maybe better turn it
> into a comment and point to this bug here / the ICU commit there.

I found a webpage related to this bug.

> The !!LBCMNoChain option (Line Break Combining Marks No Chain) will be
> disappearing. It was a hack used in implementing the Unicode line break rules,
> and with the new !^ pattern character providing finer grained control over
> rule chaining, it is no longer necessary.

"[ICU-12331] Update UserGuide for new BreakIterator behavior and rule syntax - Unicode Consortium"
<https://unicode-org.atlassian.net/browse/ICU-12331>
Well, fine, but what does that actually _mean_ for our line break rules? Adding Khaled to Cc, maybe he knows as he touched i18npool/source/breakiterator/data/line.txt recently. IMHO it's now time to align our rules with ICU upstream's https://github.com/unicode-org/icu/blob/main/icu4c/source/data/brkitr/rules/line.txt and reapply the historically grown changes _if still necessary_.
(In reply to Eike Rathke from comment #5)
> Well, fine, but what does that actually _mean_ for our line break rules?
>
> Adding Khaled to Cc, maybe he knows as he touched
> i18npool/source/breakiterator/data/line.txt recently.
>
> IMHO it's now time to align our rules with ICU upstream's
> https://github.com/unicode-org/icu/blob/main/icu4c/source/data/brkitr/rules/line.txt
> and reapply the historically grown changes _if still necessary_.

The minimum version of ICU required to build LibreOffice is 66. Would it be a problem to use icu4c/source/data/brkitr/rules/line.txt from 74.1 or the main branch?
Over the years (even decades now) we added specific rules for certain languages that were either not handled by ICU or handled differently in a way that didn't suit what users expected. While it is technically possible to simply use the ICU 74.1 rules, we would lose all those adjustments, and users of the languages we handled differently could experience significantly different layout of their documents. It may be that the current ICU rules now suit our purpose, but that would have to be evaluated, and for each modification that was previously applied (visible in the git log history) it may be necessary to reapply the change to the now current rules.
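As a rough aid for that evaluation, the rule lines that differ between our file and upstream's could be listed mechanically before consulting the git history. The following is only a sketch using Python's standard `difflib`. The function name `rule_differences` and the sample rule text are invented for illustration, and comparing stripped, non-comment lines is a simplification that ignores rules spanning several lines.

```python
import difflib

def rule_differences(ours, upstream):
    """List rule lines present in only one of the two rule files.

    Comment lines (starting with '#') and blank lines are ignored.
    Lines prefixed '+ ' exist only in ours, '- ' only in upstream.
    """
    def significant(text):
        return [ln.strip() for ln in text.splitlines()
                if ln.strip() and not ln.lstrip().startswith("#")]

    diff = difflib.ndiff(significant(upstream), significant(ours))
    return [d for d in diff if d.startswith(("+ ", "- "))]

# Made-up miniature inputs; the real files are far larger.
upstream_rules = "# comment\n!!chain;\n$CM = [:LineBreak = Combining_Mark:];\n"
our_rules = "!!chain;\n!!LBCMNoChain;\n$CM = [:LineBreak = Combining_Mark:];\n"

for line in rule_differences(our_rules, upstream_rules):
    print(line)
```

Each reported line would still need to be matched against the commit that introduced it to decide whether the adjustment is still necessary.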
Fixed with https://bugs.documentfoundation.org/show_bug.cgi?id=49885#c17. Backporting this (and the preceding https://bugs.documentfoundation.org/show_bug.cgi?id=49885#c13) makes it build in my 24.2.3.