Created attachment 122013 [details] Non-BMP characters handling test case In general, text functions that has position or length as the parameter or return value have incorrect results. Check attachment.
Hi Mark, thanks for reporting. What font is used for TS?, i have not it installed.
You might need MingLiu-EXTB to see the characters.
There are special functions for fonts with double byte characters. like: MIDB() https://help.libreoffice.org/index.php?title=5.0/Calc/Text_Functions&Language=en-US&System=WIN&Version=5.0#bm_id2955392 LENB() https://help.libreoffice.org/index.php?title=5.0/Calc/Text_Functions&Language=en-US&System=WIN&Version=5.0#bm_id2954050 UNICHAR() https://help.libreoffice.org/index.php?title=5.0/Calc/Text_Functions&Language=en-US&System=WIN&Version=5.0#bm_id0907200904011330 And I guess you nee to set up properly Menu/Tools/Options/Language settings/Default language, for CTL documents. And there is the same option in Cell format. I'm not used to work with CTL languages, maybe you can find help in the ML http://nabble.documentfoundation.org/ And please don't set up you own reports as new, the usually is someone else doing it after reproduce the issue.
m.a.rsiov: It was a mistake to change it to NEW. Thanks. I think I can explain more for this issue: Test string is composed of 4 unicode characters: "𧌒𧀬", U+22 U+27312 U+2702C U+22. I expect that LEN(TS) been expressed as number of unicode code points. However, LO internally store characters in UTF16, which means U+27312 and U+2702C has to be represented as "surrogate pairs", hence len(TS) would be different from the result of counting codepoints.
Bonjour, sauf erreur de ma part la fonction : NB.Si renvoie une valeur erronée dans le comptage d'occurences d'une base de données (4 colonnes et 6 lignes). Exemple : 3 lignes avec le terme "Ami" et 2 lignes avec le terme "Famille" NB.si renvoie : 5 Ami et 2 Famille A B C D 1 Critère Fonction Détail Base de donnée 2 Famille 2 =NB.SI(D$2:D$6;A2) Ami 3 Ami 5 ???? =NB.SI(D$2:D$6;A3) Ami 4 Relation 0 =NB.SI(D$2:D$6;A4) Famille 5 Dr Famille 6 Div Ami
Hi @azurement, this bugzilla system it's used only on English, and please don't introduce an issue into another, one bug report one issue. BTW there is nothing wrong with the results, 'ami' is part of 'Ami' and part 'Famille', there is an option: - Menu/Tools/Options/LibreOffice calc/Caculate - Search criteria = ...... enable it and should work as you expect. And please there is a mailing list in French where to ask this questions before report a bug, thanks.
Created attachment 129931 [details] Non-BMP characters handling test case (with "üë" string) Based on Mark's example, and Dennis' sample from bug 101962, here's one that doesn't require special fonts. I'm afraid that's all I can add to the discussion.
I can not reproduce the bug in Versión: 5.2.3.3 Id. de compilación: d54a8868f08a7b39642414cf2c8ef2f228f780cf Subprocesos de CPU: 2; Versión de SO: Windows 5.1; Repr. de IU: predeterminado; Configuración regional: es-AR (es_AR); Calc: group The attachment 129931 [details] (after opening it in LO and saving as MS Excel 2007-2013 (.xlsx)) produced the same results in MS Excel 2011. Apparently LO works the same as MS Excel with the "üë" string.
The OpenFormula standard ODFF1.2 states for functions LEFT, LEN, MID, RIGHT that the result may be application-dependent, see http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#Text-String . I will look into the code of all functions and see what causes the behaviour as in both attachments.
As already said in comment #4, LO uses 16bits for unicode characters, which is not enough for the UniCode 5.2.0 (referenced to in ODFF1.2). Changing this could have far-reaching effects in all sorts of places in the LO code. Anyway, IMHO such a change would require an ESC decision. As Excel 2016 has the same results as LO, interoperability would be affected when LO is changed. I suggest to set this bug report to RESOLVED-NOTABUG and create a new bug report/enhancement request to change the UniCode typedef in the LO code. Adding ERack to cc as he is the Calc expert.
(In reply to Winfried Donkers from comment #10) > As already said in comment #4, LO uses 16bits for unicode characters, which > is not enough for the UniCode 5.2.0 (referenced to in ODFF1.2). That's not true. Strings are UTF-16, which is well capable to represent all Unicode planes. However, the second example is not even about non-BMP characters, the "üë" in B1 contains the sequence U+0075 LATIN SMALL LETTER U U+0308 COMBINING DIAERESIS U+0065 LATIN SMALL LETTER E U+0308 COMBINING DIAERESIS To regard a combined character as one character, some Unicode normalization would have to take place. Which, if at all, should be implemented in a separate function. As for the first attachment, these are real non-BMP characters. For this to work correctly the mentioned spreadsheet functions should operate on the actual Unicode characters code points instead of the 16-bit units. There's OUString::iterateCodePoints() to do that properly, a change in LO's string handling is not necessary. > As Excel 2016 has the same results as LO, interoperability would be affected > when LO is changed. Also the same result with the first attachment and real non-BMP characters? Anyway, I couldn't find a definition for Excel how they'd treat non-BMP characters, but as LEN() is supposed to return the length in *characters*, not in 16-bit code units, I propose to actually handle it that way. I consider treating these string functions in code units wrong. Specifically with LEFT() or RIGHT() or MID() or REPLACE() that should operate on characters, otherwise with code units could cut non-BMP characters in the middle.
(In reply to Eike Rathke from comment #11) > As for the first attachment, these are real non-BMP characters. For this to > work correctly the mentioned spreadsheet functions should operate on the > actual Unicode characters code points instead of the 16-bit units. There's > OUString::iterateCodePoints() to do that properly, a change in LO's string > handling is not necessary. > > > As Excel 2016 has the same results as LO, interoperability would be affected > > when LO is changed. > Also the same result with the first attachment and real non-BMP characters? > Anyway, I couldn't find a definition for Excel how they'd treat non-BMP > characters, but as LEN() is supposed to return the length in *characters*, > not in 16-bit code units, I propose to actually handle it that way. I > consider treating these string functions in code units wrong. Specifically > with LEFT() or RIGHT() or MID() or REPLACE() that should operate on > characters, otherwise with code units could cut non-BMP characters in the > middle. Thanks for your clarification, I will see if I can fix this. (And I will document the Excel results in the unit test documents.)
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=e78f508997f9384518c2dd1a005c5306ccd24783 tdf#97198 Make calc function MID work with Unicode non-BMP characters. It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=09455734727daa3774158795febd910b33b1d784 tdf#97198 Make Calc function RIGHT work with Unicode non-BMP characters. It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=5e2a3eed69060d63070d72df60cd84247ca0bc9e tdf#97198 Make Calc function FIND work with UniCode non-BMP characters. It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=47155ce8fb4302d377b374d39b501d0a0f7577c5 tdf#97198 Make Calc function LEN work with Unicode non-BMP characters. It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=78b7e5f76e72647e0cab662ad2c9342d4538704f tdf#97198 Make Calc function SEARCH work with UniCode non-BMP characters. It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=4e3530b4f870e0470c23ae89cdb8b9a09af54d59 tdf#97198 Make Calc function LEFT work with Unicode non-BMP characters It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=aaba725b466950f47c4e93d790986d209e80ead3 tdf#97198 Make Calc function REPLACE work with UniCode non-BMP-characters. It will be available in 6.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "libreoffice-6-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=44487fee8646c0ded5c2b2ffaaa70e75a7af7e04&h=libreoffice-6-0 tdf#97198 Make Calc function REPLACE work with UniCode non-BMP-characters. It will be available in 6.0.0.1. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=c5d0cffbf42f41349a8575ba52456d9cb07e91c7 tdf#97198 follow up: improve efficiency for Calc function RIGHT. It will be available in 6.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Winfried Donkers committed a patch related to this issue. It has been pushed to "libreoffice-6-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=664d40fb24bf500a22efd1f1aff58a96b3c3cf2c&h=libreoffice-6-0 tdf#97198 follow up: improve efficiency for Calc function RIGHT. It will be available in 6.0.0.1. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.