Description:
Double-clicking a text cell to edit it may put the caret in an invalid "mid-character" position, so that typing corrupts the text. See the reproduction steps.

Steps to Reproduce:
1. Install LibreOffice 7.4.0.3 on Windows.
2. Open the attached CSV. It contains 1 cell with 6 consecutive musical note characters.
3. In the import dialog, ensure the character set is set to UTF-8.
4. Notice that the cell preview displays the content incorrectly at this step.
5. Click OK.
6. Double-click in cell A1 anywhere between two of the note characters.
7. Press the "a" key.

Actual Results:
The character preceding the editing caret is replaced with 2 damaged characters and an "a" character between them. Additionally, the cell edit box above the spreadsheet will no longer match the cell contents.

Expected Results:
The "a" character should be inserted between the note characters where the editing caret indicates.

Reproducible: Always

User Profile Reset: Yes

OpenGL enabled: Yes

Additional Info:
Version: 7.4.0.3 (x64) / LibreOffice Community
Build ID: f85e47c08ddd19c015c0114a68350214f7066f5a
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Created attachment 181960 [details] Test document
Forgot to mention: This does not happen if I press the left or right arrow keys to move the editing caret before step 7.
The sample file you attached has only one single character inside of it. Shouldn't it contain 6 characters? I opened it in a text editor and it only contains one musical note character. Could you check if this is the correct file?
I checked the file and it is correct. It should be 26 bytes, it contains the byte sequence "F0 9D 85 A0 F0" (U+1D160) six times, followed by "0D 0A" (CR LF) Downloaded the test document from Bugzilla and opened it on Windows and it works as described in the bug report.
Er sorry the repeated byte sequence is "F0 9D 85 A0"
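For anyone checking the file by hand, here is a minimal sketch in plain C++ (not LibreOffice code) of how the four bytes "F0 9D 85 A0" map to U+1D160, and how that code point becomes two UTF-16 code units. The helper names decodeUtf8FourBytes and toSurrogatePair are made up for illustration, and the decoder assumes a well-formed 4-byte sequence without validating it. Since OUString stores UTF-16 code units, each note occupies two positions in the string, which is where a "mid-character" caret position can come from. The file size also checks out: 6 x 4 bytes plus CR LF is 26 bytes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: decode one well-formed 4-byte UTF-8 sequence.
// No validation is done; this is only meant for checking the test file.
inline char32_t decodeUtf8FourBytes(const std::array<std::uint8_t, 4>& b)
{
    return (static_cast<char32_t>(b[0] & 0x07) << 18)   // 3 payload bits of the lead byte
         | (static_cast<char32_t>(b[1] & 0x3F) << 12)   // 6 payload bits per continuation byte
         | (static_cast<char32_t>(b[2] & 0x3F) << 6)
         |  static_cast<char32_t>(b[3] & 0x3F);
}

// The two UTF-16 code units that represent one code point above U+FFFF.
struct SurrogatePair
{
    char16_t high;
    char16_t low;
};

// Hypothetical helper: split a supplementary-plane code point
// (U+10000..U+10FFFF) into its high and low surrogates.
inline SurrogatePair toSurrogatePair(char32_t codePoint)
{
    const char32_t v = codePoint - 0x10000;
    SurrogatePair pair;
    pair.high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    pair.low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return pair;
}
```

Feeding the bytes F0 9D 85 A0 to decodeUtf8FourBytes gives 0x1D160, and toSurrogatePair(0x1D160) gives the pair D834 DD60.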
[Automated Action] NeedInfo-To-Unconfirmed
(In reply to Eric Lasota from comment #5)
> Er sorry the repeated byte sequence is "F0 9D 85 A0"

I opened the CSV file in Okteta hex editor and confirmed this.

It turns out that in Kate text editor as well as LibreOffice Calc, the six note characters are displayed on top of each other! So if you hit End key and start hitting backspace, you will remove the notes one by one.

In nano editor and Notepad++ the characters are displayed in sequence.

Bibisected with Linux 5.2 repo to
https://git.libreoffice.org/core/commit/975c833943bab627eb461457ab1df35744b291cd
upgrade harfbuzz version from 0.9.40 to 1.2.6

Not adding regression keyword as this is coming from a dependency.
The CSV import is not the issue here. Rather, mouse pointer selection in a Calc cell or in the Input bar is able to split the surrogate pair of an SMP glyph.
Paste this string into a Calc cell (unlike the musical notes, these glyphs are available in DejaVu Sans):

U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1

Position the text cursor at the end of the pasted text. Enter <Alt>+X and the 🁠 and the 🂡 glyphs will be toggled.
Use the mouse cursor to point-select on any glyph, in the sheet or in the Input bar.
Type any keyboard character, "a" as in the OP.

The mouse click selection has split the multi-byte code point, and entering the character "breaks" the glyph.

As noted, cursor (<L,R>) movement correctly recognizes the SMP glyphs; only the mouse pointer selection is wrong.
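To make the split concrete, here is a small standalone sketch using std::u16string in place of OUString. It is not the editeng code: snapToCodePointBoundary and insertAt are hypothetical helpers, and snapping to the start of the pair (rather than the end) is an arbitrary choice for illustration. It shows that inserting at an index between the two halves of a surrogate pair leaves two lone surrogates around the new character, and that checking the neighbouring code units is enough to avoid that.

```cpp
#include <cstddef>
#include <string>

inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: if `index` sits between the high and low half of a
// surrogate pair, move it back to the start of that pair. Any other index
// is returned unchanged. Snapping backwards is an assumption of this sketch.
inline std::size_t snapToCodePointBoundary(const std::u16string& text,
                                           std::size_t index)
{
    if (index > 0 && index < text.size()
        && isHighSurrogate(text[index - 1]) && isLowSurrogate(text[index]))
    {
        return index - 1;
    }
    return index;
}

// Hypothetical helper: return a copy of `text` with `ch` inserted at `index`
// (a UTF-16 code unit index, as a mouse hit test might report it).
inline std::u16string insertAt(std::u16string text, std::size_t index, char16_t ch)
{
    text.insert(index, 1, ch);
    return text;
}
```

With the text u"\U0001F060\U0001F060" (four code units), insertAt(text, 1, u'a') produces D83C, 'a', DC60, ..., which is the "two damaged characters with an a between them" from the original report. Passing the index through snapToCodePointBoundary first moves it from 1 to 0, so the glyph stays intact.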
@Julien, when you fixed a similar issue in sm for bug 102625 [1], was the Calc instance of the SMP glyphs missed here because the SMP glyphs are treated as an i18n COMPLEX script and no rtl::isSurrogate() test gets performed?

=-ref-=
[1] https://gerrit.libreoffice.org/c/core/+/93544
(In reply to V Stuart Foote from comment #10)
> @Julien, when you fixed similar for sm for bug 102625 [1] is the Calc
> instance of the SMP glyphs missed here bcz the SMP glyphs are treated as an
> i18n COMPLEX script and no rtl::isSurrogate() test gets performed?
>
> =-ref-=
> [1] https://gerrit.libreoffice.org/c/core/+/93544

The original patch in master might be a better reference, where Stephan B. had comments about use of the isSurrogate() test.

https://gerrit.libreoffice.org/c/core/+/93684
(In reply to V Stuart Foote from comment #10)
> @Julien, when you fixed similar for sm for bug 102625 [1] is the Calc
> instance of the SMP glyphs missed here bcz the SMP glyphs are treated as an
> i18n COMPLEX script and no rtl::isSurrogate() test gets performed?
>
> =-ref-=
> [1] https://gerrit.libreoffice.org/c/core/+/93544

On pc Debian x86-64 with master sources updated today with gtk3 rendering, here is what I tested:
- enter cell A1, then Ctrl-shift U 1F060 + Ctrl-shift U 1F060 + Ctrl-shift U 1F060 + Ctrl-shift U 1F060 to have 4 glyphs.
- click on the end of the cell
- Alt X
=> the last glyph disappears and is replaced with "U+1f0a1"

I added some traces on the if else block:
diff --git a/editeng/source/editeng/impedit2.cxx b/editeng/source/editeng/impedit2.cxx
index 4e87e36af5d3..b00a6b8b8f46 100644
--- a/editeng/source/editeng/impedit2.cxx
+++ b/editeng/source/editeng/impedit2.cxx
@@ -4044,6 +4044,7 @@ sal_Int32 ImpEditEngine::GetChar(
             sal_uInt16 nScriptType = GetI18NScriptType( aPaM );
             if ( nScriptType == i18n::ScriptType::COMPLEX )
             {
+                fprintf(stderr, "COMPLEX\n");
                 uno::Reference < i18n::XBreakIterator > _xBI( ImplGetBreakIterator() );
                 sal_Int32 nCount = 1;
                 lang::Locale aLocale = GetLocale( aPaM );
@@ -4058,6 +4059,7 @@ sal_Int32 ImpEditEngine::GetChar(
             }
             else
             {
+                fprintf(stderr, "NOT COMPLEX\n");
                 OUString aStr(pParaPortion->GetNode()->GetString());
                 // tdf#102625: don't select middle of a pair of surrogates with mouse cursor
                 if (rtl::isSurrogate(aStr[nChar]))

I got only NOT COMPLEX appearing on console logs.
(In reply to Julien Nabet from comment #12)
>
> I added some traces on the if else block:
> diff --git a/editeng/source/editeng/impedit2.cxx
> b/editeng/source/editeng/impedit2.cxx
> index 4e87e36af5d3..b00a6b8b8f46 100644
> --- a/editeng/source/editeng/impedit2.cxx
> +++ b/editeng/source/editeng/impedit2.cxx
> @@ -4044,6 +4044,7 @@ sal_Int32 ImpEditEngine::GetChar(
> sal_uInt16 nScriptType = GetI18NScriptType( aPaM );
> if ( nScriptType == i18n::ScriptType::COMPLEX )
> {
> + fprintf(stderr, "COMPLEX\n");
> uno::Reference < i18n::XBreakIterator > _xBI(
> ImplGetBreakIterator() );
> sal_Int32 nCount = 1;
> lang::Locale aLocale = GetLocale( aPaM );
> @@ -4058,6 +4059,7 @@ sal_Int32 ImpEditEngine::GetChar(
> }
> else
> {
> + fprintf(stderr, "NOT COMPLEX\n");
> OUString aStr(pParaPortion->GetNode()->GetString());
> // tdf#102625: don't select middle of a pair of
> surrogates with mouse cursor
> if (rtl::isSurrogate(aStr[nChar]))
>
>
> I got only NOT COMPLEX appearing on console logs.

OK, thanks for the quick check, it was just a thought. I wasn't even sure if Calc's edit engine calls use that GetString(). It was just that the same split on mouse cursor selection of a multi-byte glyph had been occurring in the sm Formula editor's input box.
(In reply to Julien Nabet from comment #12)
> - click on the end of the cell
> - Alt X
> => the last glyph disappears and is replaced with "U+1f0a1"

Yes, I think that is correct. Going from glyph to Unicode value converts just the last character.

As compared to going from Unicode notation (i.e. U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1) which is "hungry" and converts the full run back to a white space. It needed to do that to handle combining diacritics.