150553 – Calc InputBar Unicode SMP characters split at surrogate pair with mouse pointer selection

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc
Version: (earliest affected)	5.2.0.4 release
Hardware:	x86-64 (AMD64) All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Font-Rendering

Reported:	2022-08-22 21:20 UTC by Eric Lasota
Modified:	2025-01-27 16:48 UTC
CC List:	6 users

See Also:	102625
Crash report or crash signature:

Attachments
Test document (26 bytes, text/csv) 2022-08-22 21:21 UTC, Eric Lasota

Description Eric Lasota 2022-08-22 21:20:41 UTC

Description:
Double-clicking a text cell to edit it may put the editing caret in some kind of invalid "mid-character" position that corrupts the text when typing.  See reproduction steps.

Steps to Reproduce:
1. Install LibreOffice 7.4.0.3 on Windows.
2. Open the attached CSV.  It contains 1 cell with 6 consecutive musical note characters.
3. In the import dialog, ensure the character set is set to UTF-8.
4. Notice that the cell preview displays the content incorrectly at this step.
5. Click OK
6. Double-click in cell A1 anywhere between two of the note characters.
7. Press the "a" key

Actual Results:
The character preceding the editing caret is replaced with 2 damaged characters and an "a" character between them.  Additionally, the cell edit box above the spreadsheet will no longer match the cell contents.

Expected Results:
The "a" character should be inserted between the note characters where the editing caret indicates.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Version: 7.4.0.3 (x64) / LibreOffice Community
Build ID: f85e47c08ddd19c015c0114a68350214f7066f5a
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Comment 1 Eric Lasota 2022-08-22 21:21:18 UTC

Created attachment 181960
Test document

Comment 2 Eric Lasota 2022-08-22 21:22:29 UTC

Forgot to mention: This does not happen if I press the left or right arrow keys to move the editing caret before step 7.

Comment 3 Rafael Lima 2022-08-23 11:18:11 UTC Comment hidden (off-topic)

The sample file you attached has only one single character inside of it. Shouldn't it contain 6 characters?

I opened it in a text editor and it only contains one musical note character.

Could you check if this is the correct file?

Comment 4 Eric Lasota 2022-08-23 16:37:30 UTC Comment hidden (off-topic)

I checked the file and it is correct.  It should be 26 bytes, it contains the byte sequence "F0 9D 85 A0 F0" (U+1D160) six times, followed by "0D 0A" (CR LF)

Downloaded the test document from Bugzilla and opened it on Windows and it works as described in the bug report.

Comment 5 Eric Lasota 2022-08-23 16:39:12 UTC Comment hidden (off-topic)

Er sorry the repeated byte sequence is "F0 9D 85 A0"
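
As a cross-check of the byte values quoted above, here is a minimal C++ sketch that decodes the four bytes F0 9D 85 A0 into a code point and then re-encodes it as UTF-16, the form in which the surrogate pair shows up. The helper names decodeUtf8FourBytes and encodeUtf16 are made up for this illustration and are not LibreOffice functions; validation is limited to asserting the expected bit patterns.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: decode one 4-byte UTF-8 sequence into a code point.
// Expects a lead byte of the form 11110xxx and three 10xxxxxx continuation bytes.
inline char32_t decodeUtf8FourBytes(std::uint8_t b0, std::uint8_t b1,
                                    std::uint8_t b2, std::uint8_t b3)
{
    assert((b0 & 0xF8) == 0xF0);
    assert((b1 & 0xC0) == 0x80);
    assert((b2 & 0xC0) == 0x80);
    assert((b3 & 0xC0) == 0x80);
    return (static_cast<char32_t>(b0 & 0x07) << 18)
         | (static_cast<char32_t>(b1 & 0x3F) << 12)
         | (static_cast<char32_t>(b2 & 0x3F) << 6)
         |  static_cast<char32_t>(b3 & 0x3F);
}

// Hypothetical helper: encode one code point as UTF-16 code units.
// Code points above U+FFFF become a high surrogate followed by a low surrogate.
inline std::u16string encodeUtf16(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    assert(!(cp >= 0xD800 && cp <= 0xDFFF)); // surrogates are not code points to encode
    std::u16string out;
    if (cp < 0x10000)
    {
        out.push_back(static_cast<char16_t>(cp));
    }
    else
    {
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

With these helpers, decodeUtf8FourBytes(0xF0, 0x9D, 0x85, 0xA0) gives U+1D160, and encodeUtf16 turns that into the two code units D834 DD60. Six notes are therefore 24 bytes of UTF-8 (26 with CR LF, matching the attachment size) but 12 UTF-16 code units, so every odd index in that text falls inside a pair.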

Comment 6 QA Administrators 2022-08-24 03:46:20 UTC Comment hidden (obsolete)

[Automated Action] NeedInfo-To-Unconfirmed

Comment 7 Buovjaga 2023-01-27 11:15:00 UTC Comment hidden (off-topic)

(In reply to Eric Lasota from comment #5)
> Er sorry the repeated byte sequence is "F0 9D 85 A0"

I opened the CSV file in Okteta hex editor and confirmed this. It turns out that in Kate text editor as well as LibreOffice Calc, the six note characters are displayed on top of each other! So if you hit End key and start hitting backspace, you will remove the notes one by one. In nano editor and Notepad++ the characters are displayed in sequence.

Bibisected with Linux 5.2 repo to
https://git.libreoffice.org/core/commit/975c833943bab627eb461457ab1df35744b291cd
upgrade harfbuzz version from 0.9.40 to 1.2.6

Not adding regression keyword as this is coming from a dependency.

Comment 8 V Stuart Foote 2023-01-27 15:23:43 UTC

The CSV import is not the issue here. 

Rather, mouse pointer selection in a Calc cell or in the Input bar is able to split the surrogate pair of an SMP glyph.

Comment 9 V Stuart Foote 2023-01-27 15:32:47 UTC

Paste this string into a Calc cell. Unlike the musical notes, these glyphs are available in DejaVu Sans.

U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1

Position the text cursor at the end of the pasted text.

Enter <Alt>+X and the 🁠 and the 🂡 glyphs will be toggled.

Use the mouse cursor to point-select on any glyph, in the sheet or in the InputBar.

Type any keyboard character, "a" as in the OP.  The mouse click selection has split the code point's surrogate pair (two UTF-16 code units), and entering the character "breaks" the glyph.

As noted, cursor (<L,R>) movement correctly recognizes the SMP glyphs; only the mouse pointer selection is wrong.
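
To illustrate why typing at such a position "breaks" the glyph, the following C++ sketch treats the cell text as a sequence of UTF-16 code units and inserts "a" at a given index. It is a standalone illustration, not LibreOffice code; isWellFormedUtf16 and typeAt are hypothetical names, and it assumes that the caret position is a UTF-16 index into the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: true if every surrogate in s belongs to a high+low pair.
inline bool isWellFormedUtf16(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (isHighSurrogate(s[i]))
        {
            if (i + 1 >= s.size() || !isLowSurrogate(s[i + 1]))
                return false; // lone high surrogate
            ++i;              // skip the low half of a valid pair
        }
        else if (isLowSurrogate(s[i]))
        {
            return false;     // lone low surrogate
        }
    }
    return true;
}

// Hypothetical helper: insert one code unit at a UTF-16 index,
// as typing a character at the caret would (assumed caret model).
inline std::u16string typeAt(std::u16string s, std::size_t index, char16_t ch)
{
    assert(index <= s.size());
    s.insert(index, 1, ch);
    return s;
}
```

For the two-glyph text U+1F060 U+1F0A1 (code units D83C DC60 D83C DCA1), typing at index 2 keeps the text well formed, while typing at index 1 leaves a lone high surrogate before the "a" and a lone low surrogate after it.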

Comment 10 V Stuart Foote 2023-01-27 16:43:14 UTC

@Julien, when you fixed a similar issue for sm (Math) in bug 102625 [1], was the Calc instance of the SMP glyphs missed here because the SMP glyphs are treated as an i18n COMPLEX script and no rtl::isSurrogate() test gets performed?

=-ref-=
[1] https://gerrit.libreoffice.org/c/core/+/93544

Comment 11 V Stuart Foote 2023-01-27 16:51:29 UTC

(In reply to V Stuart Foote from comment #10)
> @Julien, when you fixed similar for sm for bug 102625 [1] is the Calc
> instance of the SMP glyphs missed here bcz the SMP glyphs are treated as an
> i18n COMPLEX script and no rtl::isSurrogate() test gets performed?
> 
> =-ref-=
> [1] https://gerrit.libreoffice.org/c/core/+/93544

The original patch in master might be a better reference, where Stephan B. had comments about the use of the isSurrogate() test.

https://gerrit.libreoffice.org/c/core/+/93684

Comment 12 Julien Nabet 2023-01-27 17:40:29 UTC

(In reply to V Stuart Foote from comment #10)
> @Julien, when you fixed similar for sm for bug 102625 [1] is the Calc
> instance of the SMP glyphs missed here bcz the SMP glyphs are treated as an
> i18n COMPLEX script and no rtl::isSurrogate() test gets performed?
> 
> =-ref-=
> [1] https://gerrit.libreoffice.org/c/core/+/93544

On pc Debian x86-64 with master sources updated today, with gtk3 rendering, here is what I tested:
- enter cell A1, then Ctrl-shift U 1F060 + Ctrl-shift U 1F060 + Ctrl-shift U 1F060 + Ctrl-shift U 1F060 to have 4 glyphs.
- click on the end of the cell
- Alt X
=> the last glyph disappears and is replaced with "U+1f0a1"


I added some traces on the if else block:
diff --git a/editeng/source/editeng/impedit2.cxx b/editeng/source/editeng/impedit2.cxx
index 4e87e36af5d3..b00a6b8b8f46 100644
--- a/editeng/source/editeng/impedit2.cxx
+++ b/editeng/source/editeng/impedit2.cxx
@@ -4044,6 +4044,7 @@ sal_Int32 ImpEditEngine::GetChar(
                     sal_uInt16 nScriptType = GetI18NScriptType( aPaM );
                     if ( nScriptType == i18n::ScriptType::COMPLEX )
                     {
+                        fprintf(stderr, "COMPLEX\n");
                         uno::Reference < i18n::XBreakIterator > _xBI( ImplGetBreakIterator() );
                         sal_Int32 nCount = 1;
                         lang::Locale aLocale = GetLocale( aPaM );
@@ -4058,6 +4059,7 @@ sal_Int32 ImpEditEngine::GetChar(
                     }
                     else
                     {
+                        fprintf(stderr, "NOT COMPLEX\n");
                         OUString aStr(pParaPortion->GetNode()->GetString());
                         // tdf#102625: don't select middle of a pair of surrogates with mouse cursor
                         if (rtl::isSurrogate(aStr[nChar]))


I got only NOT COMPLEX appearing on console logs.
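
The tdf#102625 guard in the non-COMPLEX branch tests whether the code unit at the hit position is a surrogate. A minimal standalone sketch of that kind of guard is shown here. It is not the editeng code: snapToCodePointStart is a hypothetical name, and moving the index backward rather than forward is an assumption, since the excerpt above does not show what the real branch does next.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical guard: if index points at the low half of a surrogate pair,
// move it back to the start of that pair; otherwise return it unchanged.
inline std::size_t snapToCodePointStart(const std::u16string& s, std::size_t index)
{
    assert(index <= s.size()); // index == size() means "caret at end of text"
    if (index > 0 && index < s.size()
        && isLowSurrogate(s[index]) && isHighSurrogate(s[index - 1]))
    {
        return index - 1;
    }
    return index;
}
```

Applied to a mouse hit-test result before the caret is placed, a guard like this would keep the caret on a code point boundary, which is what the arrow keys already do according to the earlier comments.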

Comment 13 V Stuart Foote 2023-01-27 17:53:39 UTC

(In reply to Julien Nabet from comment #12)

> 
> I added some traces on the if else block:
> diff --git a/editeng/source/editeng/impedit2.cxx
> b/editeng/source/editeng/impedit2.cxx
> index 4e87e36af5d3..b00a6b8b8f46 100644
> --- a/editeng/source/editeng/impedit2.cxx
> +++ b/editeng/source/editeng/impedit2.cxx
> @@ -4044,6 +4044,7 @@ sal_Int32 ImpEditEngine::GetChar(
>                      sal_uInt16 nScriptType = GetI18NScriptType( aPaM );
>                      if ( nScriptType == i18n::ScriptType::COMPLEX )
>                      {
> +                        fprintf(stderr, "COMPLEX\n");
>                          uno::Reference < i18n::XBreakIterator > _xBI(
> ImplGetBreakIterator() );
>                          sal_Int32 nCount = 1;
>                          lang::Locale aLocale = GetLocale( aPaM );
> @@ -4058,6 +4059,7 @@ sal_Int32 ImpEditEngine::GetChar(
>                      }
>                      else
>                      {
> +                        fprintf(stderr, "NOT COMPLEX\n");
>                          OUString aStr(pParaPortion->GetNode()->GetString());
>                          // tdf#102625: don't select middle of a pair of
> surrogates with mouse cursor
>                          if (rtl::isSurrogate(aStr[nChar]))
> 
> 
> I got only NOT COMPLEX appearing on console logs.

OK, thanks for the quick check, it was just a thought. I wasn't even sure if Calc's edit engine calls use that GetString(). It was just that the same split on mouse cursor selection of a multi-byte glyph had been occurring in the sm Formula editor's input box.

Comment 14 V Stuart Foote 2023-01-27 18:35:52 UTC Comment hidden (off-topic)

(In reply to Julien Nabet from comment #12)

> - click on the end of the cell
> - Alt X
> => the last glyph disappears and is replaced with "U+1f0a1"

Yes I think that is correct. Going from glyph to Unicode value converts just the last character. As compared to going from Unicode notation (i.e. U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1) which is "hungry" and converts the full run back to a white space.  It needed to do that to handle combining diacritics.

Comment 15 QA Administrators 2025-01-27 03:10:54 UTC Comment hidden (obsolete)

Dear Eric Lasota,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not
appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug

Comment 16 Eric Lasota 2025-01-27 16:48:28 UTC

Re-tested with latest version:

Version: 24.8.4.2 (X86_64) / LibreOffice Community
Build ID: bb3cfa12c7b1bf994ecc5649a80400d06cd71002
CPU threads: 12; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded


Issue still occurs.