Bug 150553 - Calc InputBar Unicode SMP characters split at surrogate pair with mouse pointer selection
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 5.2.0.4 release (earliest affected)
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering
Reported: 2022-08-22 21:20 UTC by Eric Lasota
Modified: 2023-01-27 18:35 UTC

See Also:
Crash report or crash signature:


Attachments
Test document (26 bytes, text/csv)
2022-08-22 21:21 UTC, Eric Lasota

Description Eric Lasota 2022-08-22 21:20:41 UTC
Description:
Double-clicking a text cell to edit it may put the caret in an invalid "mid-character" position, so that the next keystroke corrupts the text.  See reproduction steps.

Steps to Reproduce:
1. Install LibreOffice 7.4.0.3 on Windows.
2. Open the attached CSV.  It contains 1 cell with 6 consecutive musical note characters.
3. In the import dialog, ensure the character set is set to UTF-8
4. Notice that the cell preview displays the content incorrectly at this step.
5. Click OK
6. Double-click in cell A1 anywhere between two of the note characters.
7. Press the "a" key

Actual Results:
The character preceding the editing caret is replaced with 2 damaged characters and an "a" character between them.  Additionally, the cell edit box above the spreadsheet no longer matches the cell contents.

Expected Results:
The "a" character should be inserted between the note characters where the editing caret indicates.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Version: 7.4.0.3 (x64) / LibreOffice Community
Build ID: f85e47c08ddd19c015c0114a68350214f7066f5a
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 1 Eric Lasota 2022-08-22 21:21:18 UTC
Created attachment 181960 [details]
Test document
Comment 2 Eric Lasota 2022-08-22 21:22:29 UTC
Forgot to mention: This does not happen if I press the left or right arrow keys to move the editing caret before step 7.
Comment 3 Rafael Lima 2022-08-23 11:18:11 UTC Comment hidden (off-topic)
Comment 4 Eric Lasota 2022-08-23 16:37:30 UTC Comment hidden (off-topic)
Comment 5 Eric Lasota 2022-08-23 16:39:12 UTC Comment hidden (off-topic)
Comment 6 QA Administrators 2022-08-24 03:46:20 UTC Comment hidden (obsolete)
Comment 7 Buovjaga 2023-01-27 11:15:00 UTC Comment hidden (off-topic)
Comment 8 V Stuart Foote 2023-01-27 15:23:43 UTC
The CSV import is not the issue here. 

Rather, mouse pointer selection in a Calc cell or on the Input bar is able to split the surrogate pair of an SMP glyph.
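
For illustration, a minimal sketch of why an SMP character occupies two positions in a UTF-16 string. It is not LibreOffice code; the helper name toSurrogatePair is hypothetical and only shows the standard UTF-16 arithmetic for code points above U+FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not a LibreOffice API): split a supplementary code
// point (U+10000..U+10FFFF) into its UTF-16 high and low surrogate units.
inline std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint)
{
    // Only code points outside the Basic Multilingual Plane need a pair.
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);

    const std::uint32_t offset = codePoint - 0x10000;  // 20-bit value
    // Top 10 bits go into the high (leading) surrogate.
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));
    // Bottom 10 bits go into the low (trailing) surrogate.
    const char16_t low = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
    return { high, low };
}
```

Under this encoding U+1F060 becomes the code units 0xD83C 0xDC60, so a caret index that counts code units can land between the two halves.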
Comment 9 V Stuart Foote 2023-01-27 15:32:47 UTC
Paste this string into a Calc cell; unlike the musical notes, these glyphs are available in DejaVu Sans:

U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1U+1F060U+1F0A1

Position the text cursor at the end of the pasted text.

Enter <Alt>+X and the 🁠 and the 🂡 glyphs will be toggled.

Use mouse cursor to point select on any glyph, in the sheet or in the InputBar.

Type any keyboard character, e.g. "a" as in the OP.  The mouse click selection has placed the cursor between the two halves of the glyph's surrogate pair, so entering the character "breaks" the glyph.

As noted, cursor (<L,R>) movement correctly recognizes the SMP glyphs; only the mouse pointer selection is wrong.
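
The breakage described in the steps above can be reproduced outside LibreOffice. This is a minimal sketch using std::u16string as a stand-in for the cell text; insertAt and countLoneSurrogates are hypothetical names, and nothing here is taken from the Calc or editeng sources.

```cpp
#include <cstddef>
#include <string>

// True for a high (leading) surrogate code unit, 0xD800..0xDBFF.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// True for a low (trailing) surrogate code unit, 0xDC00..0xDFFF.
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Insert one code unit at a raw code-unit index, with no boundary check.
// This models typing "a" at whatever index the mouse click produced.
inline std::u16string insertAt(std::u16string text, std::size_t index, char16_t ch)
{
    text.insert(index, 1, ch);
    return text;
}

// Count surrogate code units that are not part of a well-formed high+low pair.
inline std::size_t countLoneSurrogates(const std::u16string& text)
{
    std::size_t lone = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (isHighSurrogate(text[i]))
        {
            if (i + 1 < text.size() && isLowSurrogate(text[i + 1]))
                ++i;     // well-formed pair: skip its low half
            else
                ++lone;  // high surrogate with no partner
        }
        else if (isLowSurrogate(text[i]))
        {
            ++lone;      // low surrogate with no preceding high
        }
    }
    return lone;
}
```

With the text U+1F060 U+1F0A1 (four code units), inserting at index 2 leaves no lone surrogates, while inserting at index 1 leaves two, which corresponds to the two damaged characters with an "a" between them.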
Comment 10 V Stuart Foote 2023-01-27 16:43:14 UTC
@Julien, when you fixed a similar issue in sm for bug 102625 [1], was the Calc instance of the SMP glyphs missed here because the SMP glyphs are treated as an i18n COMPLEX script and no rtl::isSurrogate() test gets performed?

=-ref-=
[1] https://gerrit.libreoffice.org/c/core/+/93544
Comment 11 V Stuart Foote 2023-01-27 16:51:29 UTC
(In reply to V Stuart Foote from comment #10)
> @Julien, when you fixed similar for sm for bug 102625 [1] is the Calc
> instance of the SMP glyphs missed here bcz the SMP glyphs are treated as an
> i18n COMPLEX script and no rtl::isSurrogate() test gets performed?
> 
> =-ref-=
> [1] https://gerrit.libreoffice.org/c/core/+/93544

The original patch in master might be a better reference; there Stephan B. had comments about the use of the isSurrogate() test.

https://gerrit.libreoffice.org/c/core/+/93684
Comment 12 Julien Nabet 2023-01-27 17:40:29 UTC
(In reply to V Stuart Foote from comment #10)
> @Julien, when you fixed similar for sm for bug 102625 [1] is the Calc
> instance of the SMP glyphs missed here bcz the SMP glyphs are treated as an
> i18n COMPLEX script and no rtl::isSurrogate() test gets performed?
> 
> =-ref-=
> [1] https://gerrit.libreoffice.org/c/core/+/93544

On pc Debian x86-64 with master sources updated today and gtk3 rendering, here is what I tested:
- enter cell A1, then Ctrl-shift U 1F060 + Ctrl-shift U 1F060 + Ctrl-shift U 1F060 + Ctrl-shift U 1F060 to have 4 glyphs.
- click on the end of the cell
- Alt X
=> the last glyph disappears and is replaced with "U+1f0a1"


I added some traces on the if else block:
diff --git a/editeng/source/editeng/impedit2.cxx b/editeng/source/editeng/impedit2.cxx
index 4e87e36af5d3..b00a6b8b8f46 100644
--- a/editeng/source/editeng/impedit2.cxx
+++ b/editeng/source/editeng/impedit2.cxx
@@ -4044,6 +4044,7 @@ sal_Int32 ImpEditEngine::GetChar(
                     sal_uInt16 nScriptType = GetI18NScriptType( aPaM );
                     if ( nScriptType == i18n::ScriptType::COMPLEX )
                     {
+                        fprintf(stderr, "COMPLEX\n");
                         uno::Reference < i18n::XBreakIterator > _xBI( ImplGetBreakIterator() );
                         sal_Int32 nCount = 1;
                         lang::Locale aLocale = GetLocale( aPaM );
@@ -4058,6 +4059,7 @@ sal_Int32 ImpEditEngine::GetChar(
                     }
                     else
                     {
+                        fprintf(stderr, "NOT COMPLEX\n");
                         OUString aStr(pParaPortion->GetNode()->GetString());
                         // tdf#102625: don't select middle of a pair of surrogates with mouse cursor
                         if (rtl::isSurrogate(aStr[nChar]))


I got only NOT COMPLEX appearing on console logs.
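
For reference, the kind of guard that the tdf#102625 comment in the traced branch describes can be sketched as follows. This is an assumption-laden illustration, not the code in impedit2.cxx: snapToCodePointBoundary is a hypothetical name, std::u16string stands in for OUString, and moving forward (rather than backward) out of the pair is an arbitrary choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: if `index` points between the two halves of a
// surrogate pair, move it forward to the boundary after the pair.
// Any other index is returned unchanged.
inline std::size_t snapToCodePointBoundary(const std::u16string& text, std::size_t index)
{
    if (index == 0 || index >= text.size())
        return index;  // start or end of text is always a valid boundary

    // High (leading) surrogates are 0xD800..0xDBFF.
    const bool prevIsHigh = text[index - 1] >= 0xD800 && text[index - 1] <= 0xDBFF;
    // Low (trailing) surrogates are 0xDC00..0xDFFF.
    const bool currIsLow = text[index] >= 0xDC00 && text[index] <= 0xDFFF;

    // A high surrogate directly before the index and a low one at the index
    // means the index sits in the middle of a pair.
    return (prevIsHigh && currIsLow) ? index + 1 : index;
}
```

A hit-test that returns a code-unit index could pass its result through such a function before placing the caret; whether Calc's mouse selection goes through the traced branch at the moment the caret is placed is not established by the trace above.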
Comment 13 V Stuart Foote 2023-01-27 17:53:39 UTC
(In reply to Julien Nabet from comment #12)

> 
> I added some traces on the if else block:
> diff --git a/editeng/source/editeng/impedit2.cxx
> b/editeng/source/editeng/impedit2.cxx
> index 4e87e36af5d3..b00a6b8b8f46 100644
> --- a/editeng/source/editeng/impedit2.cxx
> +++ b/editeng/source/editeng/impedit2.cxx
> @@ -4044,6 +4044,7 @@ sal_Int32 ImpEditEngine::GetChar(
>                      sal_uInt16 nScriptType = GetI18NScriptType( aPaM );
>                      if ( nScriptType == i18n::ScriptType::COMPLEX )
>                      {
> +                        fprintf(stderr, "COMPLEX\n");
>                          uno::Reference < i18n::XBreakIterator > _xBI(
> ImplGetBreakIterator() );
>                          sal_Int32 nCount = 1;
>                          lang::Locale aLocale = GetLocale( aPaM );
> @@ -4058,6 +4059,7 @@ sal_Int32 ImpEditEngine::GetChar(
>                      }
>                      else
>                      {
> +                        fprintf(stderr, "NOT COMPLEX\n");
>                          OUString aStr(pParaPortion->GetNode()->GetString());
>                          // tdf#102625: don't select middle of a pair of
> surrogates with mouse cursor
>                          if (rtl::isSurrogate(aStr[nChar]))
> 
> 
> I got only NOT COMPLEX appearing on console logs.

OK, thanks for the quick check; it was just a thought. I wasn't even sure whether Calc's edit engine calls use that GetString(). It is just that the same split on mouse cursor selection of a surrogate-pair glyph had been occurring in the sm Formula editor's input box.
Comment 14 V Stuart Foote 2023-01-27 18:35:52 UTC Comment hidden (off-topic)