Description:
Some special characters from Unicode Plane 1 that LibreOffice classifies as CTL text are not handled well, and can sometimes even cause a crash.

Steps to Reproduce:
1. Open the attached ODT file, OR:
   (a) Open a new Writer file.
   (b) Copy and paste the character "𐍕", "𐊙", or "𐏁" into Writer. It doesn't matter whether your system can display these characters or not.
   (c) Set the font to "Linux Biolinum O" or "Linux Libertine O" to make the effect easier to see.
2. Use the arrow keys to move the cursor.
3. When the cursor moves from the left side of the character to its right side, there appears to be an extra space that was never typed. (see the attached PNG file)
4. With the cursor at the red line, press the "Enter" key or type anything; the character then splits into two question marks. (see the attached PNG file)
5. Drag the border to make the column narrower; the character splits as well.
6. The splitting sometimes causes a crash, especially in Version 6.0.0.0.alpha1+.

Actual Results:
1. The characters split.
2. Writer crashes, especially in Version 6.0.0.0.alpha1+.

Expected Results:
The character should not split, let alone cause a crash.

Reproducible: Always
User Profile Reset: No

Additional Info:
Reproducible in the following versions:
* 3.3.0 (linux)
* 4.0.0.1 (linux)
* 5.4.2.2 (win/linux)
* 6.0.0.0.alpha1+ (linux)

Please note that these characters **must be classified by LibO as CTL text** (see the status bar) to trigger the bug. In version 3.3.0, the classification of "𐍕", "𐊙", and "𐏁" depends on your locale setting: if your locale is Western (like en_US), they are classified as Western text. In version 5.4.2.2, they are classified as CTL text regardless of your locale setting.

Calc and Impress are also affected. Just copy and paste "𐍕", "𐊙", or "𐏁" into a text field, and then press Backspace.
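To double-check that the sample characters really lie outside the Basic Multilingual Plane, a small sketch like the following can be used. It is plain Python, not LibreOffice code, and `unicode_plane` is a hypothetical helper name introduced only for this illustration.

```python
def unicode_plane(ch: str) -> int:
    """Return the Unicode plane of a single character (0 = BMP, 1 = Plane 1, ...)."""
    # Each plane spans 0x10000 code points, so the plane is the code point >> 16.
    return ord(ch) >> 16


# The three sample characters from the report.
samples = ["𐍕", "𐊙", "𐏁"]

for ch in samples:
    # Print the code point in U+XXXX notation together with its plane.
    print(f"U+{ord(ch):04X} is in plane {unicode_plane(ch)}")
```

Any character for which this returns a value above 0 cannot fit in a single 16-bit code unit, which is relevant to how a UTF-16 based editor stores it.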
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.75 Chrome/62.0.3202.75 Safari/537.36
Created attachment 137596 [details] demo_file
Created attachment 137597 [details] demo_screenshot
Created attachment 137729 [details] how I see it

I've installed Linux Biolinum O, but the symbols look different from those in your screenshot.
I can reproduce the character issue. On Windows I got the same result as Hiunn-hué, and on Ubuntu I got the same result as Xisco Faulí; however, no crash occurred.

Tested on:
• Operating system: Windows 8.1 Pro 64-bit.
• LibreOffice: Version: 5.4.3.2 (x64) Build ID: 92a7159f7e4af62137622921e809f8546db437e5 CPU threads: 4; OS: Windows 6.29; UI render: default; Locale: en-US (en_US); Calc: group

And also on:
• Operating system: Ubuntu 16.04.3 64-bit.
• LibreOffice: Version: 5.4.3.2 Build ID: 92a7159f7e4af62137622921e809f8546db437e5 CPU threads: 8; OS: Linux 4.4; UI render: default; VCL: gtk2; Locale: en-US (ja_JP.UTF-8); Calc: group
Thank you for helping with testing and confirming.

***

I just found that the text language must be set to Thai to trigger this bug. Other CTL languages are safe. ( Format > Character > CTL Font )

---

@Xisco Faulí
It's OK that your system cannot show those symbols. The important part is Steps 2 ~ 5. You can still try Steps 2 ~ 5 with the tofu (squares with an X inside). As described in Step 3, there's an extra space after the characters. Using the "Linux Biolinum O" font just makes it easier to see; it's not necessary. Sorry for the misunderstanding.

---

@Mohamed
I can't make it crash every time, either. Maybe you can try the 6.0.0.0+ daily build?
Hiunn-hué: you could try getting a backtrace of the crash: https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_backtrace For this, use a daily build from https://dev-builds.libreoffice.org/daily/master/ that has -dbg at the end of its name.
Created attachment 137766 [details] gdbtrace.log

== Message in Terminal ==

After ./soffice --backtrace:
> ** (soffice:8179): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-GpItp358Lf: Connection refused
> warn:vcl:8179:8179:vcl/unx/generic/fontmanager/fontmanager.cxx:702: Could not OpenTTFont "/usr/share/fonts/woff/charis/CharisSIL-B.woff"
> warn:vcl:8179:8179:vcl/unx/generic/fontmanager/fontmanager.cxx:702: Could not OpenTTFont "/usr/share/fonts/woff/charis/CharisSIL-BI.woff"
> warn:vcl:8179:8179:vcl/unx/generic/fontmanager/fontmanager.cxx:702: Could not OpenTTFont "/usr/share/fonts/woff/charis/CharisSIL-I.woff"
> warn:vcl:8179:8179:vcl/unx/generic/fontmanager/fontmanager.cxx:702: Could not OpenTTFont "/usr/share/fonts/woff/charis/CharisSIL-R.woff"
> warn:i18nlangtag:8179:8179:i18nlangtag/source/languagetag/languagetag.cxx:1618: LanguageTag::getRegionFromLangtag: pRegionT==NULL for 'en-MED'
> warn:i18nlangtag:8179:8179:i18nlangtag/source/languagetag/languagetag.cxx:1618: LanguageTag::getRegionFromLangtag: pRegionT==NULL for 'en-MED'
> warn:i18nlangtag:8179:8179:i18nlangtag/source/languagetag/languagetag.cxx:1618: LanguageTag::getRegionFromLangtag: pRegionT==NULL for 'de-med'
> warn:i18nlangtag:8179:8179:i18nlangtag/source/languagetag/languagetag.cxx:1618: LanguageTag::getRegionFromLangtag: pRegionT==NULL for 'de-med'
> warn:i18nlangtag:8179:8179:i18nlangtag/source/languagetag/languagetag.cxx:1386: LanguageTagImpl::convertLocaleToLang: with bAllowOnTheFlyID invalid 'de-med'
> warn:vcl:8179:8179:vcl/unx/generic/fontmanager/fontconfig.cxx:852: In glyph fallback throwing away the language property of en because the detected script for '0x9f3' is Bengali and that language doesn't make sense. Autodetecting instead.
After doing steps 2 ~ 5:
> warn:ucb.ucp.gio:8179:8179:ucb/source/ucp/gio/gio_content.cxx:393: ignoring GError "Operation not supported" for <>
> warn:xmloff:8179:8179:xmloff/source/core/xmlerror.cxx:169: An error or a warning has occurred during XML import/export!
> Error-Id: 0x4002000d
> Flags: 4 SEVERE
> Class: 2 FORMAT
> Number: d
> Parameters:
> 0: office:blue
> Exception-Message: Root element unknown
> Position:
> Public Identifier:
> System Identifier: DocumentList.xml
> Row, Column: 2,1
>
> soffice.bin: /tinderbox/buildslave/source/libo-master/include/rtl/ustring.hxx:669: sal_Unicode rtl::OUString::operator[](sal_Int32) const: Assertion `index >= 0 && static_cast<sal_uInt32>(index) < static_cast<sal_uInt32>(getLength())' failed.
Looks like broken handling of UTF-16 surrogate pairs when the language is set to Thai. I suspect something is broken in the Thai break iterator.
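For readers unfamiliar with surrogate pairs, here is a minimal Python sketch of the general problem. It is only an illustration under stated assumptions, not the LibreOffice break iterator and not the eventual fix; the helper names (`utf16_units`, `snap_to_code_point_boundary`) are made up for this example. It shows that a Plane 1 character occupies two UTF-16 code units, and that a break position landing between them must be moved so the pair stays together.

```python
import struct
from typing import List


def utf16_units(text: str) -> List[int]:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def snap_to_code_point_boundary(units: List[int], index: int) -> int:
    """If index falls between a high and a low surrogate, move it back by one.

    Hypothetical helper: it only illustrates the kind of check a break
    iterator needs so that it never reports a break inside a surrogate pair.
    """
    if (0 < index < len(units)
            and is_high_surrogate(units[index - 1])
            and is_low_surrogate(units[index])):
        return index - 1
    return index


# 'a', then one Plane 1 character (two code units), then 'b'.
units = utf16_units("a𐍕b")
print(len(units))                             # number of UTF-16 code units
print(snap_to_code_point_boundary(units, 2))  # index 2 is inside the pair
```

A break reported at index 2 here would leave a lone high surrogate on one side and a lone low surrogate on the other, which is consistent with the character turning into two question marks in Step 4.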
Khaled Hosny committed a patch related to this issue. It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5dc52ee00102cbf4262805d6e8f338bf0a88f470

tdf#113694 Fix BreakIterator_CTL surrogate pairs

It will be available in 6.1.0.

The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Hi Khaled Hosny,
Thanks for fixing this.
Do you think we should backport it to 6.0?
(In reply to Xisco Faulí from comment #10)
> Hi Khaled Hosny,
> Thanks for fixing this.
> Do you think we should backport it to 6.0?

I tried to backport it through Gerrit, but there is a merge conflict. Unfortunately, I can’t check out the 6.0 branch locally to try to fix it.