Created attachment 69044
The top portion is a screenshot of the actual file name as displayed in the document. The bottom portion is how it is displayed in the title bar.

I am transcribing a story into the Shavian alphabet, a phonemic English spelling reform proposal which is encoded in Unicode beyond the Basic Multilingual Plane (BMP). I have named the file using Shavian characters, and while I am perfectly happy with the series of undefined characters that Windows Explorer displays, LibreOffice Writer shows extended Cyrillic characters in the title bar.

To reproduce, just save a file and give it a file name containing characters beyond the BMP.

Browser: Latest version of Google Chrome
@ejectmail@me.com: Thank you for your report – unfortunately, important information is missing. Maybe the hints at <http://wiki.documentfoundation.org/BugReport> will help you find out what information would be useful to reproduce your problem. If you believe the problem is really complicated, please ask for help on a user mailing list.
Please:
- attach a sample.odt containing (only) such a file name as text contents;
- contribute a document-related, step-by-step instruction containing every key press and every mouse click needed to reproduce your problem (similar to the example in Bug 43431); if possible, contribute an instruction for creating a sample document from scratch;
- add information
-- concerning your OS (version, distribution, language)
-- concerning your LibO localization (UI language, locale setting)
-- LibO settings that might be related to your problem (video hardware acceleration ...)
-- how you launch LibO and how you opened the sample document
-- everything else crossing your mind after you read the linked texts
@ Urmas: Thank you for confirming this bug report! However, please do not confirm a bug simply by setting the Status field to NEW (and changing the Summary). Please always add a short comment saying *that* and *how* you reproduced the issue (on which platform? in which LibO version(s)? by which steps?). Without such a comment, the report looks as if it was “confirmed” only by the original reporter, never by an independent reviewer (some users just set the Status of their own bug reports to NEW). And that is something none of us likes, right? So please add a short comment when you confirm a bug. Thank you very much!
Confirmed in 3.6.2 in Windows XP.
(In reply to comment #3)
> Confirmed in 3.6.2 in Windows XP.

Thank you! Also REPRODUCIBLE on Mac OS X 10.6.8 (Intel) with LibreOffice 3.6.3.1.

E.g., the file name "Test for bug 56366, special glyphs (𐄷𐄸𐄺𐄸𐄻𐄿).odt" is displayed as "Test for bug 56366, special glyphs (ķĸĺĸĻĿ).odt"; i.e., each character is mapped as follows (original code point, its UTF-16 surrogate pair, displayed code point):

10137 (D800+DD37) -> 0137
10138 (D800+DD38) -> 0138
1013A (D800+DD3A) -> 013A
10138 (D800+DD38) -> 0138
1013B (D800+DD3B) -> 013B
1013F (D800+DD3F) -> 013F

This seems very clear. So (if I have not forgotten too much mathematics) LibreOffice just applies a modulo 0x10000 operation to every character, so that all characters beyond the BMP are mapped into the BMP.
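To make the modulo observation concrete: converting a 32-bit code point to a single 16-bit code unit keeps only the low 16 bits, which is exactly the code point modulo 0x10000. A minimal sketch in standard C++, assuming `char32_t` and `char16_t` as rough stand-ins for LibreOffice's `sal_uInt32` and `sal_Unicode`; the function name `clip_to_16_bits` is hypothetical:

```cpp
#include <cassert>

// Hypothetical helper mimicking a cast like sal_Unicode(nUTF32):
// conversion to an unsigned 16-bit type keeps only the low 16 bits,
// i.e. the result equals cp % 0x10000.
char16_t clip_to_16_bits(char32_t cp) {
    assert(cp <= 0x10FFFF);  // only valid Unicode code points expected
    return static_cast<char16_t>(cp);
}
```

For example, U+10137 (the first character of the test file name) becomes U+0137 (ķ), and a Shavian letter such as U+10450 becomes U+0450, which lies in the Cyrillic block; that is consistent with the Cyrillic characters the original reporter saw.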
Already REPRODUCIBLE in LibreOffice 3.3.0 (and 3.4.0 and 3.5.0) with exactly the same result. -> Adapted “Version” field. Also reproducible in Impress and Calc with exactly the same results, so -> a general UI bug (adapted “Component” field).
@ Jan Holesovsky, Ivan Timofeev: Hi Jan and Ivan, you have solved quite a few UI issues. Could you please take a look at this small but annoying issue and find out whether it is possible to improve the current behaviour? Or could some other developer handle this? I hope it can be fixed, because at least in the main text of a Writer document LibreOffice handles the same characters I used above (comment #4: 𐄷𐄸𐄺𐄸𐄻𐄿) correctly, provided you have a font that contains glyphs for these characters. So LibreOffice, at least Writer, actually *can* handle Unicode characters beyond the BMP ... This is also confirmed by comment #0. Thank you very much!
I found (at least) one place where characters get clipped: TitleHelper::impl_convertURL2Title
http://opengrok.libreoffice.org/xref/core/framework/source/fwe/helper/titlehelper.cxx#impl_convertURL2Title
which uses INetURLObject, and that clips them to 16 bits; in our case it is line 3727:
aResult.append(sal_Unicode(nUTF32));
http://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx#3727
So, if I change it to use the proper conversion:
aResult.append(OUString(&nUTF32, 1));
the title is displayed correctly.
Stephan, is it OK to teach INetURLObject::decode full UTF-16 support?
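Where does a value like nUTF32 come from? Assuming the document's location reaches the title code as a file URL whose non-ASCII characters are percent-encoded UTF-8, decoding produces full UTF-32 code points before they are appended. The following is a hypothetical, heavily simplified stand-in for that step (not INetURLObject's actual code; `decode_percent_utf8` and `hex_value` are illustrative names, and well-formed UTF-8 is assumed):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical, simplified URL decoding (NOT INetURLObject's code):
// undo %XX escapes, then decode the bytes as UTF-8 into UTF-32 code
// points. Assumes well-formed UTF-8; no error handling.
std::vector<char32_t> decode_percent_utf8(const std::string& url) {
    std::vector<unsigned char> bytes;
    for (std::size_t i = 0; i < url.size(); ++i) {
        if (url[i] == '%' && i + 2 < url.size()) {
            int hi = hex_value(url[i + 1]);
            int lo = hex_value(url[i + 2]);
            if (hi >= 0 && lo >= 0) {
                bytes.push_back(static_cast<unsigned char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        bytes.push_back(static_cast<unsigned char>(url[i]));
    }

    std::vector<char32_t> code_points;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char lead = bytes[i];
        // Sequence length and payload bits are determined by the lead byte.
        int len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t cp = lead & (len == 1 ? 0x7F : len == 2 ? 0x1F : len == 3 ? 0x0F : 0x07);
        for (int k = 1; k < len && i + k < bytes.size(); ++k) {
            cp = (cp << 6) | (bytes[i + k] & 0x3F);  // 6 bits per continuation byte
        }
        code_points.push_back(cp);  // in this sketch, cp plays the role of nUTF32
        i += len;
    }
    return code_points;
}
```

With this sketch, "%F0%90%84%B7" decodes to the single code point U+10137; appending that with a 16-bit cast is what loses the upper bits.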
(In reply to comment #7)
> I found (at least one) place where characters get clipped:
> TitleHelper::impl_convertURL2Title
> http://opengrok.libreoffice.org/xref/core/framework/source/fwe/helper/titlehelper.cxx#impl_convertURL2Title
> which uses INetURLObject, and it clips to 16bit, in our case it is line 3727:
> aResult.append(sal_Unicode(nUTF32));
> http://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx#3727
>
> So, if I change it to use the proper conversion:
> aResult.append(OUString(&nUTF32, 1));
> the title is alright.
>
> Stephan,
> it that OK to teach INetURLObject::decode the full UTF-16 support?

That looks like a bug indeed. There is OUStringBuffer::appendUtf32, so the best fix appears to be to replace both occurrences of
aResult.append(sal_Unicode(nUTF32));
in INetURLObject::decode with
aResult.appendUtf32(nUTF32);
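What appendUtf32 has to do differently from the cast is encode code points above U+FFFF as a UTF-16 surrogate pair. Since rtl's OUStringBuffer is not available in a standalone snippet, here is a minimal sketch of that encoding on a `std::u16string`; the name `append_utf32` is hypothetical and this is not the rtl implementation:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for appending one UTF-32 code point to a UTF-16
// buffer: BMP code points become one code unit, everything above U+FFFF
// becomes a surrogate pair instead of being clipped.
void append_utf32(std::u16string& buf, char32_t cp) {
    // Valid scalar values only: no surrogates, nothing above U+10FFFF.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        buf.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain
        buf.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        buf.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
}
```

Appending U+10137 this way yields D800 DD37, the surrogate pair listed in comment #4, rather than the single clipped unit 0137; a Shavian letter such as U+10450 yields D801 DC50.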
OK, I have committed that. Thank you, Stephan!
Ivan Timofeev committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=cda6b4e991f45ec870a311ab736038bd93227900 fdo#56366: INetURLObject::decode: do not clip utf-32 to 16-bit The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Created attachment 69324
Screenshot showing bug 56366 fixed in LOdev 2012-10-30 on Mac OS X

VERIFIED as FIXED with LOdev 3.7.0.0.alpha0+ (Build ID: ce2690; pull time: 2012-10-30 00:06:37) on Mac OS X 10.6.8. The same .odt file which showed a wrong title in older LibreOffice/LOdev versions (see comment #4) now shows a correct window title when opened with the newest Master (LOdev) daily build -- see attached screenshot. @ Ivan Timofeev: Thank you very much for fixing this issue!
Wow! I expected this bug to languish in obscurity forever; instead it's been fixed within five days. Thank you all so much! (I think, however, I'll wait until the next stable release to download the fix.)
It seems this bug isn't fixed after all. I just installed version 3.6.3.2 yesterday and I'm still getting BMP characters in the title bar.
(In reply to comment #13)
> It seems this bug isn't fixed after all. I just installed version 3.6.3.2
> yesterday and I'm still getting BMP characters in the title bar.

Well, Ivan’s patch was pushed only to the master branch, so the bug is, of course, fixed in the master (3.7) builds, but not in any 3.6 builds.

@ Ivan Timofeev, Stephan Bergmann: Do you think it is safe to backport the fix to the 3.6.x branch? If so, it would be very nice if you could do so ;-) Thank you!
Ivan Timofeev committed a patch related to this issue. It has been pushed to "libreoffice-3-6": http://cgit.freedesktop.org/libreoffice/core/commit/?id=eb64c9d69ea7c2677773e7a29634148151102bb7&g=libreoffice-3-6 fdo#56366: INetURLObject::decode: do not clip utf-32 to 16-bit It will be available in LibreOffice 3.6.4. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Just installed version 3.6.4, and now I'm getting a title bar full of undefined characters. It's better than Cyrillic. Thank you!