Bug 56366 - Titlebar clips Unicode characters from file name to 16 bits
Summary: Titlebar clips Unicode characters from file name to 16 bits
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: UI
Version: 3.3.0 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Whiteboard: BSA target:3.7.0 target:3.6.4
Reported: 2012-10-24 19:18 UTC by Linus Drumbler
Modified: 2013-01-21 20:14 UTC

Attachments
The top portion is a screenshot of the actual file name as displayed in the document. The bottom portion is how it is displayed on the top bar. (16.52 KB, image/png)
2012-10-24 19:18 UTC, Linus Drumbler
Screenshot showing bug 56366 fixed in LOdev 2012-10-30 on Mac OS X (104.49 KB, image/png)
2012-10-30 18:10 UTC, Roman Eisele

Description Linus Drumbler 2012-10-24 19:18:26 UTC
Created attachment 69044
The top portion is a screenshot of the actual file name as displayed in the document. The bottom portion is how it is displayed on the top bar.

I am transcribing a story into the Shavian alphabet, a phonemic English spelling reform proposal which is encoded in Unicode beyond the Basic Multilingual Plane. I saved the file under a name written in those characters, and while I am perfectly happy with the series of undefined characters that Windows Explorer displays, LibreOffice Writer shows extended Cyrillic characters in the title bar.

To reproduce, just save a file and give it a filename with characters beyond the BMP.
              
Browser: Latest version of Google Chrome
Comment 1 Rainer Bielefeld Retired 2012-10-25 05:02:13 UTC
@ejectmail@me.com:
Thank you for your report – unfortunately important information is missing.
Maybe the hints on <http://wiki.documentfoundation.org/BugReport> will help you find out what information is useful for reproducing your problem. If you believe that this is too complicated, please ask for help on a user mailing list.
Please:
- attach a sample.odt containing (only) such a file name as text contents.
- Contribute document-related step-by-step instructions listing every 
  key press and every mouse click needed to reproduce your problem 
  (similar to the example in Bug 43431)
- if possible, contribute instructions on how to create a sample document 
  from scratch
- add information 
  -- concerning your OS (Version, Distribution, Language)
  -- concerning your LibO localization (UI language, Locale setting)
  -- LibO settings that might be related to your problems 
    (video hardware acceleration ...)
  -- how you launch LibO and how you opened the sample document
  -- everything else crossing your mind after you read the linked texts
Comment 2 Roman Eisele 2012-10-25 12:58:05 UTC
@ Urmas:

Thank you for confirming this bug report!

However, please do not confirm a bug simply by setting the Status field to NEW (and changing the Summary). Please always add a short comment saying *that* and *how* you reproduced the issue (on which platform? in which LibO version(s)? by which steps?).

Without such an additional comment, the report looks as if it was “confirmed” only by the original reporter himself, but never by an independent reviewer
(some users just set the Status of their own bug reports to NEW). And this is something we all don’t like, do we?

So please add a short comment when you confirm a bug ... Thank you very much!
Comment 3 Urmas 2012-10-25 14:05:28 UTC
Confirmed in 3.6.2 in Windows XP.
Comment 4 Roman Eisele 2012-10-25 14:45:05 UTC
(In reply to comment #3)
> Confirmed in 3.6.2 in Windows XP.

Thank you!

Also REPRODUCIBLE on Mac OS X 10.6.8 (Intel) with LibreOffice 3.6.3.1.

E.g., the filename "Test for bug 56366, special glyphs (𐄷𐄸𐄺𐄸𐄻𐄿).odt"
is displayed as    "Test for bug 56366, special glyphs (ķĺĸĻĿ).odt";
i.e., the sequence      is displayed as
  10137 (D800+DD37)     0137
  10138 (D800+DD38)     0138
  1013A (D800+DD3A)     013A
  10138 (D800+DD38)     0138
  1013B (D800+DD3B)     013B
  1013F (D800+DD3F)     013F

This seems very clear. So (if I have not forgotten too much about mathematics)
LibreOffice just does a modulo 0x10000 operation on all characters,
so that all characters beyond the BMP are mapped to the BMP.
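
The arithmetic above can be checked with a minimal sketch in standard C++. This is not LibreOffice code; the helper names clipTo16Bits, highSurrogate and lowSurrogate are made up for illustration. It assumes that casting a code point to a 16-bit unit keeps only the low 16 bits, which is the same as the modulo 0x10000 operation described above.

```cpp
// Hypothetical helpers for illustration only (not taken from LibreOffice).

// Narrowing a UTF-32 code point to one 16-bit unit keeps only the low
// 16 bits, i.e. the value modulo 0x10000.
constexpr char16_t clipTo16Bits(char32_t codePoint)
{
    return static_cast<char16_t>(codePoint);
}

// High (lead) surrogate of a code point beyond the BMP (>= 0x10000).
constexpr char16_t highSurrogate(char32_t codePoint)
{
    return static_cast<char16_t>(0xD800 + ((codePoint - 0x10000) >> 10));
}

// Low (trail) surrogate of a code point beyond the BMP (>= 0x10000).
constexpr char16_t lowSurrogate(char32_t codePoint)
{
    return static_cast<char16_t>(0xDC00 + ((codePoint - 0x10000) & 0x3FF));
}
```

For U+10137 the clipped value is 0x0137, while the correct UTF-16 form is the surrogate pair D800 DD37, matching the first row of the table.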
Comment 5 Roman Eisele 2012-10-25 14:49:41 UTC
Already REPRODUCIBLE in LibreOffice 3.3.0 (and 3.4.0 and 3.5.0) with exactly the same result. -> Adapted “Version” field.

Also reproducible in Impress and Calc with exactly the same results, so
-> a general UI bug (adapted “Component” field).
Comment 6 Roman Eisele 2012-10-25 15:00:24 UTC
@ Jan Holesovsky, Ivan Timofeev

Hi Jan and Ivan,

you have solved quite a few UI issues. Could you please take a look at this small but annoying issue and try to find out if it is possible to improve the current behaviour? Or perhaps some other developer could handle this?

I hope it is possible to fix this, because at least in the main text of a Writer document LibreOffice handles the same characters which I used above (comment #4: 𐄷𐄸𐄺𐄸𐄻𐄿) quite correctly, given that you have a font which contains glyphs for these characters. So LibreOffice, at least Writer, actually *can* handle Unicode characters beyond the BMP ... This is also confirmed by comment #0.

Thank you very much!
Comment 7 Ivan Timofeev (retired) 2012-10-29 08:18:55 UTC
I found (at least one) place where characters get clipped:
 TitleHelper::impl_convertURL2Title
http://opengrok.libreoffice.org/xref/core/framework/source/fwe/helper/titlehelper.cxx#impl_convertURL2Title
which uses INetURLObject, and it clips to 16bit, in our case it is line 3727:
 aResult.append(sal_Unicode(nUTF32));
http://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx#3727

So, if I change it to use the proper conversion:
 aResult.append(OUString(&nUTF32, 1));
the title is alright.

Stephan,
is it OK to teach INetURLObject::decode full UTF-16 support?
Comment 8 Stephan Bergmann 2012-10-29 08:46:05 UTC
(In reply to comment #7)
> I found (at least one) place where characters get clipped:
>  TitleHelper::impl_convertURL2Title
> http://opengrok.libreoffice.org/xref/core/framework/source/fwe/helper/
> titlehelper.cxx#impl_convertURL2Title
> which uses INetURLObject, and it clips to 16bit, in our case it is line 3727:
>  aResult.append(sal_Unicode(nUTF32));
> http://opengrok.libreoffice.org/xref/core/tools/source/fsys/urlobj.cxx#3727
> 
> So, if I change it to use the proper conversion:
>  aResult.append(OUString(&nUTF32, 1));
> the title is alright.
> 
> Stephan,
> is it OK to teach INetURLObject::decode full UTF-16 support?

That looks like a bug indeed.  There is OUStringBuffer.appendUtf32, so the best fix appears to be to replace both occurrences of

  aResult.append(sal_Unicode(nUTF32));

in INetURLObject::decode with

  aResult.appendUtf32(nUTF32);
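
For readers without the LibreOffice sources at hand, here is a rough sketch in standard C++ of what appending a UTF-32 code point to a UTF-16 buffer has to do. It is not the implementation of OUStringBuffer::appendUtf32; the names Utf16Units, encodeUtf16 and appendCodePoint are hypothetical, and the sketch assumes the input is a valid code point (no check for values above 0x10FFFF or for lone surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: one or two UTF-16 code units.
struct Utf16Units
{
    char16_t units[2];
    std::size_t count;
};

// Encode one code point as UTF-16: a single unit inside the BMP,
// a surrogate pair beyond it. Assumes a valid code point.
constexpr Utf16Units encodeUtf16(char32_t codePoint)
{
    return codePoint < 0x10000
        ? Utf16Units{{static_cast<char16_t>(codePoint), 0}, 1}
        : Utf16Units{{static_cast<char16_t>(0xD800 + ((codePoint - 0x10000) >> 10)),
                      static_cast<char16_t>(0xDC00 + ((codePoint - 0x10000) & 0x3FF))},
                     2};
}

// Append the code point to a UTF-16 string without clipping it to 16 bits.
inline void appendCodePoint(std::u16string& buffer, char32_t codePoint)
{
    const Utf16Units encoded = encodeUtf16(codePoint);
    buffer.append(encoded.units, encoded.count);
}
```

Calling appendCodePoint(buffer, 0x10137) appends the two units D800 DD37, whereas the clipped version appended the single unit 0137.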
Comment 9 Ivan Timofeev (retired) 2012-10-29 10:24:54 UTC
OK, I have committed that. Thank you Stephan!
Comment 10 Not Assigned 2012-10-29 10:27:03 UTC
Ivan Timofeev committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=cda6b4e991f45ec870a311ab736038bd93227900

fdo#56366: INetURLObject::decode: do not clip utf-32 to 16-bit



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 11 Roman Eisele 2012-10-30 18:10:27 UTC
Created attachment 69324
Screenshot showing bug 56366 fixed in LOdev 2012-10-30 on Mac OS X



VERIFIED as FIXED with LOdev 3.7.0.0.alpha0+ (Build ID: ce2690; pull time: 2012-10-30 00:06:37) on Mac OS X 10.6.8.

The same .odt file which showed a wrong title in older LibreOffice/LOdev versions (see comment #4) now shows a correct window title when opened with the newest Master (LOdev) daily build -- see attached screenshot.


@ Ivan Timofeev:
Thank you very much for fixing this issue!
Comment 12 Linus Drumbler 2012-10-30 19:40:01 UTC
Wow! I expected this bug to languish in obscurity forever; instead it's been fixed within five days. Thank you all so much! (I think, however, I'll wait until the next stable release to download the fix.)
Comment 13 Linus Drumbler 2012-11-03 16:04:38 UTC
It seems this bug isn't fixed after all. I just installed version 3.6.3.2 yesterday and I'm still getting BMP characters in the title bar.
Comment 14 Roman Eisele 2012-11-04 11:33:03 UTC
(In reply to comment #13)
> It seems this bug isn't fixed after all. I just installed version 3.6.3.2
> yesterday and I'm still getting BMP characters in the title bar.

Well, Ivan’s patch was pushed only to the Master branch; so the bug is fixed, of course, in the master (3.7) builds, but not in any 3.6 builds.


@ Ivan Timofeev, Stephan Bergmann:
Do you think it is safe to backport the fix to the 3.6.x branch?
Then it would be very nice if you could do so ;-) Thank you!
Comment 15 Not Assigned 2012-11-05 08:26:09 UTC
Ivan Timofeev committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=eb64c9d69ea7c2677773e7a29634148151102bb7&g=libreoffice-3-6

fdo#56366: INetURLObject::decode: do not clip utf-32 to 16-bit


It will be available in LibreOffice 3.6.4.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 16 Linus Drumbler 2013-01-21 20:14:28 UTC
Just installed version 3.6.4, and now I'm getting a title bar full of undefined characters. It's better than Cyrillic. Thank you!