Bug 59679 - Wrong window title encoding of "New Database" with GTK file picker and POSIX locale
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base
(earliest affected) rc
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
Depends on:
Reported: 2013-01-21 21:15 UTC by lacyc3
Modified: 2015-05-03 12:39 UTC
CC List: 5 users

Example (48.32 KB, image/png)
2013-01-21 21:15 UTC, lacyc3
New database (2.28 KB, application/vnd.oasis.opendocument.base)
2013-01-22 21:03 UTC, lacyc3

Description lacyc3 2013-01-21 21:15:09 UTC
Created attachment 73408

I created a new database and kept its default name, "új adatbázis" (Hungarian for "new database").
Comment 1 Joel Madero 2013-01-21 22:14:30 UTC
Please attach the actual database so QA can open it to see if we get same results.

Marking as NEEDINFO, please reopen as UNCONFIRMED once you have attached the document.

Also, if there are any other reproducible steps, please include them in an easy to follow way:


Comment 2 Stephan Bergmann 2013-01-22 09:29:33 UTC
What Linux distro?

What desktop env, KDE, Gnome, or something else?

What is the output of "locale" in a shell where you start "soffice"?
Comment 3 lacyc3 2013-01-22 21:03:51 UTC
Created attachment 73477
New database

I had to rename it from "új adatbázis" to new_database, because I couldn't upload it otherwise.
Comment 4 lacyc3 2013-01-22 21:07:42 UTC
Linux distro: Gentoo
Desktop environment: Gnome 2.32
Libreoffice version:
Use flags: branding cups dbus gnome gstreamer gtk gtk3 java libreoffice_extensions_nlpsolver libreoffice_extensions_presenter-minimizer libreoffice_extensions_wiki-publisher mysql opengl python_single_target_python2_7 python_targets_python2_7 telepathy vba webdav -aqua -bluetooth -debug -eds -elibc_FreeBSD -jemalloc -kde -libreoffice_extensions_scripting-beanshell -libreoffice_extensions_scripting-javascript -nsplugin -odk -postgres -python_single_target_python3_3 -python_targets_python3_3 -test

Locale output:
Comment 5 Stephan Bergmann 2013-01-23 08:20:29 UTC
I can reproduce this (Fedora 18, Gnome 3.6.2) when running soffice from an environment with LC_ALL=POSIX.

For a workaround, try setting LC_ALL to something like en_US.utf8.
Comment 6 Stephan Bergmann 2013-01-23 14:12:10 UTC
This only happens when using a GTK VCL plugin (SAL_USE_VCLPLUGIN=gtk or SAL_USE_VCLPLUGIN=gtk3) and having "Tools - Options... - LibreOffice - General - Open/Save dialogs - Use LibreOffice dialogs" unticked.

This is not easily fixable in the LO code.  When saving a newly created database, LO passes the suggested filename ("új adatbázis") to the GTK file chooser's gtk_file_chooser_set_current_name as a UTF-8 encoded string (SalGtkFilePicker::setDefaultName in vcl/unx/gtk/fpicker/SalGtkFilePicker.cxx).  The filename that is ultimately chosen by the user is passed back from the GTK file chooser to LO via gtk_file_chooser_get_uris (SalGtkFilePicker::getSelectedFiles in vcl/unx/gtk/fpicker/SalGtkFilePicker.cxx) as a file URL ("file:///.../%C3%BAj adatb%C3%A1zis").

When the G_FILENAME_ENCODING environment variable is unset, GLib assumes that pathnames (which are just sequences of 8-bit bytes after all) use UTF-8 encoding, so the pathname that the GTK file chooser computes for the suggested filename would, in C string notation (where "\xXX" denotes a byte with hexadecimal value XX), end in ".../\xC3\xBAj adatb\xC3\xA1zis".  As GLib apparently represents file URLs (whose "path payload" is just a sequence of 8-bit bytes after all) with an identity-mapping between the bytes of the pathname and the bytes encoded (via percent-encoding) in the URL's path, the URL returned from gtk_file_chooser_get_uris above reads "file:///.../%C3%BAj adatb%C3%A1zis".
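As a rough illustration of that identity mapping (a Python sketch, not GLib or LO code), the suggested name can be encoded as UTF-8 and each non-ASCII byte percent-encoded one to one. The variable names are made up for this example, and the space is left unescaped only to match how the URL is written in this report.

```python
from urllib.parse import quote

suggested_name = "új adatbázis"

# GLib's assumption when G_FILENAME_ENCODING is unset: pathname bytes are UTF-8.
pathname_bytes = suggested_name.encode("utf-8")

# Identity mapping: every non-ASCII pathname byte becomes exactly one %XX escape.
# safe="/ " keeps the space literal, purely to mirror the notation used above.
url_path = quote(pathname_bytes, safe="/ ")

print(pathname_bytes)  # b'\xc3\xbaj adatb\xc3\xa1zis'
print(url_path)        # %C3%BAj adatb%C3%A1zis
```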

Now, LO internally uses a different representation of pathnames as file URLs, where the "payload bytes" in the URL's path are interpreted as UTF-8, but the bytes in the pathname are interpreted according to the system locale's encoding (see osl_getThreadTextEncoding), so there is a translation between those two text encodings involved.

There are SalGtkPicker::uritounicode and SalGtkPicker::unicodetouri in vcl/unx/gtk/fpicker/SalGtkPicker.cxx to convert between the different interpretations of file URLs in LO and GLib, and for communication about pre-existing files they appear to work reasonably well (esp. in the common case where the system locale's encoding is UTF-8).  However, they break down in the above scenario of communication about a not-yet-existing file (passing a filename string that is always UTF-8 encoded in one direction via gtk_file_chooser_set_current_name, but getting back a URL via gtk_file_chooser_get_uris) when the system locale's encoding is not UTF-8.  In that case, SalGtkPicker::uritounicode assumes the "payload bytes" of its input URL's path ("file:///.../%C3%BAj adatb%C3%A1zis") should be treated according to the system locale's encoding (which is plain 7-bit ASCII for the POSIX locale; but any bytes with the high bit set are effectively treated as ISO-8859-1 by LO then), so the URL is converted by LO into its internal file URL format as "file:///.../%C3%83%C2%BAj adatb%C3%83%C2%A1zis".  The result is that the pathname of the file that LO will create on disk is (in C string notation) ".../\xC3\xBAj adatb\xC3\xA1zis" (so it will decode to "új adatbázis" when viewed with a tool that assumes the pathname bytes are UTF-8, like the GTK file chooser), but the title that LO will display for it contains those odd "Ã" (U+00C3), "º" (U+00BA), "Ã", and "¡" (U+00A1).
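The double conversion can be reproduced outside LO with a short Python sketch. It only models the byte-level effect described above and is not the actual SalGtkPicker::uritounicode logic; treating the POSIX locale's encoding as ISO-8859-1 is the assumption stated in the paragraph, and all variable names are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes

# URL path as returned by the GTK file chooser (payload bytes are UTF-8).
glib_url_path = "%C3%BAj adatb%C3%A1zis"
payload = unquote_to_bytes(glib_url_path)

# Assumption: under the POSIX locale, high-bit bytes are read as ISO-8859-1.
misread_name = payload.decode("iso-8859-1")

# LO's internal URL stores the (misread) name as UTF-8 payload bytes.
lo_url_path = quote(misread_name.encode("utf-8"), safe="/ ")

# Writing to disk maps back to the locale encoding, restoring the original bytes.
on_disk_bytes = misread_name.encode("iso-8859-1")

print(misread_name)   # Ãºj adatbÃ¡zis  (what ends up in the window title)
print(lo_url_path)    # %C3%83%C2%BAj adatb%C3%83%C2%A1zis
print(on_disk_bytes.decode("utf-8"))  # új adatbázis
```

This shows why the file name on disk looks correct to UTF-8-aware tools while the title shown by LO does not.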

Probably the best way out is to use a system locale with a UTF-8 encoding (hu_HU.utf8, say), or at least set the G_FILENAME_ENCODING environment variable to fix GLib's assumptions (via G_FILENAME_ENCODING=@locale, see <http://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html#file-name-encodings> and <http://developer.gnome.org/glib/stable/glib-running.html#G_FILENAME_ENCODING>).
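For anyone launching soffice from a script, the two workarounds could be applied roughly as follows. This is only a sketch: soffice_env is a hypothetical helper invented for this example, and it assumes the chosen locale (hu_HU.utf8 here) is actually installed on the system.

```python
import os

def soffice_env(base=None, *, utf8_locale=None):
    """Return a copy of the environment with one of the two workarounds applied.

    Hypothetical helper for illustration; not part of LibreOffice or GLib.
    """
    env = dict(os.environ if base is None else base)
    if utf8_locale is not None:
        # Preferred workaround: run under a UTF-8 locale.
        env["LC_ALL"] = utf8_locale
    else:
        # Fallback: tell GLib that filenames use the locale's encoding.
        env["G_FILENAME_ENCODING"] = "@locale"
    return env

# Possible usage (not executed here):
#   import subprocess
#   subprocess.run(["soffice"], env=soffice_env(utf8_locale="hu_HU.utf8"))
```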
Comment 7 Joel Madero 2013-01-23 19:08:19 UTC
*** Bug 59612 has been marked as a duplicate of this bug. ***
Comment 8 Alex Thurgood 2015-01-03 17:39:36 UTC
Adding self to CC if not already on
Comment 9 Michael Meeks 2015-05-03 12:39:42 UTC
Non-UTF8 encodings are a nightmare on Linux; the Linux desktop as a whole has pretty firmly moved to UTF-8 everywhere these days; I'm not convinced this is even worth trying to work around =) With a UTF-8 encoding a -ton- of this sort of stuff will start working for you across the desktop, and you won't be left with a zoo of un-detectable 8-bit charsets everywhere - heartily recommended =)

Thanks for filing though !