Bug 125971 - Image import in KDE LibreOffice fails if filename contains non-latin characters
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 6.2.4.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Stephan Bergmann
Whiteboard: target:6.4.0 target:6.3.0.2
Blocks: KDE
Reported: 2019-06-17 23:30 UTC by Piotr
Modified: 2022-08-25 17:14 UTC (History)

Description Piotr 2019-06-17 23:30:32 UTC
Description:
When I try to import an image into LibreOffice and the filename of that image contains non-Latin characters (I tested with Polish characters), the import fails. In Impress, I get an error window saying "Nonexistent object. Nonexistent file." and "qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 4541, resource id: 37810746, major code: 40 (TranslateCoords), minor code: 0" in the console. In Writer, I get "Image file cannot be opened" and "kf5.kio.widgets: No node found for item that was just removed: QUrl("file:///home/piotr/Dokumenty/Prezentacja MCHTR v2 %3F%3F.odp")".

Steps to Reproduce:
1. Insert any image with a non-Latin filename, e.g. "łąka.png", into an Impress presentation or a Writer document.

Actual Results:
Error window saying "Nonexistent object. Nonexistent file." (in Impress) or error window saying "Image file cannot be opened" (in Writer) 

Expected Results:
Image placed in the presentation/document. 


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
I have not tested inserting files other than images, or LO components other than Impress and Writer. I tested only Polish characters in filenames, but with multiple files. Setting the LO locale to Polish didn't resolve the problem.

System-wise, I have installed all available language packages (for Polish and English), and all format locales are Polish; however, all my display locales are English (this makes it easier to solve all sorts of problems using a search engine).

My system:
Operating System: Manjaro Linux
KDE Plasma Version: 5.15.5
KDE Frameworks Version: 5.59.0
Qt Version: 5.12.3
Kernel Version: 5.1.8-1-MANJARO
OS Type: 64-bit
Processors: 12 × AMD Ryzen 5 1600 Six-Core Processor
Memory: 31,4 GiB

LibreOffice
Version: 6.2.4.2.0+
Build ID: 6.2.4-1
CPU threads: 12; OS: Linux 5.1; UI render: default; VCL: kde5;
Locale: en-US (C); UI-Language: en-US
Calc: CL
Comment 1 Piotr 2019-06-18 09:54:43 UTC
Reinstalling LO (either Fresh or Still) does not resolve the problem. The problem seems to be specific to my setup: two other PCs at my workplace, with basically identical setups, insert images as expected. My workaround was to install the LO snap package: https://snapcraft.io/libreoffice
Comment 2 Jan-Marek Glogowski 2019-06-18 11:51:20 UTC
This is very likely a duplicate of bug 125498, but I'm not 100% sure, since the other bug contains a mixture of strange effects.

(In reply to Piotr from comment #0)
> VCL: kde5; Locale: en-US (C); UI-Language: en-US Calc: CL

The good thing about this one is that I can reproduce it if I start LO with LANG=C, as you do, and try to insert an image with a non-ASCII name, like your "łąka.png".

There is some problem with the encoding LO expects from the file picker URLs. I thought it's simple to fix, but I don't understand the encoding yet.

The QUrl from the file picker is definitely correct UTF-16 (when dumped, the name is displayed correctly in the terminal), but LO obviously expects something else, as that's the bug. And dumping the OUString value from SalGtkFilePicker::getSelectedFiles actually results in gibberish - WTF?!
Comment 3 tantrido 2019-06-18 11:54:34 UTC
Yes, this happens sometimes with KDE. It could be a KDE bug, as I saw these ???????? marks in file names in the file copy dialog as well; see the linked issues:

https://bugs.documentfoundation.org/show_bug.cgi?id=125498
https://bugs.documentfoundation.org/show_bug.cgi?id=125482
Comment 4 Stephan Bergmann 2019-06-18 13:02:30 UTC
Piotr, the reported "Locale: en-US (C)" is odd; things typically work best when your system locale (i.e., the encoding used for naming files in the file system) and the locale used to run LO all use the UTF-8 encoding.

In a terminal shell, if you `cd` into the directory that contains łąka.png, and do `LC_ALL=C ls -b` there, what is the exact output you get for the łąka.png file?
Comment 5 Piotr 2019-06-18 16:04:59 UTC
(In reply to Stephan Bergmann from comment #4)
> Piotr, the reported "Locale: en-US (C)" is odd; things typically work best
> when your system locale (i.e., the encoding used for naming files in the
> file system) and the locale used to run LO all use the UTF-8 encoding.
> 
> In a terminal shell, if you `cd` into the directory that contains łąka.png,
> and do `LC_ALL=C ls -b` there, what is the exact output you get for the
> łąka.png file?

The exact output is '\305\202\304\205ka.png'
Comment 6 Stephan Bergmann 2019-06-18 17:00:27 UTC
(In reply to Piotr from comment #5)
> The exact output is '\305\202\304\205ka.png'

So the filename of łąka.png is encoded with UTF-8 in your file system.  Good.

But why then are you running LO with LANG=C?  Is that a deliberate decision, or an (unanticipated, by you) consequence of how LO is started in your desktop environment?  If the latter, what is the output of `locale` in a terminal shell, and do things start to work when you run LO from that terminal shell (normally, just typing `soffice` should start it; but make sure that any previously running instance of LO is closed)?
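
The `ls -b` output from comment 5 can be checked against UTF-8 by hand. The following is a minimal C++ sketch, not code from LibreOffice or coreutils: the helper names `unescapeOctal` and `decodeUtf8` are made up for illustration, and only three-digit octal escapes are handled. It turns the escaped name back into bytes and decodes those bytes as UTF-8.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert three-digit backslash-octal escapes
// (e.g. "\305") into raw bytes. Everything else is copied unchanged;
// other escape forms that `ls -b` may print are not handled here.
std::string unescapeOctal(const std::string& text) {
    auto isOctal = [](char c) { return c >= '0' && c <= '7'; };
    std::string bytes;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '\\' && i + 3 < text.size() &&
            isOctal(text[i + 1]) && text[i + 1] <= '3' &&
            isOctal(text[i + 2]) && isOctal(text[i + 3])) {
            int value = (text[i + 1] - '0') * 64 +
                        (text[i + 2] - '0') * 8 +
                        (text[i + 3] - '0');
            bytes += static_cast<char>(value);
            i += 4;
        } else {
            bytes += text[i];
            ++i;
        }
    }
    return bytes;
}

// Hypothetical helper: decode a UTF-8 byte string into code points.
// It validates lead and continuation bytes but, to stay short, does not
// reject overlong encodings or surrogate code points.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string codePoints;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        char32_t value = 0;
        if (lead < 0x80)                { extra = 0; value = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; value = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; value = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; value = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            value = (value << 6) | (cont & 0x3F);
        }
        codePoints += value;
        i += extra + 1;
    }
    return codePoints;
}
```

Applied to `\305\202\304\205ka.png`, the first two escapes give the bytes 0xC5 0x82, which decode to U+0142 (ł), and the next two give 0xC4 0x85, which decode to U+0105 (ą). That matches the conclusion above that the name is stored as UTF-8.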
Comment 7 Jan-Marek Glogowski 2019-06-18 18:15:32 UTC
Just opening the file is a KDE-only problem that I can fix / work around. But then there will be many more problems. I've created bug 125971 for that.

I'll fix this eventually by using

OUString aNewURL = uri::ExternalUriReferenceTranslator::create(m_xContext)->translateToInternal(toOUString(aURL.toEncoded()));

but strictly speaking this should somehow be handled correctly in INetURLObject.

Then the fix will just be to use aURL.toEncoded() instead of aURL.toString().
Comment 8 Stephan Bergmann 2019-06-18 20:44:25 UTC
(In reply to Jan-Marek Glogowski from comment #7)
> I'll fix this eventually by using
> 
> OUString aNewURL =
> uri::ExternalUriReferenceTranslator::create(m_xContext)-
> >translateToInternal(toOUString(aURL.toEncoded()));

...which, as I said on IRC, is wrong.  QUrl::toEncoded is documented (<https://doc.qt.io/qt-5/qurl.html#toEncoded> ) to generate a file URL whose "payload" is UTF-8--encoded.  But css.uri.XExternalUriReferenceTranslator::translateToInternal expects its argument to have a "payload" encoded according to the system locale (i.e., osl_getThreadTextEncoding).  As QUrl provides a file URL with "payload" in UTF-8, which is the same format as used internally in LO, there is no need to map here from external to internal URL.

Let's wait for Piotr to reply why he uses LO with LANG=C.  If you have files in your file system whose names are encoded with UTF-8, the only reliable way to access them from LO is to run LO with a UTF-8 system locale (like LANG=pl.UTF-8).
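
To make the term "payload" concrete: in a file URL with a UTF-8 payload, every byte of the UTF-8 encoded path that is not allowed literally is written as `%XX`. The sketch below is a standard-library illustration of that idea only. It is not Qt's or LO's implementation, the function name `toFileUrl` is hypothetical, and the exact set of characters that QUrl::toEncoded leaves unescaped may differ from the RFC 3986 "unreserved" set used here.

```cpp
#include <string>

// Hypothetical helper: build a file URL from an absolute POSIX path given
// as UTF-8 bytes. ASCII letters, digits, '-', '.', '_', '~' and '/' are
// kept; every other byte, including each byte of a multi-byte UTF-8
// sequence, is written as %XX.
std::string toFileUrl(const std::string& utf8Path) {
    static const char hex[] = "0123456789ABCDEF";
    std::string url = "file://";
    for (unsigned char byte : utf8Path) {
        bool keep = (byte >= 'A' && byte <= 'Z') ||
                    (byte >= 'a' && byte <= 'z') ||
                    (byte >= '0' && byte <= '9') ||
                    byte == '-' || byte == '.' || byte == '_' ||
                    byte == '~' || byte == '/';
        if (keep) {
            url += static_cast<char>(byte);
        } else {
            url += '%';
            url += hex[byte >> 4];
            url += hex[byte & 0x0F];
        }
    }
    return url;
}
```

For the path `/home/piotr/łąka.png` stored as UTF-8, this yields `file:///home/piotr/%C5%82%C4%85ka.png`. For comparison, the `%3F%3F` in the URL quoted in the description is the percent-encoding of two literal `?` characters, which suggests the original characters had already been lost before that URL was built.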
Comment 9 Piotr 2019-06-19 09:13:12 UTC
(In reply to Stephan Bergmann from comment #8)
> (In reply to Jan-Marek Glogowski from comment #7)
> > I'll fix this eventually by using
> > 
> > OUString aNewURL =
> > uri::ExternalUriReferenceTranslator::create(m_xContext)-
> > >translateToInternal(toOUString(aURL.toEncoded()));
> 
> ...which, as I said on IRC, is wrong.  QUrl::toEncoded is documented
> (<https://doc.qt.io/qt-5/qurl.html#toEncoded> ) to generate a file URL whose
> "payload" is UTF-8--encoded.  But
> css.uri.XExternalUriReferenceTranslator::translateToInternal expects its
> argument to have a "payload" encoded according to the system locale (i.e.,
> osl_getThreadTextEncoding).  As QUrl provides a file URL with "payload" in
> UTF-8, which is the same format as used internally in LO, there is no need
> to map here from external to internal URL.
> 
> Let's wait for Piotr to reply why he uses LO with LANG=C.  If you have files
> in your file system whose names are encoded with UTF-8, the only reliable
> way to access them from LO is to run LO with a UTF-8 system locale (like
> LANG=pl.UTF-8).

I've had LANG=C set in ~/.bashrc because I found it to be a reliable method to change the display language to English; using Manjaro Settings Manager didn't do the trick for some reason. Now that I have removed LANG=C, LO works as expected. Thank you for your help. By the way, with the help of ArchWiki I have found that while the LANG variable in /etc/locale.conf was en_US.UTF-8 all along, the LANGUAGE variable was set to pl_PL.UTF-8. I don't know what the difference between them is or why MSM couldn't change it for me, but after changing it and rebooting, my display language is English and LO still works as expected.
Comment 10 Stephan Bergmann 2019-06-19 09:35:03 UTC
@Piotr:  Unlike for LANG, LC_ALL, etc. (see e.g. <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html>), there's no common interpretation of a LANGUAGE environment variable; LO doesn't use it at all.  (And I have no idea what "Manjaro Settings Manager" is.)

@jmux:  I'd be happy to close this as WONTFIX, given my explanations at bug 125995 comment 1, unless you want to keep it open for some reason.
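
For readers unfamiliar with how LANG and LC_ALL interact: per the POSIX chapter linked in comment 10, a non-empty LC_ALL takes precedence over the category variable (here LC_CTYPE), which in turn takes precedence over LANG. The sketch below illustrates that lookup order in plain C++. It is not code from LO, the names `effectiveCtypeLocale` and `namesUtf8Codeset` are hypothetical, the fallback to "C" stands in for the implementation-defined default, and the UTF-8 check relies on the common `language_TERRITORY.codeset` naming convention rather than on anything POSIX mandates.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Lookup callback so the logic can be exercised without touching the
// real environment; std::getenv fits this signature.
using EnvLookup = std::function<const char*(const char*)>;

// Hypothetical helper: return the locale name that governs LC_CTYPE,
// following the POSIX precedence LC_ALL > LC_CTYPE > LANG.
std::string effectiveCtypeLocale(const EnvLookup& lookup) {
    static const char* const names[] = {"LC_ALL", "LC_CTYPE", "LANG"};
    for (const char* name : names) {
        const char* value = lookup(name);
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    return "C";  // assumption: stand-in for the implementation default
}

// Hypothetical helper: true if the locale name carries a codeset part
// that spells UTF-8 (compared case-insensitively, ignoring '-').
bool namesUtf8Codeset(const std::string& locale) {
    std::size_t dot = locale.find('.');
    if (dot == std::string::npos) {
        return false;
    }
    std::size_t at = locale.find('@', dot);
    std::string codeset = (at == std::string::npos)
        ? locale.substr(dot + 1)
        : locale.substr(dot + 1, at - dot - 1);
    std::string normalized;
    for (unsigned char c : codeset) {
        if (c != '-') {
            normalized += static_cast<char>(std::tolower(c));
        }
    }
    return normalized == "utf8";
}
```

Passing a lambda that wraps `std::getenv` as the lookup reads the real environment. With only LANG=C set, as in Piotr's ~/.bashrc, the result is "C", which names no UTF-8 codeset; with LANG=en_US.UTF-8 and no LC_ALL or LC_CTYPE, the result is "en_US.UTF-8".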
Comment 11 Stephan Bergmann 2019-06-19 13:09:18 UTC
(In reply to Stephan Bergmann from comment #8)
> (In reply to Jan-Marek Glogowski from comment #7)
> > I'll fix this eventually by using
> > 
> > OUString aNewURL =
> > uri::ExternalUriReferenceTranslator::create(m_xContext)-
> > >translateToInternal(toOUString(aURL.toEncoded()));
> 
> ...which, as I said on IRC, is wrong.  QUrl::toEncoded is documented
> (<https://doc.qt.io/qt-5/qurl.html#toEncoded> ) to generate a file URL whose
> "payload" is UTF-8--encoded.  But
> css.uri.XExternalUriReferenceTranslator::translateToInternal expects its
> argument to have a "payload" encoded according to the system locale (i.e.,
> osl_getThreadTextEncoding).  As QUrl provides a file URL with "payload" in
> UTF-8, which is the same format as used internally in LO, there is no need
> to map here from external to internal URL.

...but see <https://gerrit.libreoffice.org/74359> "tdf#125971: map file URLs from QFileDialog to LO internal format" :)
Comment 12 Mike Kaganski 2019-06-19 15:02:16 UTC
(In reply to Jan-Marek Glogowski from comment #7)
> Just opening the file is a KDE-only problem that I can fix / work around.
> But then there will be many more problems. I've created bug 125971 for that.

The bug # looks wrong (that's this bug) ;-)
Comment 13 Stephan Bergmann 2019-06-19 16:04:18 UTC
(In reply to Mike Kaganski from comment #12)
> (In reply to Jan-Marek Glogowski from comment #7)
> > Just opening the file is a KDE-only problem that I can fix / work around.
> > But then there will be many more problems. I've created bug 125971 for that.
> 
> The bug # looks wrong (that's this bug) ;-)

I think he means bug 125995 (which he added to the "See Also" section)
Comment 14 Commit Notification 2019-06-19 16:19:03 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/e2589f4584efcf0306ab69f7223abdd7469e3604%5E%21

tdf#125971: map file URLs from QFileDialog to LO internal format

It will be available in 6.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Jan-Marek Glogowski 2019-06-19 16:28:01 UTC
Now there is already this patch - hmm. I wanted to go with the toEncoded() only variant until we know how / if we can handle bug 125995 at all.

At least gtk3 and kde5 are now broken in the same way with LANG=C.

Nobody has mentioned yet that a probably better "fix" for this is simply to set the locale to C.UTF-8, which should result in the behavior Piotr expected from the beginning.
Comment 16 Stephan Bergmann 2019-06-19 16:50:51 UTC
(In reply to Jan-Marek Glogowski from comment #15)
> At least gtk3 and kde5 are now broken in the same way with LANG=C.

I'm not aware there's anything broken there.  (Nothing broken anymore for the kde5 case with the fix from comment 14.)

> Nobody has mentioned yet that a probably better "fix" for this is simply to
> set the locale to C.UTF-8, which should result in the behavior Piotr expected
> from the beginning.

People are of course free to set LC_CTYPE=C.UTF-8 if their system supports it (e.g. upstream glibc still doesn't support it, see <https://sourceware.org/glibc/wiki/Proposals/C.UTF-8>).  That's orthogonal to the fix for non--UTF-8 locales from comment 14.

(I had suggested closing this bug as WONTFIX in comment 10 before realizing that all it takes to fix this for non-UTF-8 locales, while keeping behavior the same for UTF-8 locales, is that (non-obvious) translateToInternal call, which you had already suggested in comment 7 and which I had dismissed in comment 8 on grounds that later turned out to be erroneous.)
Comment 17 Commit Notification 2019-07-14 18:45:00 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "libreoffice-6-3":

https://git.libreoffice.org/core/+/4cbcbb0a45b243971eb2e1da88b28bc03829a18e%5E%21

tdf#125971: map file URLs from QFileDialog to LO internal format

It will be available in 6.3.0.2.
