Bug 96902 - LibreOffice assumes all non-7-bit characters in filenames are valid UTF-8 characters (see comment 21)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 4.3.7.2 release
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: File-Name
Reported: 2016-01-05 09:04 UTC by HPS
Modified: 2023-10-05 05:19 UTC

See Also:
Crash report or crash signature:


Attachments
Screenshot of open failure (60.21 KB, image/png)
2016-01-06 18:53 UTC, HPS
Test file (empty) (16 bytes, application/pdf)
2016-01-07 12:41 UTC, HPS
Test file (tar format) (528.50 KB, application/octet-stream)
2016-01-08 14:00 UTC, HPS
UFS binary image (256.00 KB, application/octet-stream)
2016-01-08 16:43 UTC, HPS

Description HPS 2016-01-05 09:04:21 UTC
Hi,

When opening or downloading documents via Firefox which contain Norwegian characters, LibreOffice cannot find the file to open. With other software this works fine. I can test a patch. I'm not sure where the problem is, though I suspect that LibreOffice is somehow filtering the filenames.

--HPS
Comment 1 HPS 2016-01-05 09:05:14 UTC
To clarify: Norwegian characters in the filename, not inside the document.
Comment 2 Buovjaga 2016-01-06 18:32:49 UTC
Could you test with LibreOffice 5 to verify the problem still exists?
Comment 3 HPS 2016-01-06 18:52:56 UTC
I tried libreoffice-5.0.4 and it fails too. See attached screenshot.
Comment 4 HPS 2016-01-06 18:53:35 UTC
Created attachment 121753 [details]
Screenshot of open failure
Comment 5 HPS 2016-01-06 19:03:42 UTC
It looks like LibreOffice converts the Norwegian character, which is stored as a single 8-bit byte (so not strictly ASCII), to a question mark. The filename is not UTF-8 as I initially thought. You should be able to reproduce this by creating a similarly named file. See the hexdump in the screenshot to the right.

It looks like the filename is not preserved as it should be, and that LibreOffice tries to convert or reformat it somehow.
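
For illustration only, here is a minimal Python sketch of how such a conversion can lose the byte. It is not LibreOffice's actual code path; the filename bytes are taken from the hexdump quoted later in this report, and the choice of error handlers is an assumption made to show the difference between a lossy and a lossless round trip.

```python
# Raw filename bytes as reported (0xf8 is the single-byte "o with stroke").
raw_name = b"Skjelve-b\xf8lgen.pdf"

# Lossy: the byte that is not valid UTF-8 is replaced with U+FFFD,
# so the original byte cannot be recovered afterwards.
lossy = raw_name.decode("utf-8", errors="replace")
lossy_roundtrip = lossy.encode("utf-8")

# Lossless: "surrogateescape" maps the invalid byte to a lone surrogate
# that encodes back to exactly the original byte.
kept = raw_name.decode("utf-8", errors="surrogateescape")
kept_roundtrip = kept.encode("utf-8", errors="surrogateescape")

print(lossy_roundtrip == raw_name)  # False
print(kept_roundtrip == raw_name)   # True
```

A program that converts the name the lossy way and then asks the file system for the converted name will look for a file that does not exist.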
Comment 6 Buovjaga 2016-01-07 09:55:30 UTC
I found an existing file with bølgen in it: http://www.oppmerksombevegelse.no/wp-content/uploads/2014/07/Skjelve-b%C3%B8lgen-VG+-fra-8.-September-2012-copy.pdf

I could open it fine from Firefox, selecting LibreOffice as the program to open it with.
Can you open that ok?

If you can open it too, maybe you could attach a problematic example file.

Ubuntu 15.10 64-bit 
Version: 5.0.3.2
Build ID: 1:5.0.3~rc2-0ubuntu1
Locale: en-US (en_US.UTF-8)

Version: 5.2.0.0.alpha0+
Build ID: 4a8c0d313540bd78c9c381edd548b976c580ca9a
CPU Threads: 2; OS Version: Linux 4.2; UI Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF-dbg, Branch:master, Time: 2016-01-04_02:01:32
Locale: en-US (en_US.UTF-8)
Comment 7 HPS 2016-01-07 11:32:13 UTC
Hi,

It doesn't work:

libreoffice Skjelve-b\370lgen-VG+-fra-8.-September-2012-copy.pdf

LibreOffice is not able to find the file.

Can you open a terminal and run "ls -l | hexdump -C" in the directory containing the file, to see which bytes the name is stored with in the file system?

--HPS
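
As an alternative to reading the hexdump by eye, the same check can be sketched in Python. The helper name `non_utf8_names` is hypothetical; the sketch only assumes that passing a bytes path to `os.listdir` returns the raw names as bytes.

```python
import os


def non_utf8_names(names):
    """Return the byte-string names that are not valid UTF-8."""
    bad = []
    for name in names:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    # A bytes argument makes os.listdir return the names as raw bytes.
    for name in non_utf8_names(os.listdir(b".")):
        print(name.hex(" "), repr(name))
```

Any name printed by this script contains at least one byte sequence that a UTF-8 decoder would reject.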
Comment 8 Buovjaga 2016-01-07 11:42:42 UTC
00000000  74 6f 74 61 6c 20 31 36  38 38 0a 2d 72 77 2d 72  |total 1688.-rw-r|
00000010  77 2d 72 2d 2d 20 31 20  74 65 73 74 69 20 74 65  |w-r-- 1 testi te|
00000020  73 74 69 20 31 37 32 36  34 30 32 20 74 61 6d 6d  |sti 1726402 tamm|
00000030  69 20 20 37 20 31 31 3a  34 32 20 53 6b 6a 65 6c  |i  7 11:42 Skjel|
00000040  76 65 2d 62 c3 b8 6c 67  65 6e 2d 56 47 2b 2d 66  |ve-b..lgen-VG+-f|
00000050  72 61 2d 38 2e 2d 53 65  70 74 65 6d 62 65 72 2d  |ra-8.-September-|
00000060  32 30 31 32 2d 63 6f 70  79 2e 70 64 66 0a        |2012-copy.pdf.|
0000006e
Comment 9 HPS 2016-01-07 12:10:58 UTC
Your file:

00000040  76 65 2d 62 c3 b8 6c 67  65 6e 2d 56 47 2b 2d 66  |ve-b..lgen-VG+-f|
                      ^^^^^ Valid UTF-8 code

Look here for 0xc3 0xb8:

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=128&number=128&names=-&utf8=0x

My file:

00000030  53 6b 6a 65 6c 76 65 2d  62 f8 6c 67 65 6e 2d 56  |Skjelve-b.lgen-V|
                                      ^^ Single 8-bit byte for "ø" (not ASCII, not valid UTF-8)

So it looks like LibreOffice tries to convert the filename to UTF-8 before opening the file. In my case that fails, because the conversion does not preserve the 0xf8 byte, which is an 8-bit character and not valid UTF-8 on its own.

-HPS
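
The difference between the two hexdump lines above can be checked mechanically. This is a sketch: `hexdump_bytes` and `is_valid_utf8` are hypothetical helpers, and treating 0xf8 as ISO-8859-1 is an assumption about the reporter's locale rather than something the report states.

```python
def hexdump_bytes(line):
    """Return the data bytes of one `hexdump -C` output line."""
    # Keep only the hex columns: drop the ASCII gutter and the offset.
    hex_columns = line.split("|")[0].split()[1:]
    return bytes.fromhex("".join(hex_columns))


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_line = hexdump_bytes(
    "00000040  76 65 2d 62 c3 b8 6c 67  65 6e 2d 56 47 2b 2d 66  |ve-b..lgen-VG+-f|"
)
reporter_line = hexdump_bytes(
    "00000030  53 6b 6a 65 6c 76 65 2d  62 f8 6c 67 65 6e 2d 56  |Skjelve-b.lgen-V|"
)

print(is_valid_utf8(utf8_line))      # True
print(is_valid_utf8(reporter_line))  # False

# Assumption: the single byte 0xf8 is ISO-8859-1, where it maps to U+00F8.
print(reporter_line.decode("iso-8859-1") == "Skjelve-b\u00f8lgen-V")  # True
```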
Comment 10 Buovjaga 2016-01-07 12:23:07 UTC
Plz attach file so I can confirm :)
Comment 11 HPS 2016-01-07 12:41:33 UTC
Created attachment 121774 [details]
Test file (empty)
Comment 12 HPS 2016-01-07 12:43:03 UTC
Hi,

You can possibly reproduce like this on a Linux box:

touch "$(printf 'Skjelve-b\370lgen.pdf')"
libreoffice "$(printf 'Skjelve-b\370lgen.pdf')"

I've uploaded an empty test file.

--HPS
Comment 13 Buovjaga 2016-01-07 12:50:55 UTC
(In reply to HPS from comment #11)
> Created attachment 121774 [details]
> Test file (empty)

I can open it fine. Same for a file created with touch Skjelve-b\370lgen.pdf
It seems I would have to test in FreeBSD. Maybe I will install it in a VM :)

Ubuntu 15.10 64-bit 
Version: 5.0.3.2
Build ID: 1:5.0.3~rc2-0ubuntu1
Locale: en-US (en_US.UTF-8)

Version: 5.2.0.0.alpha0+
Build ID: 4a8c0d313540bd78c9c381edd548b976c580ca9a
CPU Threads: 2; OS Version: Linux 4.2; UI Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF-dbg, Branch:master, Time: 2016-01-04_02:01:32
Locale: en-US (en_US.UTF-8)
Comment 14 HPS 2016-01-07 13:04:30 UTC
Hi,

You can get a FreeBSD .iso from here:

http://ftp.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/11.0/FreeBSD-11.0-CURRENT-amd64-20160106-r293245-disc1.iso

The easiest way to try this out is to install LibreOffice via the pre-compiled pkg:

pkg install xauth libreoffice

Then "ssh -XY " to the image and run libreoffice via SSH X11 redirection.

Should take less than 30 minutes.
Comment 15 Buovjaga 2016-01-08 10:54:42 UTC
(In reply to HPS from comment #11)
> Created attachment 121774 [details]
> Test file (empty)

I can open this fine on FreeBSD 11.0 current, from Firefox to LibreOffice 5.0.4.
Comment 16 Buovjaga 2016-01-08 11:12:26 UTC
Hexdump from FreeBSD:

00000000  74 6f 74 61 6c 20 34 0a  2d 72 77 2d 72 2d 2d 72  |total 4.-rw-r--r|
00000010  2d 2d 20 20 31 20 72 6f  6f 74 20 20 77 68 65 65  |--  1 root  whee|
00000020  6c 20 20 31 36 20 4a 61  6e 20 20 38 20 31 31 3a  |l  16 Jan  8 11:|
00000030  31 30 20 53 6b 6a 65 6c  76 65 2d 62 c3 b8 6c 67  |10 Skjelve-b..lg|
00000040  65 6e 2d 56 47 2b 2d 66  72 61 2d 38 2e 2d 53 65  |en-VG+-fra-8.-Se|
00000050  70 74 65 6d 62 65 72 2d  32 30 31 32 2d 63 6f 70  |ptember-2012-cop|
00000060  79 2e 70 64 66 0a                                 |y.pdf.|
00000066
Comment 17 Buovjaga 2016-01-08 11:16:02 UTC
Btw. what filesystem are you using? I used the 11.0 current .vhd.
Comment 18 HPS 2016-01-08 14:00:18 UTC
Hi,

I'm using UFS. And I've selected Norwegian language and keyboard layout.

I've uploaded test.tar which should preserve the character.

Can you try to "tar -xvf test.tar" and then try to open the empty file?

In all your tests you end up with UTF-8.

BTW: I'm using FreeBSD-9, not sure if it is related.

--HPS
Comment 19 HPS 2016-01-08 14:00:43 UTC
Created attachment 121793 [details]
Test file (tar format)
Comment 20 Buovjaga 2016-01-08 14:51:00 UTC
(In reply to HPS from comment #19)
> Created attachment 121793 [details]
> Test file (tar format)

Untarred it and opened with LibreOffice just fine.
I checked and I am also using UFS.

Maybe you could also try 11.0 in a VM :)

Here is the hexdump:

00000000  74 6f 74 61 6c 20 35 37  36 0a 2d 72 77 2d 72 2d  |total 576.-rw-r-|
00000010  2d 72 2d 2d 20 20 31 20  31 30 30 35 20 20 77 68  |-r--  1 1005  wh|
00000020  65 65 6c 20 20 35 33 38  32 30 32 20 41 75 67 20  |eel  538202 Aug |
00000030  32 39 20 31 30 3a 33 33  20 42 c3 b8 6c 67 65 6e  |29 10:33 B..lgen|
00000040  2e 70 64 66 0a                                    |.pdf.|
00000045
Comment 21 HPS 2016-01-08 16:42:24 UTC
Hi,

I have made some progress:

With the environment variable "LANG=nb_NO.UTF-8" set, both Firefox and LibreOffice use UTF-8 filenames and LibreOffice finds the file. Without it, opening the file fails.
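
A small Python sketch of why the locale matters: the same visible name maps to different byte sequences depending on the charset, so a lookup using one encoding misses a file stored under the other. ISO-8859-1 as the non-UTF-8 charset is an assumption, and the `on_disk` set merely stands in for a directory listing.

```python
name = "B\u00f8lgen.pdf"  # the visible name, with U+00F8 for the Norwegian letter

utf8_name = name.encode("utf-8")
latin1_name = name.encode("iso-8859-1")  # assumed legacy charset

print(utf8_name)    # b'B\xc3\xb8lgen.pdf'
print(latin1_name)  # b'B\xf8lgen.pdf'

# Stand-in for a directory holding only the single-byte variant.
on_disk = {latin1_name}
print(utf8_name in on_disk)    # False
print(latin1_name in on_disk)  # True
```

The two byte strings match the UTF-8 and single-byte names seen in the hexdumps in this report.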

Final test:

I've attached a binary UFS image. You can mount it like this:

mdconfig -a -t vnode -f image.bin

mount -t ufs /dev/md0 /mnt

cd /mnt

ls -l | hexdump -C
00000000  74 6f 74 61 6c 20 34 0a  64 72 77 78 72 77 78 72  |total 4.drwxrwxr|
00000010  2d 78 20 20 32 20 72 6f  6f 74 20 20 6f 70 65 72  |-x  2 root  oper|
00000020  61 74 6f 72 20 20 35 31  32 20 4a 61 6e 20 20 38  |ator  512 Jan  8|
00000030  20 31 36 3a 34 36 20 2e  73 6e 61 70 2f 0a 2d 72  | 16:46 .snap/.-r|
00000040  77 2d 72 2d 2d 72 2d 2d  20 20 31 20 72 6f 6f 74  |w-r--r--  1 root|
00000050  20 20 77 68 65 65 6c 20  20 20 20 20 20 20 30 20  |  wheel       0 |
00000060  4a 61 6e 20 20 38 20 31  36 3a 34 36 20 42 f8 6c  |Jan  8 16:46 B.l|
00000070  67 65 6e 2e 70 64 66 0a                           |gen.pdf.|
00000078

Try to open the file Bølgen.pdf there. Thank you!

--HPS
Comment 22 HPS 2016-01-08 16:43:12 UTC
Created attachment 121800 [details]
UFS binary image
Comment 23 Buovjaga 2016-01-08 17:02:25 UTC
Ha! Now I could reproduce :)

What a convoluted repro path, hehe..

Setting to NEW.
Comment 24 QA Administrators 2017-03-06 15:37:46 UTC Comment hidden (obsolete)
Comment 25 QA Administrators 2019-12-03 14:54:39 UTC Comment hidden (obsolete)
Comment 26 HPS 2019-12-03 15:05:48 UTC
The issue is not solved in the latest LibreOffice version available to me.
Comment 27 QA Administrators 2021-12-03 04:43:10 UTC
Dear HPS,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug