Bug 63673 - FILEOPEN: Unicode text encodings not auto-recognised in LO4.0.2 Linux (OK in Windows version?)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.0.2.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Maxim Monastirsky
URL:
Whiteboard: BSA target:5.4.0
Keywords:
Duplicates: 112069
Depends on:
Blocks:
 
Reported: 2013-04-18 08:10 UTC by Chris Billington
Modified: 2018-11-13 19:51 UTC

See Also:
Crash report or crash signature:


Attachments
Sample UTF_16le Chinese text file (57.73 KB, text/plain)
2013-04-18 08:10 UTC, Chris Billington
Description Chris Billington 2013-04-18 08:10:59 UTC
Created attachment 78171
Sample UTF_16le Chinese text file

Problem description:
When opening a Unicode text file encoded as UTF-16LE, the character encoding indicated by the 'Byte Order Mark' is not recognised correctly by LO on Linux. The same version on Windows recognises the encoding.

Steps to reproduce:
1. Open attached sample UTF-16le text file using Writer in LO4.0.2 Linux. Use the default 'all files' filter. The file encoding is not correctly recognised (garbage characters shown).
2. Open attached sample text file using Writer, but preselect the 'Text Encoded' filter first. Select 'Unicode' as encoding and 'CR+LF' as line separator. Characters shown correctly.
3. If opening the file in Calc, the encoding is correctly detected as Unicode.
4. If opening the file in Writer 4.0.2 on Windows, encoding is correctly detected and file displays correctly.

The first two bytes of the attached file are the UTF-16le 'byte order mark' <FF> <FE>. But it seems LO4.0.2 Writer on Linux doesn't recognise these automatically.
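
For illustration, the kind of check being described (reading the leading bytes and mapping them to an encoding) can be sketched as follows. This is only a minimal Python sketch, not LibreOffice's actual detection code; the function names `sniff_bom` and `decode_with_bom` are hypothetical, and the UTF-8 fallback is an assumption.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM-indicated encoding; the fallback is an assumption."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)
```

For a file like the attached sample, which starts with <FF> <FE>, `sniff_bom` would report `utf-16-le` with a 2-byte mark, and the rest of the file is decoded accordingly.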

'Language Settings' in preferences don't seem to make any difference.




              
Operating System: Linux (Other)
Version: 4.0.2.2 release
Comment 1 Owen Genat (retired) 2013-04-19 10:07:38 UTC
I can confirm this using Version 4.0.2.2 (Build ID: 4c82dcdd6efcd48b1d8bba66bfe1989deee49c3) under both Windows 7 Home Premium and Ubuntu 10.04 x86_64. Behaviour is as described. Setting status to NEW.
Comment 2 QA Administrators 2015-03-04 02:23:02 UTC Comment hidden (obsolete)
Comment 3 Chris Billington 2015-03-04 12:36:28 UTC
The issue still exists in LO 4.4.1.2 (4.4.1.2 Arch Linux build-1) en_GB locale on Linux.

The 'select encoding' dialog now allows a choice of UTF-7, UTF-8, UTF-16 (UTF-16 works, with a Chinese-capable font selected).

LO 4.4.1.2 on Windows still opens the file correctly using the 'all files' file dialog.

Chris
Comment 4 Maxim Monastirsky 2015-08-04 22:30:30 UTC
taking.
Comment 5 Volga 2017-01-19 04:18:33 UTC
LibreOffice should be able to auto-recognise the character encoding by checking the BOM. Some information about the BOM is here:
https://en.wikipedia.org/wiki/Byte_order_mark
Comment 6 Volga 2017-01-19 04:47:06 UTC
Additionally, if a TXT file does not have a BOM, LibreOffice should provide an interface that lets the user choose a suitable encoding for viewing it. This interface should also include a preview pane.
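
To make the preview suggestion concrete, here is a rough Python sketch of what such a preview could be built on: decoding the start of the file under several candidate encodings so a user can pick the one that looks right. The function name `preview_candidates`, the candidate list, and the 80-character limit are all assumptions for illustration and are not taken from LibreOffice.

```python
def preview_candidates(data: bytes,
                       encodings=("utf-8", "utf-16-le", "utf-16-be", "cp1252"),
                       limit: int = 80) -> dict:
    """Map each candidate encoding to a short decoded preview.

    An encoding that cannot decode the data maps to None, so a dialog
    could grey it out instead of showing garbage.
    """
    previews = {}
    for enc in encodings:
        try:
            previews[enc] = data.decode(enc)[:limit]
        except UnicodeDecodeError:
            previews[enc] = None
    return previews
```

Note that a successful decode does not prove the encoding is right: many byte sequences decode without error under several encodings, which is why a visual preview is useful.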
Comment 7 Commit Notification 2017-01-23 11:04:53 UTC
Maxim Monastirsky committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6ff57262f44843ccd1f320426984b5e074e3eaf1

tdf#63673 Never ignore detected BOM

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 QA Administrators 2017-03-01 10:53:09 UTC
Hello Maxim
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?
Comment 9 Andras Timar 2017-08-28 16:20:36 UTC
*** Bug 112069 has been marked as a duplicate of this bug. ***
Comment 10 Volga 2017-08-29 13:53:03 UTC
Yes, this is fixed, verified in 5.4.1.

Version: 5.4.1.2 (x64)
Build ID: ea7cb86e6eeb2bf3a5af73a8f7777ac570321527
CPU threads: 4; OS: Windows 6.19; UI render: default;
Locale: zh-CN (zh_CN); Calc: group