Bug 63673 - FILEOPEN: Unicode text encodings not auto-recognised in LO4.0.2 Linux (OK in Windows version?)
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: (earliest affected) release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Maxim Monastirsky
QA Contact:
Whiteboard: BSA target:5.4.0
Duplicates: 112069
Depends on:
Reported: 2013-04-18 08:10 UTC by Chris Billington
Modified: 2017-08-29 13:53 UTC
CC List: 2 users

See Also:
Crash report or crash signature:

Sample UTF_16le Chinese text file (57.73 KB, text/plain)
2013-04-18 08:10 UTC, Chris Billington

Description Chris Billington 2013-04-18 08:10:59 UTC
Created attachment 78171
Sample UTF_16le Chinese text file

Problem description:
When opening a Unicode text file encoded as UTF-16LE, the character encoding indicated by the 'Byte Order Mark' (BOM) does not seem to be recognised correctly by LO on Linux. However, the same version on Windows recognises the encoding.

Steps to reproduce:
1. Open attached sample UTF-16le text file using Writer in LO4.0.2 Linux. Use the default 'all files' filter. The file encoding is not correctly recognised (garbage characters shown).
2. Open attached sample text file using Writer, but preselect the 'Text Encoded' filter first. Select 'Unicode' as encoding and 'CR+LF' as line separator. Characters shown correctly.
3. If opening the file in Calc, the encoding is correctly detected as Unicode.
4. If opening the file in Writer 4.0.2 on Windows, encoding is correctly detected and file displays correctly.

The first two bytes of the attached file are the UTF-16le 'byte order mark' <FF> <FE>. But it seems LO4.0.2 Writer on Linux doesn't recognise these automatically.
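
For illustration, the BOM check described above can be sketched in a few lines of Python. This is only a minimal sketch of the general technique, not LibreOffice's actual detection code; the names `BOMS`, `detect_bom` and `decode_with_bom` and the UTF-8 fallback are assumptions made for the example.

```python
import codecs

# BOM signatures, longest first, so that the UTF-32LE mark (FF FE 00 00)
# is not mistaken for the UTF-16LE mark (FF FE).
# Caveat: a UTF-16LE file whose first character is U+0000 would be misread.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if present; otherwise use the (assumed) fallback."""
    name, skip = detect_bom(data)
    return data[skip:].decode(name or fallback)
```

For the attached sample, whose first two bytes are FF FE, `detect_bom` would report `utf-16-le` with a 2-byte mark, and the remaining bytes would be decoded accordingly.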

'Language Settings' in preferences don't seem to make any difference.

Operating System: Linux (Other)
Version: release
Comment 1 Owen Genat (retired) 2013-04-19 10:07:38 UTC
I can confirm this using Version (Build ID: 4c82dcdd6efcd48b1d8bba66bfe1989deee49c3) under both Windows 7 Home Premium and Ubuntu 10.04 x86_64. Behaviour is as described. Setting status to NEW.
Comment 2 QA Administrators 2015-03-04 02:23:02 UTC
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present on a currently supported version of LibreOffice ( or later): https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

Thank you for your help!

-- The LibreOffice QA Team
This NEW Message was generated on: 2015-03-03
Comment 3 Chris Billington 2015-03-04 12:36:28 UTC
The issue still exists in LO ( Arch Linux build-1) en_GB locale on Linux.

The 'select encoding' dialog now allows a choice of UTF-7, UTF-8, UTF-16 (UTF-16 works, with a Chinese-capable font selected).

LO on Windows still opens the file correctly using the 'all files' file dialog.

Comment 4 Maxim Monastirsky 2015-08-04 22:30:30 UTC
Comment 5 Volga 2017-01-19 04:18:33 UTC
LibreOffice should be able to auto-recognise the character encoding by checking the BOM. Some information about the BOM here:
Comment 6 Volga 2017-01-19 04:47:06 UTC
Additionally, if a TXT file does not have a BOM, LibreOffice should provide an interface that lets the user choose a suitable encoding for viewing the file; this interface should also include a preview pane.
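
As a rough illustration of the preview idea, the sketch below decodes the same bytes with several candidate encodings so that a user could compare the results. It is a hypothetical example, not an existing LibreOffice interface; the function name `preview_encodings`, the candidate list and the preview length are all assumptions.

```python
def preview_encodings(data: bytes,
                      candidates=("utf-8", "utf-16-le", "utf-16-be", "gb18030"),
                      limit: int = 40):
    """Map each candidate encoding to the first `limit` decoded characters,
    or to None when the bytes are not valid in that encoding."""
    previews = {}
    for name in candidates:
        try:
            # Decode strictly so invalid byte sequences are reported, not hidden.
            text = data.decode(name)
        except UnicodeDecodeError:
            previews[name] = None
            continue
        previews[name] = text[:limit]
    return previews
```

A dialog could list each non-None preview next to its encoding name and let the user pick the one that reads correctly.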
Comment 7 Commit Notification 2017-01-23 11:04:53 UTC
Maxim Monastirsky committed a patch related to this issue.
It has been pushed to "master":


tdf#63673 Never ignore detected BOM

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:

Affected users are encouraged to test the fix and report feedback.
Comment 8 QA Administrators 2017-03-01 10:53:09 UTC
Hello Maxim
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?
Comment 9 Andras Timar 2017-08-28 16:20:36 UTC
*** Bug 112069 has been marked as a duplicate of this bug. ***
Comment 10 Volga 2017-08-29 13:53:03 UTC
Yes, this is fixed, verified in 5.4.1.

Version: (x64)
Build ID: ea7cb86e6eeb2bf3a5af73a8f7777ac570321527
CPU threads: 4; OS: Windows 6.19; UI render: default;
Locale: zh-CN (zh_CN); Calc: group