Created attachment 78171 [details]
Sample UTF-16LE Chinese text file
When opening a Unicode text file encoded as UTF-16LE, the character encoding indicated by the 'Byte Order Mark' (BOM) is not recognised correctly by LO on Linux. However, the same version on Windows recognises the encoding.
Steps to reproduce:
1. Open attached sample UTF-16le text file using Writer in LO4.0.2 Linux. Use the default 'all files' filter. The file encoding is not correctly recognised (garbage characters shown).
2. Open attached sample text file using Writer, but preselect the 'Text Encoded' filter first. Select 'Unicode' as encoding and 'CR+LF' as line separator. Characters shown correctly.
3. If opening the file in Calc, the encoding is correctly detected as Unicode.
4. If opening the file in Writer 4.0.2 on Windows, encoding is correctly detected and file displays correctly.
The first two bytes of the attached file are the UTF-16le 'byte order mark' <FF> <FE>. But it seems LO4.0.2 Writer on Linux doesn't recognise these automatically.
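For illustration only, here is a minimal Python sketch of the kind of BOM check described above. It is not LibreOffice's actual detection code. The helper names `sniff_bom` and `decode_with_bom` are hypothetical, and the UTF-8 fallback for files without a BOM is an assumption.

```python
import codecs

# Longer BOMs come first: the UTF-32LE BOM (FF FE 00 00) begins with the
# same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if present; otherwise use the assumed fallback."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)


if __name__ == "__main__":
    # Same leading bytes as the attached sample: FF FE, then UTF-16LE text.
    sample = codecs.BOM_UTF16_LE + "中文\r\n".encode("utf-16-le")
    print(sniff_bom(sample))
    print(repr(decode_with_bom(sample)))
```

A file starting with <FF> <FE> is reported as UTF-16LE here regardless of platform or locale, which is the behaviour expected from Writer.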
'Language Settings' in preferences don't seem to make any difference.
Operating System: Linux (Other)
Version: 220.127.116.11 release
I can confirm this using Version 18.104.22.168 (Build ID: 4c82dcdd6efcd48b1d8bba66bfe1989deee49c3) under both Windows 7 Home Premium and Ubuntu 10.04 x86_64. Behaviour is as described. Setting status to NEW.
** Please read this message in its entirety before responding **
To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.
There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.
If you have time, please do the following:
Test to see if the bug is still present on a currently supported version of LibreOffice (22.214.171.124 or later): https://www.libreoffice.org/download/
If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System
Please DO NOT
Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)
Thank you for your help!
-- The LibreOffice QA Team
This NEW Message was generated on: 2015-03-03
The issue still exists in LO 126.96.36.199 (188.8.131.52 Arch Linux build-1) en_GB locale on Linux.
The 'select encoding' dialog now allows a choice of UTF-7, UTF-8, UTF-16 (UTF-16 works, with a Chinese-capable font selected).
LO 184.108.40.206 on Windows still opens the file correctly using the 'all files' file dialog.
LibreOffice should be able to auto-recognise the character encoding by checking the BOM. Some information about the BOM here:
Additionally, if a TXT file does not have a BOM, LibreOffice should provide an interface that lets the user choose a suitable encoding for viewing the file. This interface should also include a preview pane.
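As a rough sketch of what such a preview could be built on (not an existing LibreOffice feature or API), the Python below decodes the start of a BOM-less file with several candidate encodings and keeps only those that decode without error. The function name `preview_encodings` and the default candidate list are assumptions chosen for illustration.

```python
def preview_encodings(data: bytes,
                      candidates=("utf-8", "utf-16-le", "gb18030", "big5"),
                      limit: int = 80) -> dict:
    """Map each candidate encoding that decodes cleanly to a short preview."""
    previews = {}
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; leave it out.
            continue
        previews[enc] = text[:limit]
    return previews


if __name__ == "__main__":
    raw = "中文".encode("utf-16-le")  # BOM-less UTF-16LE bytes
    for enc, text in preview_encodings(raw).items():
        print(enc, repr(text))
```

A clean decode does not prove the encoding is right, since several encodings may accept the same bytes, so the user still has to pick from the previews.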
Maxim Monastirsky committed a patch related to this issue.
It has been pushed to "master":
tdf#63673 Never ignore detected BOM
It will be available in 5.4.0.
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?
*** Bug 112069 has been marked as a duplicate of this bug. ***
Yes, this is fixed, verified in 5.4.1.
CPU threads: 4; OS: Windows 6.19; UI render: default;
Locale: zh-CN (zh_CN); Calc: group