Bug 105408 - LibreOffice Writer has no way to select a text encoding when opening a TXT file
Summary: LibreOffice Writer has no way to select a text encoding when opening a TXT file
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 5.3.0.1 rc (earliest affected)
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-18 10:05 UTC by Volga
Modified: 2017-01-18 23:38 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
Test file (96.50 KB, application/zip)
2017-01-18 15:22 UTC, Volga

Description Volga 2017-01-18 10:05:37 UTC
Description:
When I open a TXT file with LibreOffice Writer, some of the text turns into mojibake, and LibreOffice Writer offers no way to select a text encoding.

Some discussions at LibreOffice Chinese Community:

After downloading a TXT file from the "UDHR in Unicode" section of the Unicode website, I opened it with LibreOffice and could not find where to select the text encoding. What should I do?

The only option is to open it in a text editor and paste the text into Writer. In Calc, Paste Special does let you choose the encoding.

Source: http://www.libreofficechina.org/thread-1803-1-1.html

Steps to Reproduce:
1. Download any TXT file from: http://www.unicode.org/udhr/aggregates.html
2. Open with LibreOffice Writer

Actual Results:  
After the steps above, some non-Latin text appears as mojibake. I also opened the file with BabelPad, where it looks correct. I then inserted a BOM (U+FEFF) at the beginning of the text and saved it, but the file still displays incorrectly in LO Writer.
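
For illustration only, a minimal Python sketch of how this kind of mojibake arises: UTF-8 bytes are decoded with a legacy code page instead of UTF-8. The use of GBK as the fallback is an assumption (a plausible default for a zh-CN Windows locale), not something confirmed in this report, and the sample string is made up for the example.

```python
# Sketch: the same bytes read with the right and the wrong encoding.
# Assumption: GBK stands in for whatever legacy code page the
# application falls back to; the report does not say which one is used.

original = "人人生而自由"          # sample non-Latin text (hypothetical)
data = original.encode("utf-8")   # bytes as stored in a UTF-8 TXT file

# Decoding with the correct encoding round-trips the text.
correct = data.decode("utf-8")

# Decoding with a legacy code page yields unrelated characters (mojibake).
# errors="replace" keeps the demo from raising on invalid byte sequences.
garbled = data.decode("gbk", errors="replace")

if __name__ == "__main__":
    print("correct:", correct)
    print("garbled:", garbled)
```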

Expected Results:
When a TXT file is opened, LibreOffice should offer an interface for selecting the text encoding. This interface should also include a preview pane.


Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 5.3.0.2 (x64)
Build ID: 5ad7b2889021c491af62f7930a4b1cb631392f16
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; Layout Engine: new; 
Locale: zh-CN (zh_CN); Calc: group


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0
Comment 1 Maxim Monastirsky 2017-01-18 10:29:04 UTC
(In reply to Volga from comment #0)
> LibreOffice Writer has no way to select a text encoding.
There is a way: in the Open dialog, scroll through the file type list and choose "Text - Choose Encoding".

> I also opened this file with BabelPad, where it looks correct. I then
> inserted a BOM (U+FEFF) at the beginning of the text and saved it, but the
> file still displays incorrectly in LO Writer.
Then maybe this file is in little-endian byte order, in which case the BOM should appear as the bytes FF FE (which read as U+FFFE when interpreted as big-endian)? Can you please attach that file?
Comment 2 Volga 2017-01-18 15:22:36 UTC
Created attachment 130529 [details]
Test file
Comment 3 Maxim Monastirsky 2017-01-18 23:38:46 UTC
OK, there is some confusion here. U+FEFF is usually thought of as the UTF-16 BOM, but the attached file is UTF-8 and even has a UTF-8 BOM (the same code point, encoded as the bytes EF BB BF).
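
To make the BOM distinction concrete, here is a minimal Python sketch of BOM sniffing. It is not LibreOffice code; the function name `sniff_bom` and the table `_BOMS` are hypothetical and exist only for this illustration. It relies on the BOM constants from Python's standard `codecs` module.

```python
import codecs

# Byte signatures of U+FEFF in each encoding form. The UTF-32 entries come
# first because the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),   # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),   # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),           # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),   # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),   # FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    # U+FEFF written as UTF-8 becomes the three bytes EF BB BF.
    sample = "\ufeffhello".encode("utf-8")
    print(sniff_bom(sample))
```

A file saved as UTF-8 with U+FEFF at the start therefore carries a UTF-8 BOM, not a UTF-16 one, which matches what the attachment shows.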

Still, this file does expose a real bug. It is correctly detected as UTF-8 under Linux but not under Windows. Most likely the reason is that SwIoSystem::IsDetectableText returns false if the detected line ending differs from the system one, which is the case here since the file uses LF, but the Windows default is CRLF. There is a similar case in Bug 63673, but with UTF-16 and CRLF under Linux.
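
The suspected behavior can be sketched in Python as follows. This is not the actual SwIoSystem::IsDetectableText implementation; the functions `detect_line_ending`, `detect_strict`, and `detect_lenient` are hypothetical and only model the hypothesis above, namely that a line-ending mismatch causes an otherwise valid UTF-8 file to be rejected.

```python
def detect_line_ending(text: str) -> str:
    """Return the dominant line ending: "CRLF", "LF", "CR", or "" if none."""
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf   # bare LF, not part of CRLF
    cr = text.count("\r") - crlf   # bare CR, not part of CRLF
    counts = {"CRLF": crlf, "LF": lf, "CR": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else ""


def detect_strict(data: bytes, system_ending: str):
    """Hypothetical strict check: reject the file if its line ending
    differs from the platform default, even when it is valid UTF-8."""
    try:
        text = data.decode("utf-8-sig")   # strips a UTF-8 BOM if present
    except UnicodeDecodeError:
        return None
    if detect_line_ending(text) != system_ending:
        return None
    return "utf-8"


def detect_lenient(data: bytes):
    """Hypothetical lenient check: the line ending does not affect
    the encoding decision."""
    try:
        data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return None
    return "utf-8"


if __name__ == "__main__":
    # UTF-8 file with a BOM and LF line endings, like the attachment.
    sample = "\ufeff人人生而自由\n在尊严和权利上一律平等\n".encode("utf-8")
    print("Linux-like (LF):    ", detect_strict(sample, "LF"))
    print("Windows-like (CRLF):", detect_strict(sample, "CRLF"))
    print("Lenient:            ", detect_lenient(sample))
```

Under these assumptions the same file is accepted when the platform default is LF and rejected when it is CRLF, which is consistent with the Linux/Windows difference observed here.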

I'm closing this bug as WORKSFORME, since the original request was for a way to choose the encoding, and one already exists. Improving the auto-detection is already tracked in Bug 63673.