Bug 64603 - Saving plain text files with Cyrillic content are wrongly encoded in ANSI charset (Windows)
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.6.4.3 release
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Save-Text
Reported: 2013-05-14 20:52 UTC by sogartary
Modified: 2021-05-13 09:18 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Description sogartary 2013-05-14 20:52:16 UTC
I wrote some text in Writer in Cyrillic and saved it as "Text (.txt)", not "Text Encoded (.txt)". When I opened the file again in Writer, it displayed only question marks. I opened the file in a hex editor, and indeed, in place of the Cyrillic letters there were only question marks in the ANSI charset encoding.
It seems that Writer completely disregarded the presence of non-ANSI characters.
What should happen in that case is that the file is saved in an encoding that can represent them, such as UTF-8.
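The effect described above can be modelled in a few lines. This is a minimal sketch in Python, not LibreOffice's filter code, and it assumes the reporter's ANSI code page was Windows-1252 (`cp1252`), which has no Cyrillic letters. Encoding with `errors="replace"` substitutes `?` for every unrepresentable character, matching what the hex editor showed, while UTF-8 keeps the text intact.

```python
# Sketch: model saving Cyrillic text in a legacy "ANSI" code page.
# Assumption: the ANSI code page is cp1252 (Western European), which
# cannot represent Cyrillic. This is not LibreOffice's actual code path.
text = "Привет, мир"

# Each unrepresentable character becomes a literal "?" byte (0x3F).
ansi_bytes = text.encode("cp1252", errors="replace")

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = text.encode("utf-8")

print(ansi_bytes)                          # b'??????, ???'
print(utf8_bytes.decode("utf-8") == text)  # True
```

Once the bytes are written as `?`, the original letters cannot be recovered from the file, which is why reopening it in Writer shows only question marks.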
Comment 1 Julien Nabet 2013-05-17 22:23:52 UTC Comment hidden (obsolete)
Comment 2 sogartary 2013-05-23 05:04:08 UTC Comment hidden (obsolete)
Comment 3 Julien Nabet 2013-05-28 17:35:50 UTC
sogartary: could you rename your LO profile directory (see https://wiki.documentfoundation.org/UserProfile) and give it a new try?
The goal is to be sure it's not due to customization or something.
Comment 4 sogartary 2013-06-03 17:03:20 UTC
I have tried what you said. I renamed the %appdata%\LibreOffice\4\user directory and tried again. The bug still persists.
Comment 5 sogartary 2013-06-04 09:15:58 UTC
I also tried to reproduce the bug on Ubuntu 12.04, again with LibreOffice 4.0.3.3. It seems that everything is OK there.
Comment 6 Julien Nabet 2013-06-04 20:22:05 UTC
sogartary: reading comments 2, 4 and 5 together, I don't understand.
Is it OK or not with 4.0.3 and a brand new LO profile?
If not, which case is OK?
Comment 7 QA Administrators 2014-02-02 02:06:59 UTC Comment hidden (obsolete)
Comment 8 QA Administrators 2014-02-26 19:31:28 UTC Comment hidden (obsolete)
Comment 9 Urmas 2014-12-03 17:04:18 UTC
Even Notepad warns about data loss when the file contains Unicode characters.
Comment 10 Buovjaga 2021-05-07 15:15:07 UTC
This is Windows-only. If you don't use the file format "Text - Choose Encoding", the file will have question marks in place of non-ASCII characters.

Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 9df3aa7ea72d61462e430643f2a80906dce4e15b
CPU threads: 2; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: fi-FI (fi_FI); UI: en-US
Calc: threaded Jumbo
Comment 11 Mike Kaganski 2021-05-13 07:49:47 UTC
(In reply to Urmas from comment #9)
> Even Notepad warns about data loss when the file contains Unicode characters.

LibreOffice likewise warns whenever anyone saves anything to TXT, because there will be inevitable loss: of formatting, of graphics, of metadata, of information in headers/footers. There is no need for an additional warning saying "also, not all characters present in the document *body* are representable in the selected encoding's charset".

Of course this is not Windows-only; it would appear on any platform whose system encoding is not Unicode. However, on other platforms it's *usual* to use UTF-8. But there exist Linux systems using e.g. KOI8-R, etc.
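The point that the outcome depends on the system encoding rather than on the OS can be checked directly. The following is a minimal sketch using Python's codec names (`cp1252`, `cp1251`, `koi8_r`), not LibreOffice's encoding tables; the helper `representable` is hypothetical. The Ukrainian sample is an assumed illustration that even a Cyrillic legacy encoding such as KOI8-R lacks some Cyrillic letters (here "ї").

```python
def representable(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of `text` fits in `encoding`."""
    try:
        text.encode(encoding)  # strict mode raises on any unmappable character
    except UnicodeEncodeError:
        return False
    return True


# Russian sample vs. a Ukrainian sample containing "ї", which KOI8-R lacks.
for sample in ("Кириллица", "Україна"):
    for enc in ("cp1252", "cp1251", "koi8_r", "utf-8"):
        print(f"{sample:10} {enc:7} {representable(sample, enc)}")
```

Only UTF-8 succeeds for every sample; each legacy encoding loses some text, which is the case for defaulting to Unicode regardless of platform.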

This is NOTABUG.
Comment 12 Mike Kaganski 2021-05-13 09:18:22 UTC
On the other hand, why not change our "simple text" export to use UTF-8 (with BOM) instead of the "system encoding" (unless there is existing encoding information from import; see tdf#120574, which is a different problem)? That would avoid this situation; UTF-8 is the universal standard now, with much better chances of being read correctly than any other encoding. So should this become "Use UTF-8 instead of system encoding in text export filter by default"?
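For illustration, the proposed default (UTF-8 with a byte order mark) can be sketched in Python. `export_plain_text` is a hypothetical stand-in, not LibreOffice's export filter API; it relies on Python's `utf-8-sig` codec, which writes the signature bytes `EF BB BF` before the text and strips them again on reading. `newline=""` is used so that Windows does not translate `\n` to `\r\n`, keeping the byte check below platform-independent.

```python
import codecs
import os
import tempfile


def export_plain_text(path: str, text: str) -> None:
    """Hypothetical export routine: write `text` as UTF-8 with a BOM."""
    # "utf-8-sig" writes the UTF-8 BOM before the first character.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


text = "Кириллица\n"
path = os.path.join(tempfile.mkdtemp(), "example.txt")
export_plain_text(path, text)

with open(path, "rb") as f:
    raw = f.read()

print(raw.startswith(codecs.BOM_UTF8))  # True: file begins with EF BB BF
print(raw[3:].decode("utf-8") == text)  # True: Cyrillic survives intact
```

The BOM lets readers that would otherwise assume the local ANSI code page (such as older Notepad versions) detect UTF-8 reliably, at the cost of three extra bytes that some Unix tools do not expect.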