Bug 120574 - TXT file encoding is lost when saving
Summary: TXT file encoding is lost when saving
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Keywords: difficultyMedium, easyHack, skillCpp
Duplicates: 132426
Depends on:
Blocks: Save-Text
Reported: 2018-10-13 20:41 UTC by Mike Kaganski
Modified: 2021-02-19 04:18 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:


Description Mike Kaganski 2018-10-13 20:41:39 UTC
When a plain text file is opened (whether "Text" or "Text - choose encoding" was used), the filter settings (charset, line ending, presence of a BOM: see SwAsciiOptions) are forgotten, and the settings last used for export are shown if "Edit filter settings" is requested. For example, if I open a UTF-8 file with BOM, I'll get a Windows-1251 Cyrillic file upon save.

The settings should persist with the open file, so that the user does not have to re-enter them on save (a user might have no idea what they were if auto-detection was used).
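To illustrate the kind of auto-detection mentioned above, here is a minimal standalone sketch of BOM sniffing on the leading bytes of a file. It is not LibreOffice code: the names Bom and detectBom are hypothetical, and the check is simplified (for instance, a UTF-32LE BOM would be reported as UTF-16LE).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; these are not LibreOffice types.
enum class Bom { None, Utf8, Utf16BE, Utf16LE };

// Inspect the leading bytes of a buffer for a byte order mark.
// Simplification: UTF-32 BOMs are not distinguished.
Bom detectBom(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return Bom::Utf16BE;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return Bom::Utf16LE;
    return Bom::None;
}

// Small usage check: "hi" stored as UTF-8 with and without a BOM.
void demo()
{
    const std::vector<std::uint8_t> withBom{0xEF, 0xBB, 0xBF, 0x68, 0x69};
    const std::vector<std::uint8_t> withoutBom{0x68, 0x69};
    assert(detectBom(withBom) == Bom::Utf8);
    assert(detectBom(withoutBom) == Bom::None);
}
```

The user never sees the result of such a check, which is why it has to be remembered with the document rather than asked for again on save.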
Comment 1 m.a.riosv 2018-10-13 22:53:30 UTC
You are right, the same type should be used for saving.
Comment 2 librebug 2018-10-14 08:16:49 UTC
And by default for "save as". But here the user should have the option to change settings.

As has been remarked elsewhere -- for example 82254, "FILESAVE: UTF-8 BOM removed from CSV when saving file" -- this is how LO should act for all documents it can read and save in plain text form.  For predictability, consistency, accuracy, and usability.
Comment 3 Ming Hua 2020-05-02 07:40:30 UTC
*** Bug 132426 has been marked as a duplicate of this bug. ***
Comment 4 Mike Kaganski 2021-01-22 08:17:59 UTC
The filter should store the settings in the medium attached to the document. The proper place to do this is likely AsciiReader::Read (sw/source/filter/ascii/parasc.cxx), which has m_pMedium set in SwReader::Read, and can call its GetItemSet()->Put() to modify the data based on what the parser detected.

Using this data likely should happen in SwASCWriter::SetupFilterOptions (which should be implemented, and which is called from Writer::Write, where the medium is available).
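The intended data flow (remember on import, reuse on export, fall back to the last-used export settings otherwise) can be sketched with a standalone model. This is only an assumption-laden illustration: TextFilterSettings, Medium, rememberImportSettings and chooseExportSettings are hypothetical names, and none of this uses the real SfxMedium, SwAsciiOptions or item set APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the options that SwAsciiOptions describes.
struct TextFilterSettings
{
    std::string charset;
    std::string lineEnd; // "CR", "LF" or "CRLF"
    bool hasBom;
};

bool operator==(const TextFilterSettings& a, const TextFilterSettings& b)
{
    return a.charset == b.charset && a.lineEnd == b.lineEnd && a.hasBom == b.hasBom;
}

// Hypothetical stand-in for the medium attached to an open document.
struct Medium
{
    bool hasImportSettings = false;
    TextFilterSettings importSettings;
};

// Import side: record what the parser detected.
void rememberImportSettings(Medium& medium, const TextFilterSettings& detected)
{
    medium.importSettings = detected;
    medium.hasImportSettings = true;
}

// Export side: prefer the per-document settings, else the last-used ones.
TextFilterSettings chooseExportSettings(const Medium& medium,
                                        const TextFilterSettings& lastUsedForExport)
{
    return medium.hasImportSettings ? medium.importSettings : lastUsedForExport;
}

void demo()
{
    const TextFilterSettings lastUsed{"windows-1251", "CRLF", false};
    const TextFilterSettings detected{"UTF-8", "LF", true};
    Medium medium;
    // Nothing remembered yet: current behaviour, last-used settings win.
    assert(chooseExportSettings(medium, lastUsed) == lastUsed);
    // After import stored its findings, they take precedence on export.
    rememberImportSettings(medium, detected);
    assert(chooseExportSettings(medium, lastUsed) == detected);
}
```

In the real code the "remember" step would correspond to the Put() call on the medium's item set and the "choose" step to SetupFilterOptions; the exact item to use is left open here.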

The easy hack also requires a unit test which checks that the detected non-default settings (e.g., UTF-16BE with BOM with CR line endings, etc.) are retained on save-and-reload. The unit test should be in sw/qa/extras/txtexport/txtexport.cxx, and should include reading the exported file (similar to what TxtExportTest::readExportedFile does), testing BOM and data bytes.
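The byte-level expectation of such a test can be shown in a standalone sketch, independent of the LibreOffice test framework. Everything here is hypothetical (DetectedSettings, importUtf16BE, exportUtf16BE), and it is deliberately simplified: only UTF-16BE is handled, and only bare CR or bare LF line endings, not CRLF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical record of what the import detected.
struct DetectedSettings
{
    bool hasBom;
    char16_t lineEnd; // u'\r' or u'\n' (CRLF is not modelled)
};

// Decode UTF-16BE, normalising line ends to '\n' and recording the settings.
std::u16string importUtf16BE(const Bytes& in, DetectedSettings& out)
{
    std::size_t pos = 0;
    out.hasBom = in.size() >= 2 && in[0] == 0xFE && in[1] == 0xFF;
    if (out.hasBom)
        pos = 2;
    out.lineEnd = u'\n';
    std::u16string text;
    for (; pos + 1 < in.size(); pos += 2)
    {
        char16_t unit = static_cast<char16_t>((in[pos] << 8) | in[pos + 1]);
        if (unit == u'\r')
        {
            out.lineEnd = u'\r';
            unit = u'\n';
        }
        text.push_back(unit);
    }
    return text;
}

// Encode back to UTF-16BE using the remembered settings.
Bytes exportUtf16BE(const std::u16string& text, const DetectedSettings& settings)
{
    Bytes out;
    if (settings.hasBom)
    {
        out.push_back(0xFE);
        out.push_back(0xFF);
    }
    for (char16_t unit : text)
    {
        if (unit == u'\n')
            unit = settings.lineEnd;
        out.push_back(static_cast<std::uint8_t>(unit >> 8));
        out.push_back(static_cast<std::uint8_t>(unit & 0xFF));
    }
    return out;
}

// Round trip: BOM, 'a', CR, 'b' must come back byte for byte.
void demo()
{
    const Bytes original{0xFE, 0xFF, 0x00, 0x61, 0x00, 0x0D, 0x00, 0x62};
    DetectedSettings detected{};
    const std::u16string text = importUtf16BE(original, detected);
    assert(detected.hasBom && detected.lineEnd == u'\r');
    assert(exportUtf16BE(text, detected) == original);
}
```

A real test would load and save through the Writer filters instead, then compare the BOM and data bytes of the exported file in the same way.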