Bug 142669 - When opening UTF8 or UTF16 files BOM is always detected
Summary: When opening UTF8 or UTF16 files BOM is always detected
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium minor
Assignee: tobias
URL:
Whiteboard: target:7.2.0 target:7.3.0
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-05 09:59 UTC by tobias
Modified: 2021-06-21 17:05 UTC

See Also:
Crash report or crash signature:


Attachments
UTF8 file without BOM (14 bytes, text/plain)
2021-06-05 10:02 UTC, tobias
UTF8 file with BOM (19 bytes, text/plain)
2021-06-05 10:05 UTC, tobias
UTF8 file without BOM (16 bytes, text/plain)
2021-06-05 10:06 UTC, tobias

Description tobias 2021-06-05 09:59:56 UTC
Description:
When a UTF8 or UTF16 file without a BOM is opened, the encoding auto-detection internally marks the BOM as present. When the file is then saved (without editing the filter settings), a BOM is written.
To reproduce, use a version that contains the fix for tdf#120574.

Steps to Reproduce:
1. Open a UTF8 file without BOM
2. Save the file
3. Inspect the saved file
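
For step 3, the first bytes of the saved file show whether a BOM was written. The helper below is a minimal Python sketch for that check; the name `leading_bom` is hypothetical and not part of LibreOffice, and only the UTF-8 and UTF-16 BOMs are examined.

```python
import codecs
from typing import Optional


def leading_bom(path: str) -> Optional[str]:
    """Return the name of the BOM at the start of the file, or None."""
    with open(path, "rb") as f:
        head = f.read(3)  # the UTF-8 BOM (3 bytes) is the longest one checked here
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 LE"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 BE"
    return None
```

Calling `leading_bom` on the file before opening it and again after saving it makes the difference visible: with this bug, a file that returned `None` beforehand returns `"UTF-8"` afterwards.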

Actual Results:
The saved file's encoding is UTF8 with BOM

Expected Results:
The saved file's encoding is UTF8 without BOM
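
The expected behaviour can be modelled as a round trip in which the presence of a BOM is recorded on load and reused on save. The following is a minimal Python sketch under that assumption, not LibreOffice's actual C++ implementation; `LoadedText`, `load_text` and `save_text` are hypothetical names, only UTF-8 and UTF-16 BOMs are handled, and the fallback encoding stands in for whatever the real auto-detection would choose.

```python
import codecs
from typing import NamedTuple

# BOM byte sequences and the codec used to decode the bytes that follow them.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


class LoadedText(NamedTuple):
    text: str
    encoding: str
    has_bom: bool  # True only if the input really started with a BOM


def load_text(data: bytes, fallback: str = "utf-8") -> LoadedText:
    """Decode data, recording whether a BOM was actually present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return LoadedText(data[len(bom):].decode(encoding), encoding, True)
    # No BOM found: decode with the fallback and remember that none was seen.
    return LoadedText(data.decode(fallback), fallback, False)


def save_text(doc: LoadedText) -> bytes:
    """Encode doc, writing a BOM only if one was present on load."""
    body = doc.text.encode(doc.encoding)
    if not doc.has_bom:
        return body
    bom = next(b for b, enc in _BOMS if enc == doc.encoding)
    return bom + body
```

In this model, the reported bug corresponds to `has_bom` being set to `True` regardless of the input, so that `save_text` always prepends a BOM.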


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.2.0.0.alpha1+ / LibreOffice Community
Build ID: cb490979ac238011efa27e0fb18fe62c13329d1f
CPU threads: 12; OS: Linux 5.12; UI render: default; VCL: x11
Locale: de-DE (de_DE.UTF-8); UI: en-US
Calc: threaded
Comment 1 tobias 2021-06-05 10:02:16 UTC
Created attachment 172635 [details]
UTF8 file without BOM
Comment 2 tobias 2021-06-05 10:05:18 UTC
Created attachment 172636 [details]
UTF8 file with BOM
Comment 3 tobias 2021-06-05 10:06:13 UTC
Created attachment 172637 [details]
UTF8 file without BOM
Comment 4 Michael Warner 2021-06-05 13:33:32 UTC
I'm able to reproduce in:
Version: 7.2.0.0.alpha1+ / LibreOffice Community
Build ID: 452bf1359dab3cfab9fd6007d68592e9c96382b3
CPU threads: 12; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-04_18:12:48
Calc: threaded
Comment 5 Commit Notification 2021-06-06 16:53:12 UTC
tobias committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/162f5a20095c6937030d23ee03fb8f72c51eefa1

tdf#142669 Consider BOM on text encoding detection

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2021-06-14 09:24:19 UTC
tobias committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/62503c3acde8eab0ca198f66519270761f64d56d

tdf#142669 Assert data alignment in unit test

It will be available in 7.3.0.
