Bug 150574 - Writer adds paragraph breaks approximately every 9999 characters to a UTF16 xml file
Status: RESOLVED DUPLICATE of bug 70423
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.3.5.2 release (earliest affected)
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-24 04:29 UTC by MikeG
Modified: 2022-08-24 05:05 UTC

See Also:
Crash report or crash signature:


Attachments
Sample Honeywell Control Module xml file. (107.24 KB, application/xml)
2022-08-24 04:33 UTC, MikeG
Description MikeG 2022-08-24 04:29:16 UTC
Description:
I am working on a C++ program to recreate the application content of a Honeywell Control Module XML file. I want to use Writer and Calc to help me see what I am missing in part of my algorithm. One of the methods I am using is to split the XML file by replacing all of the < delimiters with /n> so that I can paste the result into Calc and take a closer look at the section where my algorithm is failing.
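
The tag-splitting step described above can also be done outside Writer, which avoids the import path entirely. The following is a minimal Python sketch, assuming the goal is to start a new line before every `<`; the function name `split_before_tags` is hypothetical and not part of any tool mentioned in this report.

```python
def split_before_tags(xml_text: str) -> str:
    """Put each XML tag on its own line by inserting a newline before every '<'."""
    # Assumption: one tag per line is the goal; existing line breaks are kept as-is.
    # The leading newline produced by the first '<' is dropped.
    return xml_text.replace("<", "\n<").lstrip("\n")


if __name__ == "__main__":
    print(split_before_tags("<a><b>text</b></a>"))
```

The result can then be pasted into Calc, one tag per row, without the text ever passing through Writer.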

Unfortunately, Writer corrupts the file by inserting what I assume are CR/LF characters at the end of five sections, with character counts of 9999, 9954, 9999, 9999 and 9987.

Steps to Reproduce:
Open the sample XML file attached to this report in Writer, or paste its contents into a Writer document.

Actual Results:
A sample of the XML file is attached to this bug report. There is only one CR/LF pair in the original file. When it is opened in or pasted into Writer, there are 5 CR/LF breaks not present in the original file.
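
One way to confirm the number and position of the breaks is to measure the length of each CR/LF-delimited run in the file, before and after it has passed through Writer. This is a rough sketch, assuming the data is UTF-16LE with an optional byte order mark; the function name `crlf_run_lengths` is hypothetical.

```python
from typing import List


def crlf_run_lengths(raw: bytes) -> List[int]:
    """Return the character count of each CR/LF-delimited run in UTF-16LE data."""
    text = raw.decode("utf-16-le")
    # Drop a leading byte order mark so it is not counted as a character.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Note: len() counts Unicode code points; for characters in the Basic
    # Multilingual Plane this equals the UTF-16 code unit count.
    return [len(run) for run in text.split("\r\n")]


if __name__ == "__main__":
    demo = ("\ufeff" + "x" * 9999 + "\r\n" + "y" * 5).encode("utf-16-le")
    print(crlf_run_lengths(demo))
```

Run against the original file and against the text saved from Writer, the two lists show where extra breaks were introduced.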

Expected Results:
Plain text file integrity ought to be preserved. If it is not, this presents potential errors that will likely manifest as other bugs.
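
The expectation above can be checked mechanically. The sketch below assumes both the original text and the text exported from Writer are available as decoded strings, and tests whether they differ only by CR and LF characters; the function name `differs_only_by_breaks` is hypothetical.

```python
def differs_only_by_breaks(original: str, exported: str) -> bool:
    """Return True if the two texts are identical once CR and LF are removed."""
    # Translation table that deletes carriage returns and line feeds.
    strip_breaks = str.maketrans("", "", "\r\n")
    return original.translate(strip_breaks) == exported.translate(strip_breaks)


if __name__ == "__main__":
    print(differs_only_by_breaks("abcdef\r\n", "abc\r\ndef\r\n"))
```

A True result means the only damage is the inserted breaks; a False result means other content changed as well.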

Reproducible: Always

User Profile Reset: No

Additional Info:
Preserve plain text file integrity.
Comment 1 MikeG 2022-08-24 04:33:14 UTC
Created attachment 181992
Sample Honeywell Control Module xml file.

UTF-16LE format.
Comment 2 Mike Kaganski 2022-08-24 05:05:02 UTC
It is being worked on in https://gerrit.libreoffice.org/c/core/+/121548.

*** This bug has been marked as a duplicate of bug 70423 ***