Bug 150574

Summary: Writer adds paragraph breaks approximately every 9999 characters to a UTF16 xml file
Product: LibreOffice
Reporter: MikeG <mike>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED DUPLICATE    
Severity: normal
CC: mike
Priority: medium    
Version: 7.3.5.2 release   
Hardware: All   
OS: Windows (All)   
Whiteboard:
Crash report or crash signature:
Regression By:
Attachments: Sample Honeywell Control Module xml file.

Description MikeG 2022-08-24 04:29:16 UTC
Description:
I am working on a C++ program to recreate the application content of a Honeywell Control Module XML file. I want to use Writer and Calc to help me see what part of my algorithm is missing. One of the methods I am using is to split the XML file by replacing each < delimiter with a newline followed by <, so that I can paste the result into Calc and take a closer look at the section where my algorithm is failing.
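
For reference, the tag-splitting step described above can also be done outside Writer. The following is only a minimal C++ sketch, assuming the file has already been decoded from UTF-16LE into a std::u16string. The function name break_before_tags is hypothetical and is not part of the reporter's program.

```cpp
#include <string>

// Hypothetical helper (not from the original report): returns a copy of
// `xml` with a line feed inserted before every '<', except when the '<'
// is the first character or already follows a line feed.
std::u16string break_before_tags(const std::u16string& xml) {
    std::u16string out;
    out.reserve(xml.size() + xml.size() / 8);  // rough guess at the growth
    for (char16_t ch : xml) {
        if (ch == u'<' && !out.empty() && out.back() != u'\n') {
            out.push_back(u'\n');
        }
        out.push_back(ch);
    }
    return out;
}
```

With this sketch, an input such as `<a><b>x</b></a>` comes back with each tag starting on its own line, which is the shape that pastes into Calc as one row per tag.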

Unfortunately, Writer corrupts the file by inserting what I assume are CR/LF characters at the end of five sections, which are 9999, 9954, 9999, 9999 and 9987 characters long.

Steps to Reproduce:
Open the sample XML file attached to this report in Writer, or paste its contents into a Writer document.

Actual Results:
A sample of the XML file is attached to this bug report. The original file contains only one CR/LF pair. When the file is opened in or pasted into Writer, there are 5 CR/LF breaks that are not present in the original file.
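
One way to confirm where the extra breaks fall is to measure the length of each run of text between line breaks, before and after the round trip through Writer. The following is a minimal C++ sketch under the assumption that the text is already decoded into a std::u16string. The name segment_lengths is hypothetical, and the sketch counts UTF-16 code units, which may differ from the character counts Writer reports if the file contains surrogate pairs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original report): returns the length,
// in UTF-16 code units, of each segment of `text` between line breaks.
// CR LF, a lone CR, and a lone LF each count as one break.
std::vector<std::size_t> segment_lengths(const std::u16string& text) {
    std::vector<std::size_t> lengths;
    std::size_t current = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t ch = text[i];
        if (ch == u'\r' || ch == u'\n') {
            // Consume the LF of a CR LF pair so it is not counted twice.
            if (ch == u'\r' && i + 1 < text.size() && text[i + 1] == u'\n') {
                ++i;
            }
            lengths.push_back(current);
            current = 0;
        } else {
            ++current;
        }
    }
    // Record a trailing segment that is not terminated by a line break.
    if (current > 0) {
        lengths.push_back(current);
    }
    return lengths;
}
```

Running this on the original file and on the text saved from Writer should show whether the additional segments end near the 9999-unit boundaries described above.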

Expected Results:
Plain text file integrity ought to be preserved. If it is not, the inserted breaks are likely to cause errors that will show up as other bugs.


Reproducible: Always


User Profile Reset: No



Additional Info:
Preserve plain text file integrity.
Comment 1 MikeG 2022-08-24 04:33:14 UTC
Created attachment 181992 [details]
Sample Honeywell Control Module xml file.

UTF-16LE format.
Comment 2 Mike Kaganski 2022-08-24 05:05:02 UTC
It is being worked on in https://gerrit.libreoffice.org/c/core/+/121548.

*** This bug has been marked as a duplicate of bug 70423 ***