Bug 129790 - LibreOffice Writer saves RTF documents with enormous file size (1000 times bigger)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.3.3.2 release
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:rtf
Depends on:
Blocks: RTF
Reported: 2020-01-04 16:41 UTC by Sebastien
Modified: 2022-10-02 18:05 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Test file for issue - LibreOffice Writer saves RTF documents with enormous file size (1000 times bigger) (1.44 MB, application/rtf)
2020-01-06 14:42 UTC, Sebastien

Description Sebastien 2020-01-04 16:41:03 UTC
Description:
For many years, I have worked with RTF documents in Writer. I have had many documents that I continuously updated. Not every time, but regularly, the documents just got bigger and bigger. Their size has grown out of all proportion.

For example, here is my latest problem from this morning. I had an RTF document under 10 MB. It took a while to open (a minute or two). After opening the document, I entered one extra line and saved it. The new size of the document was 12.1 MB. I tried to open the document again and it took several minutes. Once it was open, I added a few more lines and saved again. It took several minutes to save and the new size was 24.5 MB. I tried the same steps again to see whether the size would increase further, but it remained the same. The opening and saving times are absolutely disastrous.

To fix the problem, I opened the same RTF file (24.5 MB) with Wordpad and then saved it. The file size dropped to 22 KB. I reopened it with Writer and saved it. So far, I have a stable file size of around 75 KB. I believe there is still a problem with saving: just adding about 20 carriage returns at the end of the document made the file size jump from 72 KB to 83 KB. The document has about 20 pages of text, with the pages 50% to 75% filled.

I did a test with the document of 24.5 MB. I selected all the text and I did “cut” (Ctrl + X). I saved the document and the file size went back to a few KB. The normal size, in other words. After that I pasted the cut text with Ctrl + V. It took several minutes to paste the 20 pages of text that I had. After that, the file size returned to the size of 24.5 MB.

It looks like extra content is being saved that is not visible. Based on my test above, this content appears to be in the text itself and is copied to and pasted from the clipboard along with it. For example, a page of text with only regular and bold text, like in my test, seems to have an enormous file size.


By the way, my computer has an SSD drive and an Intel Core i9 processor. It is a fast machine.

The problem does not seem to appear all the time. However, it might still be present all the time. Just from adding a few lines to update my documents from time to time, I often end up with RTF documents of several MB. At that point, I notice the problem because it takes several minutes just to open the document.

Steps to Reproduce:
Create a simple RTF document with normal text and a few bold titles. Update the document by adding content regularly. I have just tried to create a new RTF document and follow these steps, but the file size did not grow out of proportion. The bug is not always there, but eventually it will appear.

Many documents were updated by removing the older parts and adding new text. However, I also had this bug with documents where I did not remove older parts.

Actual Results:
It varies, but sometimes the file size can be something like a thousand times bigger, and it takes several minutes to open and save an RTF document.

Expected Results:
Normal file size.


Reproducible: Sometimes


User Profile Reset: No



Additional Info:
Comment 1 Julien Nabet 2020-01-05 12:57:10 UTC
On pc Debian x86-64 with LO Debian package 6.3.4, I don't reproduce this.

Could you upgrade to 6.3.4 and if you reproduce this, provide minimal step by step process to reproduce this from a brand new file?
Comment 2 Dieter 2020-01-05 13:06:51 UTC
(In reply to Julien Nabet from comment #1)
> On pc Debian x86-64 with LO Debian package 6.3.4, I don't reproduce this.
> 
> Could you upgrade to 6.3.4 and if you reproduce this, provide minimal step
> by step process to reproduce this from a brand new file?

Sebastien, I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the steps are provided and it is still reproducible for you in LO 6.3.4.
Comment 3 Sebastien 2020-01-06 14:42:04 UTC
Created attachment 156965 [details]
Test file for issue - LibreOffice Writer saves RTF documents with enormous file size (1000 times bigger)
Comment 4 Sebastien 2020-01-06 14:44:51 UTC
Finally, I have been able to reproduce the bug with one of the files that I had. I had a hard time reproducing it. There is no way that I can tell you how to reproduce it from scratch. The only way so far is to download the test file “TestFile01 - F.rtf” that I attached to this report and follow the steps below, as I did.

- The file “TestFile01 - F.rtf” is 1,477 KB.
- I open the file with Writer (I double-click on the file).
- I do Ctrl + S to save without adding any content.
- I close Writer.
- The new file size is now 2,980 KB.
- I do the same steps again.
- The new file size is now 6,014 KB.

It keeps growing every time I open the file and save it.
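
As a quick check on the sizes listed above, the growth factor per save can be worked out directly. This is only arithmetic on the three reported values; the variable names are illustrative.

```python
# File sizes in KB as reported in the steps above.
sizes_kb = [1477, 2980, 6014]

# Ratio between each saved size and the size before that save.
ratios = [after / before for before, after in zip(sizes_kb, sizes_kb[1:])]

for before, after, ratio in zip(sizes_kb, sizes_kb[1:], ratios):
    print(f"{before} KB -> {after} KB: x{ratio:.2f}")
```

Both ratios come out at roughly 2, which is consistent with the file doubling on every save rather than growing by a fixed amount.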

I cannot tell if this is related to a specific character or what. I have the feeling that Writer introduces invisible stuff from time to time. Writer has the frustrating habit of reformatting things automatically.

The test file is about half a page of text and has 49 lines when I open it with Notepad++. I noticed that line 43 contains many repeated groups like:
__UnoMark__125_1522762609266444}{\*\bkmkstart __UnoMark__125_1522762609212232444}{\*\bkmkstart __UnoMark__125_1522762609212242444}{\*\bkmkstart __UnoMark__125_15227626092613444}{\*\bkmkstart”

I cannot tell what this is. Every time I saved the file, line 43 grew with more of this kind of thing. I believe the problem is there.
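
One way to confirm that these groups are what grows is to count them before and after each save. The following is a minimal sketch, assuming the groups always have the form `{\*\bkmkstart __UnoMark__...}` as in the fragment above; the function names are hypothetical and it has not been run against the attached file.

```python
import re

# Matches an RTF bookmark-start group whose name begins with "__UnoMark__",
# e.g. {\*\bkmkstart __UnoMark__125_1522762609266444}
UNO_BKMKSTART = re.compile(r"\{\\\*\\bkmkstart __UnoMark__[^{}]*\}")

def count_uno_bookmarks(rtf_text):
    """Count bookmark-start groups named __UnoMark__... in RTF source text."""
    return len(UNO_BKMKSTART.findall(rtf_text))

def count_in_file(path):
    """Same count for a file on disk; latin-1 decodes any byte without error."""
    with open(path, encoding="latin-1") as f:
        return count_uno_bookmarks(f.read())

# Two names taken from the fragment quoted above, plus an ordinary bookmark
# (the name "MyBookmark" is made up) that must not be counted.
sample = (
    r"{\*\bkmkstart __UnoMark__125_1522762609266444}"
    r"{\*\bkmkstart __UnoMark__125_1522762609212232444}"
    r"{\*\bkmkstart MyBookmark}"
)
print(count_uno_bookmarks(sample))  # 2
```

If the count roughly doubles along with the file size after each save, that would support the suspicion that line 43 is where the growth happens.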

If I open the file with Wordpad and then save it, the problem disappears and I’m not able to reproduce the bug. Line 43 with all the garbage disappears. As mentioned, please use my test file; the hard part with this bug was getting the garbage into the file in the first place. This bug has been recurring for years for me. I noticed the problem from time to time and the behavior seemed random.

Let me know if I can test something else.
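
As an alternative to the Wordpad round trip, the `__UnoMark__` groups could in principle be stripped from the RTF source directly. The sketch below is an assumption-laden workaround, not what Wordpad does: it assumes each group is a flat `{\*\bkmkstart ...}` or `{\*\bkmkend ...}` group with no nested braces, and that matching `\bkmkend` groups exist (only `\bkmkstart` is visible in the fragment above). The function name is hypothetical; it should only be tried on a copy of the file.

```python
import re

# Bookmark start/end groups whose name begins with "__UnoMark__".
UNO_BOOKMARK = re.compile(r"\{\\\*\\bkmk(?:start|end) __UnoMark__[^{}]*\}")

def strip_uno_bookmarks(rtf_text):
    """Return the RTF source with all __UnoMark__ bookmark groups removed."""
    return UNO_BOOKMARK.sub("", rtf_text)

# Made-up sample: one UnoMark pair to drop, one ordinary bookmark to keep.
sample = (
    r"{\rtf1 Hello"
    r"{\*\bkmkstart __UnoMark__1_2}{\*\bkmkend __UnoMark__1_2}"
    r" world{\*\bkmkstart Keep}}"
)
cleaned = strip_uno_bookmarks(sample)
print(cleaned)
```

Ordinary bookmarks are left untouched because the pattern only matches names that start with `__UnoMark__`.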
Comment 5 Julien Nabet 2020-01-06 20:11:27 UTC
Perhaps an old LO version corrupted the rtf in some way.
But if using Wordpad allows you to clean your file and the latest LO version doesn't re-corrupt it, it should be ok then, shouldn't it?
Comment 6 Sebastien 2020-01-08 19:12:26 UTC
I will verify whether the problem occurs again with clean files. It is quite hard to reproduce. I’m going to re-save all the files with Wordpad as a kind of reset.

I understand that you are probably not going to fix that problem right now. The bug continues to be present because Writer does not save properly. Normally, it shouldn’t save the corrupted part.

I did some research on the web about this issue before posting my problem here. It appears that someone had this kind of problem with the ODT format.

For the moment, I will let you investigate the problem if you think it’s worth your time. Most people would expect the software not to carry the bug forward, and to be able to open older corrupted files and save them again without the corrupted part (which is about twice the original size). If I ever find another issue with this on a clean file in Writer, I’ll let you know by adding a note to this record.
Comment 7 Julien Nabet 2020-01-08 19:50:40 UTC
(In reply to Sebastien from comment #6)
> ...
> 
> I did some research on the web about this issue before posting my problem
> here. It appears that someone had this kind of problem with the ODT format.
It can happen if you enable "Track changes"

> 
> For the moment, I let you investigate the problem if you think it’s worth
> your time. Most people would expect the software to not carry the bug and be
> able to open older corrupted files and save them again without the corrupted
> part (which is about twice the original size). If I ever find another issue
> with this on a clean file in Writer, I’ll let you know by adding a note to
> this record.

Miklos: since it concerns RTF, thought you might have some opinion here.
Do you think it should be set aside while we don't have a step by step process to reproduce this from a brand new file or do you think LO should be able to "de-corrupt" an RTF file?
Comment 8 Miklos Vajna 2020-01-09 07:40:28 UTC
If it can be reproduced with the attached bugdoc that we add new bookmarks to the export result on every save, that's a valid bug report, I would say.

I would expect UNO marks never end up in the export result, FWIW.
Comment 9 Julien Nabet 2020-01-09 13:59:03 UTC
Thank you Miklos for your feedback.
If you think it's valid, let's put this to NEW then.
"__UnoMark__" is set in the UnoMark ctor
(see https://opengrok.libreoffice.org/xref/core/sw/source/core/crsr/bookmrk.cxx?r=7e403195#345)
Comment 10 BogdanB 2020-09-28 06:00:18 UTC
The file size doubles on every save of the document, but the document still has the same single page.

Tested on
Version: 7.0.1.2
Build ID: 7cbcfc562f6eb6708b5ff7d7397325de9e764452
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-US (ro_RO.UTF-8); UI: en-US
Calc: threaded
Comment 12 Roman Kuznetsov 2022-10-02 18:05:57 UTC
Yep, I can still reproduce the problem.

Open the attached file and save it under another name -> double the size of the original file
Then open that second file and save it under a third name -> double the size of the second file
Etc.

Version: 7.5.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 48b9cbc742de3f6120986cb6cafc92eb5009da82
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded
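
The open-and-save loop described in comments 4 and 12 could be automated along these lines. This is a rough sketch, not a confirmed reproducer: it assumes `soffice` is on the PATH and that a headless `--convert-to rtf` goes through the same RTF export as an interactive save, which has not been verified here. The function names are hypothetical.

```python
import os
import subprocess
import tempfile

def soffice_resave(src, outdir):
    """Re-export src as RTF into outdir with a headless LibreOffice.

    Assumes the "soffice" binary is on the PATH.
    """
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "rtf", "--outdir", outdir, src],
        check=True,
    )
    base = os.path.splitext(os.path.basename(src))[0]
    return os.path.join(outdir, base + ".rtf")

def resave_sizes(src, rounds, resave=soffice_resave):
    """Return sizes in bytes: the original file, then the result of each round."""
    sizes = [os.path.getsize(src)]
    with tempfile.TemporaryDirectory() as work:
        current = src
        for i in range(rounds):
            # Each round writes into its own directory so the input
            # file is never overwritten by the output.
            outdir = os.path.join(work, f"round{i}")
            os.makedirs(outdir)
            current = resave(current, outdir)
            sizes.append(os.path.getsize(current))
    return sizes
```

Calling `resave_sizes("TestFile01 - F.rtf", 3)` on a copy of the attachment would return four sizes; if the bug reproduces this way, each should be about twice the previous one. The `resave` parameter exists so the loop can be exercised with a stand-in function when LibreOffice is not installed.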