Bug 47644 - FILEOPEN: Huge load time and CPU load with .doc file
Summary: FILEOPEN: Huge load time and CPU load with .doc file
Status: CLOSED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
Master old -3.6
Hardware: Other All
: medium normal
Assignee: Caolán McNamara
URL:
Whiteboard: target:3.6.0 target:3.5.4
Keywords: regression
Depends on:
Blocks: mab3.5
  Show dependency treegraph
 
Reported: 2012-03-21 05:20 UTC by BerniCL
Modified: 2012-05-24 01:18 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Sample file to recreate the problematic document (353.87 KB, application/zip)
2012-03-21 05:20 UTC, BerniCL
Details
Doc copy (97-2000-XP) of the original txt sample (837.12 KB, application/octect-stream)
2012-04-07 04:22 UTC, BerniCL
Details
Odt copy of the original txt sample (492.14 KB, application/vnd.oasis.opendocument.text)
2012-04-07 04:23 UTC, BerniCL
Details
WinXP task manager (1) (99.49 KB, image/jpeg)
2012-05-21 07:33 UTC, BerniCL
Details
WinXP task manager (2) (96.13 KB, image/jpeg)
2012-05-21 07:34 UTC, BerniCL
Details

Note You need to log in before you can comment on or make changes to this bug.
Description BerniCL 2012-03-21 05:20:21 UTC
Created attachment 58807 [details]
Sample file to recreate the problematic document

Today a coworker has called me because LibreOffice took forever to open a doc document (Word 97 format). It was not locked but simply very very very slow in the "Importing document..." phase. Really unusable: i had to kill LO.
It is about 22 MB size document.

That PC is a Windows XP SP3 that today i have upgraded from go-oo 2.4.1 to LO 3.5.1. That document opened without that huge slowness with go-oo.

I tried to open the same document with my LO 3.5.1 (Win2000 SP4) and i waited until the end: 45 minutes! And after that time the CPU stayed at 100% and never falled down, rendering the application very unresponsive.

Riproducible under: 3.5.1rc1 (Win7 SP1), 3.5.0 (Debian Linux), 3.5.1 (Debian Linux).

Not reproducible under: LO 3.4.3 (WinXP SP3) and under OOo 3.2.1 (Win2000 SP4).

So seems to be a 3.5.x regression.

Since that document contains sensible data and since that document was generated by me some months ago, i tried to regenerate a new one with the following steps:
- an application generated a document like the one attached (obscured by changing every letter with X, so same file format, same line length, ecc).
- copy/paste the entire content inside a fresh new LibreOffice Writer document.
- changed the page format in landscape, and the size in A4.
- saved in doc format (word 97/2000/xp/2003).

Try to follow yourself the steps above and you'll notice that:
- After the copy/paste the CPU will go to 100% and stays here forever (or very very long).
- Notice the slow saving phase.
- Try to open the doc document you generated and wait, wait, wait...

Cesare.
Comment 1 s-joyemusequna 2012-03-21 11:15:15 UTC
Confirmed. (Windows XP)

Works fine with LibO 3.3.4 and LibO 3.4.5.

Very slow with LibO 3.5.1, especially loading the created DOC-file: seconds with LibO 3.3.4 and 3.4.5, several minutes with 100% CPU load with LibO 3.5.1.

This is a regression.
Comment 2 BerniCL 2012-03-27 07:08:41 UTC
Still present in 3.5.2rc1 (tested under Win2000).
Comment 3 BerniCL 2012-04-07 04:18:46 UTC
Still present in 3.5.2 (tested under Win2000 and Debian).

To your convenience i've attached a .doc copy of the txt sample i've attached in the first message, so you can directly test the huge page load time.
There's also a .odt copy (see below).

Some notes not expressed in the first message.
- A recent MS Office (i don't remember for sure the version, but i think 2007), opens the generated doc file quite fastly.
- I don't remember under Windows, but under Linux if i save the file in odt, i can reopen it fastly.
- Under Debian, the copy/paste phase starts fast but slow down during the load: it took about 5-6 minutes with my 5 years old Centrino.
- The ram usage seems not to vary significantly during the load.
- During the copy/paste, LO interface is completely unresponsive: if you switch to another application then switch back, the user interface doesn't repaint either.
- The CPU activity after the load phase falled down but keep on floating around 24-50% for about another 4-5 minutes. This is the same also after loading the odt copy.
- Opening the doc document, LO interface looks reacting to user commands (you can open menus), but seems to ignore any close command. If you want to interrupt the load phase, you have to kill LO by other means.

Cesare.
Comment 4 BerniCL 2012-04-07 04:22:12 UTC
Created attachment 59616 [details]
Doc copy (97-2000-XP) of the original txt sample
Comment 5 BerniCL 2012-04-07 04:23:24 UTC
Created attachment 59617 [details]
Odt copy of the original txt sample
Comment 6 Caolán McNamara 2012-05-02 14:21:45 UTC
//probably not as inefficient as it looks
famous last words
Comment 7 Not Assigned 2012-05-02 16:11:21 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d726281e9020ebaddfdf6659ecfe7a0454014dff

Related: fdo#47644 compound storage backend is poor at knowing stream size
Comment 8 Not Assigned 2012-05-03 08:31:15 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=cdcf2fcefcf05d25411d1cce685f410256ff46cf

Related: fdo#47644 for huge files build a page chain cache for seeks
Comment 9 Caolán McNamara 2012-05-05 05:36:23 UTC
The substantive issue is the crazy-slow .doc load. Which is taken care of in master for 3-6. Will cook up a backport for 3-5 for submission.
Comment 10 Not Assigned 2012-05-10 09:03:55 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-3-5":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=aef4d0ea8ba7f5e169dc04812490a103cd858f0c&g=libreoffice-3-5

Resolves: fdo#47644 compound storage backend is poor at knowing stream size


It will be available in LibreOffice 3.5.4.
Comment 11 Not Assigned 2012-05-10 13:52:36 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=75e7643e22bcf674739ca890bfc06f80e872624b

Related: fdo#47644 get out of bounds conditions right
Comment 12 Not Assigned 2012-05-11 08:37:09 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-3-5":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e7c861e3ec9398436eda16bb748af22a2d9afc27&g=libreoffice-3-5

Related: fdo#47644 for huge files build a page chain cache for seeks


It will be available in LibreOffice 3.5.4.
Comment 13 Not Assigned 2012-05-11 08:43:57 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-3-5":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4a6d590a0794fc00b3acb66bee9e1cd85c85ccc7&g=libreoffice-3-5

Revert "Related: fdo#47644 for huge files build a page chain cache for seeks"


It will be available in LibreOffice 3.5.4.
Comment 14 Michael Meeks 2012-05-14 03:20:43 UTC
I'll split the very slow, no-progress-bar, synchronous, layout-on-load part of this bug to another issue...
Comment 15 Roman Eisele 2012-05-14 06:46:25 UTC
Made summary a bit more specific, to make it easier to find this bug (fixed, but still interesting ;-).
Comment 16 s-joyemusequna 2012-05-19 09:45:30 UTC
Verified with LOdev 3.6 (master - 18-May-2012 02h44 x86@6-fast; Build ID: 8b1d29b) under Windows Vista 64.
Comment 17 BerniCL 2012-05-21 07:31:54 UTC
First of all thank you very much to Caolan for taking care of this bug.

To me the problem is half solved because the loading time is now greatly improved but still persists the high CPU load that render the application somewhat unusable on my not-so-new machines. And now also unstable. So i've reopened the bug.

I've just installed LO 3.5.4rc1 on two PCs and made some test. Following my results trying to load "test-bug.doc.zip" present in this bug report. In order you can see: start time, time when the document is fully loaded and visible, time when the CPU load goes down.

PC1: Celeron 2.4 MHz, 1GB ram, Win2000 pro SP4
1) Start   14:44:00
2) Doc ok  14:45:11
3) CPU ok  ???  (after 15:10)

PC2: Celeron M 430 1.7 MHz, 1500 MB, WinXP SP3
1) Start   14:31:00
2) Doc ok  14:31:25
3) CPU ok  14:31:32

PC1 is older (and slower) than PC2.
As you can see the load time is now fine in both cases and is quite deterministic. The problem is that after the doc is loaded there are responsiveness issues: LibreOffice excutes backgroud activities that heavily impact on usability on non-recent PCs. On the first PC (slower), after the document is loaded, the cpu stays at 100% for an unreasonable amount of time. On PC2 there were less predicible results: the common case is that after the document loads, the CPU goes down rapidly but not near zero, but at about 50-30%. Then after another (not small) times it goes around zero whereas other times it jumped up around 100% again and stayed here forever (cpu graph irregular).
But the results weren't always the same.

Another important thing is that doing these tests i've experienced many crash trying to interact with the document when the cpu was high, on both PCs, in particular trying to scroll. Sometimes only clicking inside the document page when the cpu was high made LO crash.
Don't know if the instability is specific to this problem or caused by some other bug in other places.

I've attached two images showing CPU activities on PC2 trying to load "test-bug.doc.zip": you can see that after the loading on img1 the cpu had a continuous irregular behaviour; on img2 the cpu had an irregular but higher trend followed by very high cpu activities (that never falled down).

I don't know how i can provide data more useful to you. In case let me know.

Cesare.
Comment 18 BerniCL 2012-05-21 07:33:48 UTC
Created attachment 61918 [details]
WinXP task manager (1)

Task manager activity (1) on WinXP with patched LO
Comment 19 BerniCL 2012-05-21 07:34:27 UTC
Created attachment 61919 [details]
WinXP task manager (2)

Task manager activity (2) on WinXP with patched LO
Comment 20 Rainer Bielefeld Retired 2012-05-21 08:33:29 UTC
I can confirm that the Fileopen fix works fine for "Doc copy (97-2000-XP) of the original txt sample" 

@BerniCL
<https://wiki.documentfoundation.org/BugReport_Details#How_to_reopen_Bugs>

The fileopen problem is fixed, please file a new bug with reference to documents here for the additional problem that was more or less invisible as long as the FILEOPEN problem was unfixed and it was more or less impossible to open the document

Before you submit a new bug please check whether your new problem is already known.

Please add me to CC in the new Bug.
Comment 21 Rainer Bielefeld Retired 2012-05-21 08:45:08 UTC
BTW this seems to be an old Master bug already visible in WIN Master 2011-09-06, that version also needs endless time for import, while 3.4.5 opened "Doc copy (97-2000-XP) of the original txt sample" within half a minute or so.
Comment 22 Michael Meeks 2012-05-21 09:04:28 UTC
Opens in around 8 seconds from console/warm-start for me, vs. 600+ seconds beforehand - hope that suits ;-)
Comment 23 Rainer Bielefeld Retired 2012-05-21 09:33:20 UTC
Yes, good progress, for me Master for FILEOPEN is faster  than MS WORD Viewer and AOOo 3.4. But AOOo does not suffer from additional Bug (Comment 17), what seems to more "big document" related than related to MS .doc
Comment 24 Rainer Bielefeld Retired 2012-05-21 10:31:54 UTC
For problem in Comment 17 I added a sample document in "48932 - EDITING: Writer 3.5.2 slow typing in large documents"
Comment 25 BerniCL 2012-05-24 01:18:47 UTC
JFYI

The opening speed improvements are really impressive and noticeable also on small documents under not so new machines.

As a test, here is the opening time i've measured using "Doc copy (97-2000-XP) of the original txt sample" (attached to this bug) under Debian Linux and with a Pentium M 725 (2005):
LibO 3.5.3: after 28 minutes i've gave up (opening advancement still under 50%).
LibO 3.5.4rc1: 34 seconds

Really good job guys!

I look forward to test Caolan's corrections regarding #48932.