Created attachment 58807 [details] Sample file to recreate the problematic document

Today a coworker called me because LibreOffice was taking forever to open a .doc document (Word 97 format). The file was not locked; it was simply extremely slow in the "Importing document..." phase, to the point of being unusable: I had to kill LO. The document is about 22 MB. That PC runs Windows XP SP3, and today I upgraded it from go-oo 2.4.1 to LO 3.5.1. With go-oo the document opened without this slowness.

I tried to open the same document with my LO 3.5.1 (Win2000 SP4) and waited until the end: 45 minutes! Even after that, the CPU stayed at 100% and never dropped, leaving the application very unresponsive.

Reproducible under: 3.5.1rc1 (Win7 SP1), 3.5.0 (Debian Linux), 3.5.1 (Debian Linux).
Not reproducible under: LO 3.4.3 (WinXP SP3) and OOo 3.2.1 (Win2000 SP4).
So this seems to be a 3.5.x regression.

Since that document contains sensitive data, and since I generated it myself some months ago, I regenerated a new one with the following steps:
- an application generated a document like the one attached (obscured by replacing every letter with X, so same file format, same line length, etc.).
- copy/paste the entire content into a fresh LibreOffice Writer document.
- change the page orientation to landscape and the size to A4.
- save in doc format (Word 97/2000/XP/2003).

Follow the steps above yourself and you'll notice that:
- After the copy/paste the CPU goes to 100% and stays there forever (or for a very long time).
- The save phase is slow.
- Opening the doc document you generated means wait, wait, wait...

Cesare.
Confirmed. (Windows XP) Works fine with LibO 3.3.4 and LibO 3.4.5. Very slow with LibO 3.5.1, especially loading the created DOC-file: seconds with LibO 3.3.4 and 3.4.5, several minutes with 100% CPU load with LibO 3.5.1. This is a regression.
Still present in 3.5.2rc1 (tested under Win2000).
Still present in 3.5.2 (tested under Win2000 and Debian). For your convenience I've attached a .doc copy of the txt sample from the first message, so you can directly test the huge load time. There's also an .odt copy (see below).

Some notes I left out of the first message:
- A recent MS Office (I don't remember the version for sure, but I think 2007) opens the generated doc file quite quickly.
- I don't remember about Windows, but under Linux, if I save the file as odt, I can reopen it quickly.
- Under Debian, the copy/paste phase starts fast but slows down as it proceeds: it took about 5-6 minutes on my 5-year-old Centrino.
- RAM usage does not seem to vary significantly during the load.
- During the copy/paste, the LO interface is completely unresponsive: if you switch to another application and then switch back, the user interface doesn't even repaint.
- CPU activity dropped after the load phase but kept floating around 24-50% for about another 4-5 minutes. The same happens after loading the odt copy.
- While opening the doc document, the LO interface appears to react to user commands (you can open menus), but it seems to ignore any close command. If you want to interrupt the load phase, you have to kill LO by other means.

Cesare.
Created attachment 59616 [details] Doc copy (97-2000-XP) of the original txt sample
Created attachment 59617 [details] Odt copy of the original txt sample
"//probably not as inefficient as it looks" ... famous last words
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=d726281e9020ebaddfdf6659ecfe7a0454014dff Related: fdo#47644 compound storage backend is poor at knowing stream size
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=cdcf2fcefcf05d25411d1cce685f410256ff46cf Related: fdo#47644 for huge files build a page chain cache for seeks
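For readers unfamiliar with the idea behind the commit title "for huge files build a page chain cache for seeks", here is a minimal sketch in Python. It is a toy model and not the LibreOffice code: the dictionary `fat`, the sentinel `END_OF_CHAIN` and both function names are hypothetical. It assumes a stream whose sectors are linked one to the next, so that finding the n-th sector by walking the chain costs n steps per seek, while a chain recorded once in a list answers the same question with a single lookup.

```python
# Illustrative only: a toy sector chain, not LibreOffice's storage code.
END_OF_CHAIN = -2  # hypothetical sentinel marking the last sector


def sector_by_walking(fat, start, index):
    """Follow the chain from `start` for `index` links: O(index) per seek."""
    sector = start
    for _ in range(index):
        sector = fat[sector]
        if sector == END_OF_CHAIN:
            raise IndexError("seek past end of chain")
    return sector


def build_chain_cache(fat, start):
    """Walk the chain once and record every sector in stream order."""
    chain = []
    sector = start
    while sector != END_OF_CHAIN:
        chain.append(sector)
        sector = fat[sector]
    return chain


# A stream stored in sectors 3 -> 0 -> 4 -> 1
fat = {3: 0, 0: 4, 4: 1, 1: END_OF_CHAIN}
cache = build_chain_cache(fat, 3)

# With the cache, the sector for position i is just cache[i].
for i in range(len(cache)):
    assert cache[i] == sector_by_walking(fat, 3, i)
```

Reading a long stream through repeated walk-from-the-start seeks costs a number of steps that grows with the square of the chain length, which is one plausible way a large file can become disproportionately slow to load; with the cached list each seek is a constant-time lookup.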
The substantive issue is the crazy-slow .doc load. Which is taken care of in master for 3-6. Will cook up a backport for 3-5 for submission.
Caolan McNamara committed a patch related to this issue. It has been pushed to "libreoffice-3-5": http://cgit.freedesktop.org/libreoffice/core/commit/?id=aef4d0ea8ba7f5e169dc04812490a103cd858f0c&g=libreoffice-3-5 Resolves: fdo#47644 compound storage backend is poor at knowing stream size It will be available in LibreOffice 3.5.4.
Caolan McNamara committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=75e7643e22bcf674739ca890bfc06f80e872624b Related: fdo#47644 get out of bounds conditions right
Caolan McNamara committed a patch related to this issue. It has been pushed to "libreoffice-3-5": http://cgit.freedesktop.org/libreoffice/core/commit/?id=e7c861e3ec9398436eda16bb748af22a2d9afc27&g=libreoffice-3-5 Related: fdo#47644 for huge files build a page chain cache for seeks It will be available in LibreOffice 3.5.4.
Caolan McNamara committed a patch related to this issue. It has been pushed to "libreoffice-3-5": http://cgit.freedesktop.org/libreoffice/core/commit/?id=4a6d590a0794fc00b3acb66bee9e1cd85c85ccc7&g=libreoffice-3-5 Revert "Related: fdo#47644 for huge files build a page chain cache for seeks" It will be available in LibreOffice 3.5.4.
I'll split the very slow, no-progress-bar, synchronous, layout-on-load part of this bug to another issue...
Made summary a bit more specific, to make it easier to find this bug (fixed, but still interesting ;-).
Verified with LOdev 3.6 (master - 18-May-2012 02h44 x86@6-fast; Build ID: 8b1d29b) under Windows Vista 64.
First of all, thank you very much to Caolan for taking care of this bug. To me the problem is half solved: the loading time is now greatly improved, but the high CPU load persists and makes the application somewhat unusable on my not-so-new machines. And now it is also unstable. So I've reopened the bug.

I've just installed LO 3.5.4rc1 on two PCs and run some tests. Below are my results loading "test-bug.doc.zip" from this bug report. In order, you can see: the start time, the time when the document is fully loaded and visible, and the time when the CPU load goes down.

PC1: Celeron 2.4 GHz, 1 GB RAM, Win2000 Pro SP4
1) Start 14:44:00
2) Doc ok 14:45:11
3) CPU ok ??? (after 15:10)

PC2: Celeron M 430 1.7 GHz, 1500 MB, WinXP SP3
1) Start 14:31:00
2) Doc ok 14:31:25
3) CPU ok 14:31:32

PC1 is older (and slower) than PC2. As you can see, the load time is now fine in both cases and is quite deterministic. The problem is that after the doc is loaded there are responsiveness issues: LibreOffice executes background activities that heavily impact usability on non-recent PCs.

On the first (slower) PC, after the document is loaded, the CPU stays at 100% for an unreasonable amount of time. On PC2 the results were less predictable: in the common case, after the document loads the CPU drops rapidly, not to near zero but to about 30-50%. Then, after another (not short) while, it sometimes settles around zero, whereas other times it jumps back up to around 100% and stays there forever (irregular CPU graph). The results weren't always the same.

Another important point: during these tests I experienced many crashes when trying to interact with the document while the CPU was high, on both PCs, in particular when trying to scroll. Sometimes just clicking inside the document page while the CPU was high made LO crash. I don't know whether the instability is specific to this problem or caused by some other bug elsewhere.
I've attached two images showing CPU activity on PC2 while loading "test-bug.doc.zip": in img1, after the load, the CPU shows continuous irregular behaviour; in img2 the CPU shows an irregular but higher trend, followed by very high CPU activity (that never dropped). I don't know how I can provide data that would be more useful to you. If there is a way, let me know. Cesare.
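As a convenience, the clock times reported above can be turned into durations. The following is a small sketch in Python; the helper name `elapsed_seconds` is made up for illustration, and the only inputs are the timestamps quoted in this comment (with "after 15:10" read as 15:10:00, so the PC1 figure is a lower bound).

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two same-day clock times given as HH:MM:SS."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# PC1: start -> document visible
pc1_load = elapsed_seconds("14:44:00", "14:45:11")        # 71 s
# PC1: document visible -> CPU still busy at 15:10 (lower bound)
pc1_busy_min = elapsed_seconds("14:45:11", "15:10:00")    # 1489 s

# PC2: start -> document visible, then visible -> CPU calm
pc2_load = elapsed_seconds("14:31:00", "14:31:25")        # 25 s
pc2_settle = elapsed_seconds("14:31:25", "14:31:32")      # 7 s

print(pc1_load, pc1_busy_min, pc2_load, pc2_settle)
```

So on PC1 the document loads in 71 seconds but the CPU stays busy for at least another 24 minutes or so, while on PC2 the load takes 25 seconds and, in the run reported here, the CPU calms down 7 seconds later.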
Created attachment 61918 [details] WinXP task manager (1) Task manager activity (1) on WinXP with patched LO
Created attachment 61919 [details] WinXP task manager (2) Task manager activity (2) on WinXP with patched LO
I can confirm that the FILEOPEN fix works fine for "Doc copy (97-2000-XP) of the original txt sample". @BerniCL <https://wiki.documentfoundation.org/BugReport_Details#How_to_reopen_Bugs> The fileopen problem is fixed. Please file a new bug, referencing the documents here, for the additional problem, which was more or less invisible as long as the FILEOPEN problem was unfixed and the document was more or less impossible to open. Before you submit a new bug, please check whether your new problem is already known. Please add me to CC in the new bug.
BTW, this seems to be an old Master bug, already visible in WIN Master 2011-09-06: that version also needs endless time for the import, while 3.4.5 opened "Doc copy (97-2000-XP) of the original txt sample" within half a minute or so.
Opens in around 8 seconds from console/warm-start for me, vs. 600+ seconds beforehand - hope that suits ;-)
Yes, good progress: for me, Master is now faster at FILEOPEN than MS Word Viewer and AOOo 3.4. But AOOo does not suffer from the additional bug (Comment 17), which seems to be related to "big documents" in general rather than to MS .doc.
For the problem in Comment 17 I added a sample document in "48932 - EDITING: Writer 3.5.2 slow typing in large documents"
JFYI: the opening speed improvements are really impressive, and noticeable even on small documents on not-so-new machines. As a test, here are the opening times I measured using "Doc copy (97-2000-XP) of the original txt sample" (attached to this bug) under Debian Linux with a Pentium M 725 (2005):
LibO 3.5.3: after 28 minutes I gave up (progress still under 50%).
LibO 3.5.4rc1: 34 seconds
Really good job, guys! I look forward to testing Caolan's corrections regarding #48932.