Bug 72884 - FILEOPEN: slower loading of specific .DOC files compared to LibO 3.6.0
Summary: FILEOPEN: slower loading of specific .DOC files compared to LibO 3.6.0
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.6.1.2 release
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA NoRepro:4.3.0.0.alpha0+2013-12-19...
Keywords: filter:doc, perf, regression
Depends on:
Blocks: DOC
Reported: 2013-12-19 17:02 UTC by Witalik
Modified: 2017-09-21 21:39 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
the document which needs to be checked, and on its basis to improve opening of the doc format of files (1.06 MB, application/msword)
2013-12-19 17:02 UTC, Witalik
the document (6 pages) which needs to be checked, and on its basis to improve opening of the doc format of files (139.00 KB, application/msword)
2013-12-30 16:59 UTC, Witalik
attachment 90994_resaved (565.50 KB, application/msword)
2017-01-08 08:43 UTC, tommy27
attachment 91339_resaved (99.50 KB, application/msword)
2017-01-08 08:44 UTC, tommy27

Description Witalik 2013-12-19 17:02:57 UTC
Created attachment 90994
the document which needs to be checked, and on its basis to improve opening of the doc format of files

Problem description: 
The attached document takes about five minutes to open.

Steps to reproduce:
1. Open the attached document and note the opening time.
2. Opening takes about five minutes.

Current behavior:
Opening takes about five minutes.

Expected behavior:
The document should open within a few seconds.

Operating System: Windows XP
Version: 4.3.0.0.alpha0+ Master
Comment 1 Robinson Tryon (qubit) 2013-12-20 13:37:52 UTC
TESTING on Ubuntu 12.04.3 with

LibreOffice Version: 4.3.0.0.alpha0+
Build ID: 164a8c409c2f070ee51ca4258585cf0c8579af51
TinderBox: Linux-rpm_deb-x86_64@46-TDF
Branch:master
Time: 2013-12-19_00:13:56

(In reply to comment #0)
> Steps to reproduce:
> 1. open the attached document and look at opening time
> 2. opening time five minutes
> 
> Current behavior:
> opening time five minutes

NOREPRO: Document opened for me in about 10 seconds.


---

Joren - Please test on Windows
Comment 2 Witalik 2013-12-20 13:45:59 UTC
You wrote: "TESTING on Ubuntu 12.04.3".
My platform is Windows XP SP3 with the latest service pack, not Linux.
Comment 3 Robinson Tryon (qubit) 2013-12-20 13:57:03 UTC
(In reply to comment #2)
> you write: TESTING on Ubuntu 12.04.3 with
> My Platform: Windows XP SP3 with last services pack, not linux

I don't currently have a WinXP machine or VM here, so I tested on the system available to me and then cc'd one of the other QA Team members to help test on Windows: "Joren - Please test on Windows"

Cheers,
Comment 4 pierre-yves samyn 2013-12-21 11:36:22 UTC
Hello

Test on Windows 7 (64-bit)
Proc: Intel(R) Core(TM)2 Duo CPU 3.06GHz
RAM: 4.00 GB

Version: 4.2.0.1
Build ID: 7bf567613a536ded11709b952950c9e8f7181a4a

Version: 4.3.0.0.alpha0+
Build ID: f279acd3678d014d9d5dafe41971e0da4dec7b6c
TinderBox: Win-x86@47-TDF, Branch:master, Time: 2013-12-13_23:25:16

Opening from scratch: 75 sec.
(same with 4.2 & 4.3)

Opening a document in doc format needs a conversion. 
The same document saved in odt format opens in a second on my platform.

Regards
Pierre-Yves
Comment 5 retired 2013-12-21 15:32:14 UTC
21 seconds to open on OS X 10.9, LO Version: 4.3.0.0.alpha0+
Build ID: 164a8c409c2f070ee51ca4258585cf0c8579af51
TinderBox: MacOSX-x86@49-TDF, Branch:master, Time: 2013-12-19_00:12:55

Setting to NEW since this is indeed slow. Can we get some dev input on whether this is expected (NOTABUG) or whether something needs to be fixed?

-> NEW + OS: All.
Comment 6 Robinson Tryon (qubit) 2013-12-21 18:33:14 UTC
Tagging as a performance-related issue.
Whiteboard: perf
Comment 7 Witalik 2013-12-30 16:59:23 UTC
Created attachment 91339
the document (6 pages) which needs to be checked, and on its basis to improve opening of the doc format of files

Similar problem: this 6-page document takes 16 seconds to open.
Comment 8 Dennis Roczek 2014-01-23 16:35:11 UTC
I can confirm: the first doc does load on my Win7 64-bit i7, but it takes 7 min to open with LibO 4.1.3.2.

So it seems some work has already been done on performance, but in my view not enough: it still takes too long.
Comment 9 Robinson Tryon (qubit) 2014-02-03 19:16:55 UTC
(In reply to comment #8)
> I can confirm: the first doc is loading on my win7 64bit i7, but takes 7min
> to open with LibO4.1.3.2)

Ouch -- yes, sounds like a pretty classic performance issue.

Dennis/Witalik - Could you please test with older versions (4.0, 3.6, 3.5, etc..) and see if the performance problems are a regression?

Thanks!

Whiteboard: (removing NeedAdvice)
Comment 10 tommy27 2014-02-20 06:22:31 UTC
Tested attachment 91339 with older releases under Win7 x64:

LibO 3.3.3 --> 3.6.7 take 15 seconds to open (which is suboptimal too)

LibO 4.0.4 --> 4.1.5 take 32 seconds to open

So a performance regression probably started during 4.0.x development.

Note: load times may vary with the power of the PC used; mine was a 5-year-old laptop.

I'm adding Michael Meeks to the CC list since he's one of the "performance" experts.
Comment 11 Michael Meeks 2014-02-20 09:48:22 UTC
As with all of these, we need a callgrind trace of the document opening under Linux, using a build with debugging symbols installed, like this:

export OOO_EXIT_POST_STARTUP=1
export OOO_DISABLE_RECOVERY=1
valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 <my-test-file.doc>

Zip and provide the resulting callgrind output file (named callgrind.out.<pid> by default) - hopefully the slowness jumps out of that =)
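
For anyone who wants to script repeated runs, the shell commands above could be wrapped roughly as follows. This is only a sketch: the function names and the default `./soffice.bin` path are assumptions for illustration, and the environment variables and valgrind flags are copied verbatim from the commands above rather than checked against a particular valgrind version.

```python
import os
import subprocess


def build_callgrind_run(doc_path, soffice="./soffice.bin"):
    """Return (env, argv) mirroring the shell commands quoted above.

    The helper name and the default soffice path are hypothetical.
    """
    env = dict(os.environ)
    # Same two variables the shell snippet exports before starting valgrind.
    env["OOO_EXIT_POST_STARTUP"] = "1"
    env["OOO_DISABLE_RECOVERY"] = "1"
    argv = [
        "valgrind",
        "--tool=callgrind",
        "--simulate-cache=yes",
        "--dump-instr=yes",
        soffice,
        "--splash-pipe=0",
        doc_path,
    ]
    return env, argv


def run_callgrind(doc_path, soffice="./soffice.bin"):
    """Run the trace and return valgrind's exit code."""
    env, argv = build_callgrind_run(doc_path, soffice)
    return subprocess.run(argv, env=env, check=False).returncode
```

Calling `run_callgrind("my-test-file.doc")` from the program directory of a debug build would then leave the callgrind output file in the current directory.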

Thanks
Comment 12 QA Administrators 2014-09-03 21:32:44 UTC Comment hidden (obsolete)
Comment 13 Matthew Francis 2014-09-04 15:25:22 UTC
Unsure if this is sufficient, but according to a previous comment this couldn't in any case be reproduced in Linux.

The following is a callgrind log from loading the second smaller document on OSX, where I can reproduce the issue on 4.4 master (external link due to large size):

https://docs.google.com/uc?id=0B_soWPNbBZEVVmMxSW1KOVdOS3M&export=download
Comment 14 Matthew Francis 2014-09-04 19:23:16 UTC
Possibly also of interest: For the second file (IDPZK_MODUL_2.doc),

Save -> Reload -> Issue still present (so whatever the problematic formatting is, it appears to survive a round trip through .doc)

Save as ODT -> Reload -> Save as .doc -> Reload -> Issue still present (ditto round trip through ODT, although the issue does not occur when loading from ODT)

Set all text to one language -> Save -> Reload -> Issue no longer present



Looking at the content.xml of the file as an ODT,

        <style:style style:name="T17" style:family="text">
            <style:text-properties fo:color="#000000" style:font-name="Tahoma" fo:font-size="8pt" fo:background-color="#ffffff" loext:char-shading-value="0" style:font-size-asian="8pt" style:font-name-complex="Tahoma" style:font-size-complex="8pt"/>
        </style:style>

...

        <style:style style:name="T22" style:family="text">
            <style:text-properties fo:color="#000000" style:font-name="Tahoma" fo:font-size="8pt" fo:language="en" fo:country="US" fo:background-color="#ffffff" loext:char-shading-value="0" style:font-size-asian="8pt" style:font-name-complex="Tahoma" style:font-size-complex="8pt"/>
        </style:style>

...

            <text:p text:style-name="P2">5.Салічна правда як правова памятка Франкської держави:загальна характеристика,регулювання речових прав.</text:p>
            <text:p text:style-name="Standard">
                <text:span text:style-name="T17">Салічна</text:span>
                <text:span text:style-name="T22"> </text:span>
                <text:span text:style-name="T17">правда</text:span>
                <text:span text:style-name="T22">-</text:span>
                <text:span text:style-name="T17">це</text:span>
                <text:span text:style-name="T22"> </text:span>
                <text:span text:style-name="T17">збірник</text:span>
                <text:span text:style-name="T22"> </text:span>
                <text:span text:style-name="T17">записів</text:span>
                <text:span text:style-name="T22"> </text:span>
                <text:span text:style-name="T17">звичаєвого</text:span>
                <text:span text:style-name="T22"> </text:span>
                <text:span text:style-name="T17">права</text:span>
                <text:span text:style-name="T22"> </text:span>

And so on. There appear to be long paragraphs of text where the language flips to EN and back between every word. Whether or not it's specifically language, it seems plausible that the slow loading speed may have something to do with this property dance.



Setting the language doesn't seem to do anything to the larger document, so there may be something else going on there.
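
To check other documents for the same pattern, one could count, per paragraph, how often the character style changes between neighbouring spans. The following is a minimal sketch and not anything LibreOffice does internally: the function names are made up for illustration, and it assumes the spans are direct children of text:p as in the excerpt above.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF text namespace, as used for text:p and text:span in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def span_style_flips(content_xml):
    """Return one (span_count, style_changes) tuple per text:p element.

    style_changes counts neighbouring text:span pairs whose
    text:style-name differs, e.g. T17 -> T22 -> T17 gives 2.
    """
    root = ET.fromstring(content_xml)
    results = []
    for para in root.iter(f"{{{TEXT_NS}}}p"):
        styles = [
            span.get(f"{{{TEXT_NS}}}style-name")
            for span in para.findall(f"{{{TEXT_NS}}}span")
        ]
        flips = sum(1 for a, b in zip(styles, styles[1:]) if a != b)
        results.append((len(styles), flips))
    return results


def span_style_flips_in_odt(path):
    """Same analysis, reading content.xml straight out of an .odt file."""
    with zipfile.ZipFile(path) as odt:
        return span_style_flips(odt.read("content.xml"))
```

A paragraph where the second number is close to the first is one where nearly every span switches style, which is the "property dance" described above.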
Comment 15 Matthew Francis 2014-09-05 05:33:46 UTC
Results of similar analysis of the larger document:

Roundtrip by Save in LO (as .doc or .docx) -> Reopen -> Issue not present

Roundtrip by Open in Word (Mac, 2011) -> Save as .docx -> Reopen in Word -> Save as .doc -> Reopen in LO -> Issue still present


Thus, comparing the OOXML saved by LO and Word to find the difference, it is quickly apparent that where LO writes in document.xml:

        <w:p>
            <w:pPr>
                <w:pStyle w:val="Normal"/>
                <w:spacing w:lineRule="auto" w:line="360"/>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:color w:val="000000"/>
                    <w:sz w:val="28"/>
                    <w:szCs w:val="28"/>
                    <w:lang w:val="uk-UA"/>
                </w:rPr>
                <w:t>Рoздiл 1. Прaвoвi зaсaди тa мeхaнiзми мирoтвoрчoї дiяльнoстi ЛAД..............12</w:t>
            </w:r>
        </w:p>

The same line of text written by Word instead goes:

        <w:p w:rsidR="000137C5" w:rsidRPr="004F22BB" w:rsidRDefault="000137C5" w:rsidP="000137C5">
            <w:pPr>
                <w:spacing w:line="360" w:lineRule="auto"/>
                <w:rPr>
                    <w:color w:val="000000"/>
                    <w:sz w:val="28"/>
                    <w:szCs w:val="28"/>
                    <w:lang w:val="uk-UA"/>
                </w:rPr>
            </w:pPr>
            <w:r w:rsidRPr="004F22BB">
                <w:rPr>
                    <w:color w:val="000000"/>
                    <w:sz w:val="28"/>
                    <w:szCs w:val="28"/>
                    <w:lang w:val="uk-UA"/>
                </w:rPr>
                <w:t>Р</w:t>
            </w:r>
            <w:r w:rsidR="0004393B" w:rsidRPr="004F22BB">
                <w:rPr>
                    <w:color w:val="000000"/>
                    <w:sz w:val="28"/>
                    <w:szCs w:val="28"/>
                    <w:lang w:val="uk-UA"/>
                </w:rPr>
                <w:t>o</w:t>
            </w:r>
            <w:r w:rsidRPr="004F22BB">
                <w:rPr>

                ...

...for a total of 42 <w:r> for one line of text, each with one or only a few characters in it. LO appears to be merging these segments on load, while Word does not.

In total, in the version of the OOXML saved by LO there are 1102 <w:r>, while in the Word version there are 76028.
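
A count like this can be reproduced with a few lines of Python. This is a sketch under the assumption that the file is a regular .docx package with its main part at word/document.xml; the function names are illustrative only.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_runs(document_xml):
    """Return (number of w:r elements, total characters inside w:t)."""
    root = ET.fromstring(document_xml)
    runs = sum(1 for _ in root.iter(f"{{{W_NS}}}r"))
    chars = sum(len(t.text or "") for t in root.iter(f"{{{W_NS}}}t"))
    return runs, chars


def count_runs_in_docx(path):
    """Same count, reading word/document.xml out of a .docx file."""
    with zipfile.ZipFile(path) as docx:
        return count_runs(docx.read("word/document.xml"))
```

Dividing the character total by the run count gives a rough average run length, which makes it easy to spot documents that are fragmented into runs of one or a few characters.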
Comment 16 Robinson Tryon (qubit) 2015-12-09 18:32:49 UTC Comment hidden (obsolete)
Comment 17 MM 2017-01-01 14:26:37 UTC
With Version: 5.2.4.2 (x64)
Build ID: 3d5603e1122f0f102b62521720ab13a38a4e0eb0
CPU Threads: 2; OS Version: Windows 6.19; UI Render: default; 
Locale: en-US (en_US); Calc: single

the first example loads in about 20-25 secs and the second one in about 10 secs.
Could be faster (?!), but it's certainly not 5-7 mins anymore.
Comment 18 tommy27 2017-01-03 09:23:32 UTC Comment hidden (obsolete)
Comment 19 tommy27 2017-01-03 14:11:46 UTC
More extensive performance tests of loading time for attachment 90994 and attachment 91339:

Version                        attachment 90994          attachment 91339
3.5.7.2 and 3.6.0.4            7 seconds                 3 seconds
3.6.1.2 -> 3.6.3.2             11 seconds                4 seconds
3.6.4.3                        1 minute and 40 seconds   12 seconds
3.6.7.2                        1 minute and 35 seconds   13 seconds
4.1.5.2                        5 minutes                 36 seconds
5.2.4.2                        46 seconds                18 seconds
5.4.0.0.alpha0 (Jan 2 2017)    44 seconds                14 seconds


As you can see, loading was very fast in LibO 3.6.0 for both files (7 and 3 seconds respectively), then dropped slightly in 3.6.1 to 3.6.3 (11 and 4 seconds), followed by a more substantial regression in 3.6.4 and 3.6.7 (roughly 1 minute 40 seconds and 13 seconds).

Things became even worse in 4.1.5, which showed extremely slow loading times (5 minutes and 36 seconds respectively), but improved in 5.2.4 and 5.4.0.0 Dev.

So things are better now but still suboptimal:

attachment 90994
3.6.0.4 ->  7 seconds
5.4.0.0 -> 44 seconds

attachment 91339
3.6.0.4 ->  3 seconds
5.4.0.0 -> 14 seconds
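
To put the regression in relative terms, the per-version timings in this comment can be turned into slowdown factors against 3.6.0.4. This is just a back-of-the-envelope sketch: the dictionary and function names are made up, and the values are the per-version measurements from this comment converted to seconds (first value attachment 90994, second value attachment 91339).

```python
# Load times in seconds: (attachment 90994, attachment 91339).
LOAD_TIMES = {
    "3.6.0.4": (7, 3),
    "3.6.1.2-3.6.3.2": (11, 4),
    "3.6.4.3": (100, 12),
    "3.6.7.2": (95, 13),
    "4.1.5.2": (300, 36),
    "5.2.4.2": (46, 18),
    "5.4.0.0.alpha0": (44, 14),
}


def slowdown(version, baseline="3.6.0.4"):
    """Return how many times slower each file loads than in the baseline."""
    base = LOAD_TIMES[baseline]
    current = LOAD_TIMES[version]
    return tuple(round(c / b, 1) for c, b in zip(current, base))


for version in LOAD_TIMES:
    print(version, slowdown(version))
```

With these numbers, 5.4.0.0.alpha0 is still about 6.3 times slower than 3.6.0.4 for the larger file and about 4.7 times slower for the smaller one.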
Comment 20 MM 2017-01-08 01:28:13 UTC
(In reply to tommy27 from comment #19)

> 
> things became even worse in 4.1.5 which showed incredibly slow loading times
> (5 minutes and 36 seconds) but improved in 5.2.4 and 5.4.0.0 Dev.
> 
> so things are better now but still suboptimal:
> 

Things are better than they were, but still not really good. Try saving the files again in .doc and reload them. Now they open fast again....
Comment 21 tommy27 2017-01-08 08:40:14 UTC
@MM
thanks, this may be a good hint for developers.
I did a retest with 5.2.4.2 and 5.4.0.0 alpha under Win8.1 x64.

This is true only for attachment 90994: if you resave it as .doc and reload it, the file size changes from 1.06 MB to 566 KB and loading time is again 7 seconds, as in LibO 3.6.0.

However, if you do the same for attachment 91339, the file size changes from 139 KB to 100 KB but loading time is still 18 seconds (in 5.2.4), just like the original .doc.
Comment 22 tommy27 2017-01-08 08:43:06 UTC
Created attachment 130251
attachment 90994_resaved

Same as attachment 90994, but saved again as .doc in LibO 5.2.4.2.
The resaved version loads faster than the original one.
Comment 23 tommy27 2017-01-08 08:44:19 UTC
Created attachment 130252
attachment 91339_resaved

Same as attachment 91339, but resaved as .doc in LibO 5.2.4.2.
It still loads as slowly as the original version.
Comment 24 MM 2017-01-08 11:05:15 UTC
(In reply to tommy27 from comment #23)
> attachment 91339_resaved
> 
> same as attachment 91339 but resaved as .doc in LibO 5.2.4.2
> it still loads slow as the original version

With attachment 91339 resaved with Goo3.2.0 / 5.1.6 under Windows 7 x64 and one of the latest 5.4 releases under Ubuntu 16.04 x64, it opens in about 7-8 secs instead of 25 secs.
Comment 25 Xisco Faulí 2017-09-21 21:39:36 UTC
Both documents take 10 seconds to open.
Closing as RESOLVED WORKSFORME