Bug 76260 - FILEOPEN: slow loading and dump of .docx with lots of footnotes (see Comment 27 and Comment 43)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: high critical
Assignee: Not Assigned
URL:
Whiteboard: target:4.4.0 target:4.3.0.2 target:6.4.0
Keywords: filter:ooxml, perf
Duplicates: 39179 79732
Depends on:
Blocks: DOCX-Opening DOCX-Footnote-Endnote File-Opening
Reported: 2014-03-17 11:30 UTC by Tushar Bende
Modified: 2020-06-20 20:01 UTC (History)
CC List: 15 users

See Also:
Crash report or crash signature:


Attachments
This document is taking long time to open in LO (311.62 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2014-03-17 11:30 UTC, Tushar Bende
This document is taking long time to open in LO (311.62 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2014-03-17 11:37 UTC, Tushar Bende
backtrace (11.97 KB, text/plain)
2015-07-25 20:21 UTC, Gordo
docx saved in mso 2016 with more actual docx version (180.26 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-03-16 01:13 UTC, paulystefan

Description Tushar Bende 2014-03-17 11:30:54 UTC
Created attachment 95926 [details]
This document is taking long time to open in LO

Problem description: 
the document is attached

Steps to reproduce:
1. Open the attached document and note the opening time.
2. The document takes ten minutes to open.

Current behavior:
The document takes ten minutes to open.

Expected behavior:
The document should open within several seconds.
Comment 1 Tushar Bende 2014-03-17 11:37:07 UTC
Created attachment 95927 [details]
This document is taking long time to open in LO
Comment 2 tommy27 2014-03-17 11:59:39 UTC
tested with LibO 4.1.5.3 under Win7x64 Pro
Intel i7 CPU 950 @ 3.07 GHz 8GB RAM

the attached .docx file is a 105 pages document and it takes 5 minutes to load.

status --> NEW. Version --> 4.1.5.3 
Reworded summary notes.

please specify your PC operating system, CPU and RAM...
the PC's power could be the reason why you see a 10 minute delay and I see only 5 (which is too much anyway).
Comment 3 Tushar Bende 2014-03-17 13:22:20 UTC
@tommy27:
on LibreOffice Version: 4.2.2.1 Build ID: 3be8cda0bddd8e430d8cda1ebfd581265cca5a0f taking 1min:30 sec to open.
Other Details:	
OS: Ubuntu 12.04 LTS 
RAM:16GB
Processor Intel® Core™ i5-3470 CPU @ 3.20GHz × 4  (64 bit)
--------------------------------------------------------------------------------
On LibreOffice Version: 4.3.0.0.alpha0+ Build ID: cdceca118b65f69d8c16bf3f8465f940aed73c10 it is taking 12 Min to open this doc.
OS: Ubuntu 12.04 LTS 
RAM:16GB
Processor Intel® Core™ i5-3470 CPU @ 3.20GHz × 4  (64 bit)


From this it looks like a regression caused by some code changes.
Comment 4 tommy27 2014-03-17 13:31:18 UTC Comment hidden (obsolete)
Comment 5 Tushar Bende 2014-03-17 13:39:52 UTC Comment hidden (obsolete)
Comment 6 tommy27 2014-03-17 14:13:30 UTC Comment hidden (obsolete)
Comment 7 Tushar Bende 2014-03-17 14:34:26 UTC Comment hidden (obsolete)
Comment 8 tommy27 2014-03-17 19:22:35 UTC
tested on another (less powerful) Win7x64 machine

4.1.5 and 4.2.1 take almost 7 minutes to load the file...

4.3.0alpha did not load it even after 21 minutes when I got tired and killed the process.

so it's a performance regression of the 4.3.x branch over an already bad performance of 4.1.x and 4.2.x branch

I add Writer expert to CC list.
Comment 9 Yousuf Philips (jay) (retired) 2014-06-07 00:05:45 UTC
*** Bug 79732 has been marked as a duplicate of this bug. ***
Comment 10 Yousuf Philips (jay) (retired) 2014-06-07 01:46:54 UTC
On Windows 7 64-bit with an Intel Core 2 CPU @ 1.83 GHz and 2.5 GB RAM it took:

19 minutes on 4.4a (2014-06-03)
8 minutes on 4.3 beta 1 (2014-06-03)
7 minutes on 4.2.4
7 minutes on 4.1.6
7 minutes on 4.0.6

Just as a comparison:
4.5 minutes on Kingsoft Office
Comment 11 tommy27 2014-06-07 03:30:01 UTC
@Jay
thanks for extensive comparative tests
very bad performance regression between 4.3.x and 4.4.x
Comment 12 Michael Meeks 2014-06-07 05:14:21 UTC
Our automated regression tests show a big slow-down in some writer document loads recently; cf. http://dev-builds.libreoffice.org/callgrind_report/history.fods - am investigating that currently ...

comparative callgrind traces before/after much appreciated =)
Comment 13 Yousuf Philips (jay) (retired) 2014-06-07 11:51:17 UTC Comment hidden (obsolete)
Comment 14 Michael Meeks 2014-06-07 13:18:49 UTC Comment hidden (obsolete)
Comment 15 Joel Madero 2014-06-09 01:36:27 UTC Comment hidden (obsolete)
Comment 16 Joel Madero 2014-06-18 03:52:11 UTC
Ubuntu 14.04 x64, Dell Studio 1737  Intel Core 2 Duo T6500 / 2.1 GHz , 4 gigs of RAM - I can't reproduce this massive increase. 

A few examples below:

4.4 Master Build Date: Fri Jun 6 21:27:52 2014 +0100
3:55

4.3.0.0.beta1
4:17

4.2.5.2
3:59


A bit of a spike for beta1 but not the 8 minutes.
Comment 17 Michael Meeks 2014-06-18 11:48:39 UTC
So - bug#38513 is in fact un-related and/or a marginal win.

The real deal here is an incredibly pretty N^3 algorithm in the number of StyleFamilies in around 1500 elements

SwXStyleFamily::getElementNames -> 1946 calls
SwDoc::FindPageDescByName       -> 1.9 million calls
rtl::OUString::equals           -> 2.5 bn calls

Shouldn't be so hard to nail - I have a trace here =)
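
For readers following the call counts above, here is a minimal sketch of the pattern, not the actual Writer code. `findByNameLinear` is a hypothetical stand-in for a `FindPageDescByName`-style scan, and `NameIndex` is a hypothetical hash index that is built once; both names and the container choices are assumptions made for illustration. Looking up each of N names with a linear scan costs roughly N^2/2 string comparisons, and repeating that once per element gives the cubic growth.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sentinel returned when a name is not present (illustrative only).
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a linear name search: each call compares the
// wanted name against stored names until it finds a match.
std::size_t findByNameLinear(const std::vector<std::string>& names,
                             const std::string& wanted,
                             std::size_t& comparisons)
{
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        ++comparisons; // count every string comparison performed
        if (names[i] == wanted)
            return i;
    }
    return kNotFound;
}

// Hypothetical index built in one pass: name -> position. Later lookups
// are O(1) on average instead of O(N).
class NameIndex
{
public:
    explicit NameIndex(const std::vector<std::string>& names)
    {
        for (std::size_t i = 0; i < names.size(); ++i)
            m_index.emplace(names[i], i); // keeps the first position on duplicates
    }

    std::size_t find(const std::string& wanted) const
    {
        auto it = m_index.find(wanted);
        return it == m_index.end() ? kNotFound : it->second;
    }

private:
    std::unordered_map<std::string, std::size_t> m_index;
};

// Builds names "Converted1", "Converted2", ... for a demonstration.
std::vector<std::string> makeNames(std::size_t count)
{
    std::vector<std::string> names;
    names.reserve(count);
    for (std::size_t i = 1; i <= count; ++i)
        names.push_back("Converted" + std::to_string(i));
    return names;
}
```

With 100 names from `makeNames(100)`, looking up every name through `findByNameLinear` performs 5050 comparisons in total, while `NameIndex` needs a single pass to build and then one hash lookup per name.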
Comment 18 Commit Notification 2014-06-23 14:42:15 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=78378af1d404baf78f42930a29dbf8eae22bbe80

fdo#76260 - the wrong way to get a 10% win with an N^3 operation.



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 19 Commit Notification 2014-06-23 14:42:30 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9e5e9dd1b276043d2e9f45c01d72b2e89d8abdf2

fdo#76260 - a better approach for getting element names.



Comment 20 Commit Notification 2014-06-23 14:42:46 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=295b97b2a654e00ac5a8e6a3545284fa583fce78

fdo#76260 - Switch from vector to std::stack.



Comment 21 Commit Notification 2014-06-24 07:48:04 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=51fb76a0beacee9d8a43abca493af1b8d2652b53&h=libreoffice-4-3

fdo#76260 - the wrong way to get a 10% win with an N^3 operation.


It will be available in LibreOffice 4.3.

Comment 22 Commit Notification 2014-06-24 07:48:18 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=22c818bd9fa79a2c008719cc0a858ba2a74b0d82&h=libreoffice-4-3

fdo#76260 - a better approach for getting element names.


It will be available in LibreOffice 4.3.

Comment 23 Commit Notification 2014-06-24 07:48:33 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=7d32ac663e7ac4c6f3f22d003c3f36437be43399&h=libreoffice-4-3

fdo#76260 - Switch from vector to std::stack.


It will be available in LibreOffice 4.3.

Comment 24 Commit Notification 2014-06-24 08:32:08 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=95e6cc2ecbcb653f76c4a1ee109908a12b84e456

Related: fdo#76260 writerfilter: move SavedAlternateStates to OOXMLParserState



Comment 25 Commit Notification 2014-07-02 09:52:56 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=64b1566e55677217c9c0dd13e5fbf8faf40810f9

fdo#76260 - switch O(N^2) lookup in SwStyleSheetIterator to O(N)



Comment 26 Commit Notification 2014-07-02 10:59:40 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5d157ce0d77b7deb6f510eee01c6e211c9713ff3

fdo#76260 - don't allocate and free std::strings on each element.



Comment 27 Michael Meeks 2014-07-02 12:04:40 UTC
Hah - so, all this useful micro-optimisation later; it turns out that the problem is quite simple; here am I trying to make parsing more efficient (and there is a load of dead-wood there) - but I get only 10%+ at a time ;-)

It turns out that 367bn cycles of 376bn (ie. all of it) are in children of:

OOXMLFastContextHandler::resolveFootnote

which is called 973 times here (on a 700kb XML file). I rather suspect that we are parsing that same file in its entirety repeatedly to no good purpose ;-) surely we have to be doing something almost as silly to be -this- slow for what is (after all) not the world's largest DOCX file.

Then again, callgrind could be lying to me - but ... the stack seems to suggest that we create a sub-stream and launch a new fastparser for it each time we hit a footnote here.

#1  0xac603a41 in writerfilter::ooxml::OOXMLDocumentImpl::resolve (this=0x88678d8, rStream=...)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLDocumentImpl.cxx:502
#2  0xac59b428 in writerfilter::dmapper::DomainMapper_Impl::substream (this=0x87a9128, rName=90016, ref=
  boost::shared_ptr {_vptr.Reference = 0xac72f3e0 <vtable for writerfilter::ooxml::OOXMLDocumentImpl+8>})
    at /data/opt/libreoffice/master/writerfilter/source/dmapper/DomainMapper.cxx:3003
#3  0xac638336 in writerfilter::LoggedStream::substream (this=0x87a9ae4, name=90016, ref=
  boost::shared_ptr {_vptr.Reference = 0xac72f3e0 <vtable for writerfilter::ooxml::OOXMLDocumentImpl+8>})
    at /data/opt/libreoffice/master/writerfilter/source/resourcemodel/LoggedResources.cxx:253
#4  0xac6004d3 in writerfilter::ooxml::OOXMLDocumentImpl::resolveFastSubStreamWithId (this=0x87b1cd0, rStream=..., pStream=
  boost::shared_ptr {_vptr.Reference = 0xac72f3e0 <vtable for writerfilter::ooxml::OOXMLDocumentImpl+8>}, nId=90016)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLDocumentImpl.cxx:123
#5  0xac60244c in writerfilter::ooxml::OOXMLDocumentImpl::resolveFootnote (this=0x87b1cd0, rStream=..., rType=@0xbfffd2fc: 0, nNoteId=13)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLDocumentImpl.cxx:311
#6  0xac608e7b in writerfilter::ooxml::OOXMLFastContextHandler::resolveFootnote (this=0xaaf93328, nId=13)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLFastContextHandler.cxx:895
#7  0xac5fecdb in writerfilter::ooxml::OOXMLFootnoteHandler::attribute (this=0xbfffd3d8, name=92906, val=...)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/Handler.cxx:44
#8  0xac633b8c in writerfilter::ooxml::OOXMLPropertyImpl::resolve (this=0x8860998, rProperties=...)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLPropertySetImpl.cxx:168
#9  0xac633196 in writerfilter::ooxml::OOXMLPropertySetImpl::resolve (this=0x8840de8, rHandler=...)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLPropertySetImpl.cxx:425
#10 0xac60e023 in writerfilter::ooxml::OOXMLFastContextHandlerProperties::handleXNotes (this=0xaaf93328)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLFastContextHandler.cxx:1141
#11 0xac6c5087 in writerfilter::ooxml::OOXMLFactory_wml::endAction (this=0x87d76a0, pHandler=0xaaf93328)
    at /data/opt/libreoffice/master/workdir/CustomTarget/writerfilter/source/ooxml/OOXMLFactory_wml.cxx:4840
#12 0xac60551f in writerfilter::ooxml::OOXMLFactory::endAction (this=0x87d26a8, pHandler=0xaaf93328)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLFactory.cxx:262
#13 0xac6084c2 in writerfilter::ooxml::OOXMLFastContextHandler::endAction (this=0xaaf93328, Element=2165030)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLFastContextHandler.cxx:376
#14 0xac60af5c in writerfilter::ooxml::OOXMLFastContextHandlerProperties::lcl_endFastElement (this=0xaaf93328, Element=2165030)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLFastContextHandler.cxx:1070
#15 0xac607fce in writerfilter::ooxml::OOXMLFastContextHandler::endFastElement (this=0xaaf93328, Element=2165030)
    at /data/opt/libreoffice/master/writerfilter/source/ooxml/OOXMLFastContextHandler.cxx:249
#16 0xad89e640 in (anonymous namespace)::Entity::endElement (this=0x87b3ff8)
    at /data/opt/libreoffice/master/sax/source/fastparser/fastparser.cxx:487
#17 0xad8a9206 in doContent (parser=parser@entry=0x88347d0, startTagLevel=startTagLevel@entry=0, enc=0xad8c6d9c <utf8_encoding>,
Comment 28 Joel Madero 2014-07-08 16:20:41 UTC
Removed bibisectrequest as Michael has already pushed a commit and we don't even have confirmation that it's a regression. One person says that it was broken in 4.2, someone else saying it was already broken in 4.1. If a bibisect is still needed please confirm what version works well and add bibisectrequest to Whiteboard. Thanks!
Comment 29 Yousuf Philips (jay) (retired) 2014-07-08 19:47:26 UTC
(In reply to comment #11)
> @Jay
> thanks for extensive comparative tests
> very bad performance regression between 4.3.x and 4.4.x

@tommy27: please ignore the 4.4.x test result as i was running the Win-x86@39 which i just found out today is the slower version. there are 4 versions of Win-x86, which one should i be using?

(In reply to comment #28)
> If a
> bibisect is still needed please confirm what version works well and add
> bibisectrequest to Whiteboard. Thanks!

@joel: This was my bad as i added the bibisect and regression keywords. :)
Comment 30 Michael Meeks 2014-07-14 11:59:42 UTC
May be related to bug#81214 - it's just possible =)
Comment 31 Björn Michaelsen 2014-08-21 12:21:13 UTC Comment hidden (obsolete)
Comment 32 Gordo 2015-07-25 20:21:34 UTC
Created attachment 117436 [details]
backtrace

About 2 minutes to open on
Windows Vista 64
Version: 4.4.4.3
Build ID: 2c39ebcf046445232b798108aa8a7e7d89552ea8

After 15 minutes got this backtrace on
Windows Vista 64
Version: 5.1.0.0.alpha1+
Build ID: 8cfdd81b70ef37927b40497ffd10034f28335034
TinderBox: Win-x86@39, Branch:master, Time: 2015-07-24_02:47:18

There are instances where the footnotes do not fit on the same page and are continued onto the next page.  Some pages have a big gap between the text and the footnotes (that's already been reported elsewhere).  The footnote style is Times New Roman but every footnote has been direct formatted to Arial.

There is also a formula that is referenced in the Navigator three times with blank entries.

There are 1946 converted page styles showing in Styles and Formatting.
Comment 33 tommy27 2015-07-26 02:01:23 UTC
(In reply to tommy27 from comment #8)
> tested on another (less powerful) Win7x64 machine
> 
> 4.1.5 and 4.2.1 take almost 7 minutes to load the file...
> 
> 4.3.0alpha did not load it even after 21 minutes when I got tired and killed
> the process.
> 
> ...

tested under Win8.1 x64 with an AMD A8 processor, 8 GB RAM and SSD disk.

LibO 4.4.5.1 and 5.0.0.4 RC x64 take 4 minutes and 20 seconds to load the file

interestingly the loading progress bar is quite steady till 95% after 2 minutes... then the last 2 minutes and 20 seconds are spent just to load the remaining 5%

has anyone the chance to record the time that MS Word needs to load this file?
Comment 34 Justin L 2015-08-24 06:16:39 UTC
comment 13 indicates 3 seconds on Word 2013

Word 2003 (with docx extensions added) took approximately 9 seconds to load.  Windows 2003 server, 3GB ram, Core2 Duo, 7 year old computer.
LibreOffice 4.3.7 took approximately 4 minutes to open on the same machine.
Comment 35 ikonta 2015-08-26 10:30:48 UTC
(In reply to tommy27 from comment #33)
(In reply to Justin L from comment #34)

There is probably something specific to the Windows port.

For me (LO 4.4.4.3, externally built binary package on an amd64 GNU/Linux system with a standard HDD and 2 GB RAM) it takes about 65 seconds to open this file.
The only strange thing is that the progress bar goes almost to the end in about 30 seconds.

Word is a very different application.
It is designed to _look_ quick.
But if the user looks inside, he can see that this is not really the case.
I have seen how Word processes a large (thousands of pages) document.

The only thing wrong is that for half of the time spent opening this document, LO Writer does not show that it is doing anything.
Comment 36 Robinson Tryon (qubit) 2015-12-14 05:57:19 UTC Comment hidden (obsolete)
Comment 37 Telesto 2017-05-28 09:53:17 UTC
Still quite slow (around 2 min to open)
Version: 5.5.0.0.alpha0+
Build ID: d57e6cd9dcc96112994ca2b14ac45896e86b26e5
CPU threads: 4; OS: Windows 6.19; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-05-18_22:43:07
Locale: nl-NL (nl_NL); Calc: CL

also found in
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 38 rcdorner 2018-04-21 18:56:32 UTC Comment hidden (no-value)
Comment 39 Timur 2018-05-21 15:12:28 UTC
I don't see a large change between 6.0 and 6.1+. In my case, both take around 1:20. So I can't confirm rcdorner's report.
I see dump: writerfilterlo!com_sun_star_comp_Writer_WriterFilter_get_implementation+1cc15
Comment 40 Xisco Faulí 2018-05-30 08:50:57 UTC
(In reply to rcdorner from comment #38)
> Just upgraded (04/21/18 11:32am) to 6.0.3.2 64-bit and now nothing works
> anymore!
> Before it took Up to 20 seconds to open a Single page DOCX File.(!!!)
> Now nothing happens anymore and it just crashes.
> Useless!

Does it crash for you?
I've tried with 

Version: 6.0.4.2
Build ID: 1:6.0.4~rc2-0ubuntu0.16.04.1
CPU threads: 4; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

and it took ~1 minute approximately.
Could you please reset your Libreoffice profile ( https://wiki.documentfoundation.org/UserProfile ) and
re-test?
If it's still happening, please create new report!
Thanks!
Comment 41 Xisco Faulí 2019-05-30 15:43:01 UTC
it takes

real	1m20,769s
user	1m19,477s
sys	0m0,615s

in

Version: 6.3.0.0.alpha1+
Build ID: ad7dfdef5f9504dfcd600bf4d88a97c35b9d5d6d
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

@Noel, I thought you might be interested in this issue...
Comment 42 Noel Grandin 2019-08-19 14:21:15 UTC
Unfortunately, there are, in fact ~1000 footnotes in this document, and each footnote lives in its own little substream, so we need to open and parse 1000 substreams.
Comment 43 Noel Grandin 2019-08-20 07:59:51 UTC
Hmmm, so we appear to be parsing the same substream over and over, once for each footnote.

We construct a child document, parse the substream, and filter out the piece we are interested in to build the child document.

In theory, we could delay dealing with notes till the end of the document, and then parse the substream once, but that's a pretty big change.
Comment 44 Commit Notification 2019-08-20 10:04:05 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/6fea13e7a10272922ffdf74b65add10ecf8cec38%5E%21

tdf#76260 cache next page style number

It will be available in 6.4.0.

Comment 45 Michael Meeks 2019-08-20 13:40:08 UTC
> Hmmm, so we appear to be parsing the same substream over and over, once
> for each footnote.

Sure - there is just one XML file there; it's a nonsense doing the same rather expensive XML parsing repeatedly - we should parse once and cache the results somewhere sensible if we can.
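
A minimal sketch of the "parse once and cache" idea, assuming the notes can be keyed by their integer id. `NoteCache` and `NoteStream` are hypothetical names invented for this illustration, and the vector of (id, text) pairs only stands in for the result of parsing the notes XML; how the real importer would hold the parsed notes is not specified here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the notes substream: (note id, note text).
using NoteStream = std::vector<std::pair<int, std::string>>;

class NoteCache
{
public:
    explicit NoteCache(NoteStream stream) : m_stream(std::move(stream)) {}

    // Returns the text for wantedId, or an empty string for an unknown id.
    // The stream is parsed on the first call only; later calls are lookups.
    std::string resolve(int wantedId)
    {
        if (!m_parsed)
            parseAll();
        auto it = m_notes.find(wantedId);
        if (it == m_notes.end())
            return std::string();
        return it->second;
    }

    // Total number of stream entries parsed so far.
    std::size_t entriesParsed() const { return m_entriesParsed; }

private:
    // One pass over the stream that fills the id -> text map.
    void parseAll()
    {
        for (const auto& entry : m_stream)
        {
            ++m_entriesParsed;
            m_notes.emplace(entry.first, entry.second);
        }
        m_parsed = true;
    }

    NoteStream m_stream;
    std::unordered_map<int, std::string> m_notes;
    std::size_t m_entriesParsed = 0;
    bool m_parsed = false;
};
```

However many times `resolve` is called, `entriesParsed()` never exceeds the number of entries in the stream, so the total parsing work is linear in the number of notes. The trade-off is that all parsed notes stay in memory until the cache is destroyed.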
Comment 46 Noel Grandin 2019-08-21 08:01:02 UTC
comments from vmiklos:

<vmiklos> noelgrandin: without reading the bug; IIRC the footnote parsing is lame, and it parses the whole stream all the time, but ignores all but one entries; so you have perf problems when you have lots of footnotes, correct?
<noelgrandin> vmiklos, correct.
<vmiklos> noelgrandin: one high-level idea might be to see what the RTF tokenizer does, i think it already generates a token stream which doesn't have this problem and handled by dmapper. if this is indeed the case, then perhaps the writerfilter/source/ooxml/ code could be improved to do the same, that might solve the problem.
<vmiklos> noelgrandin: or a different way: see if comments have the same problem; if not, perhaps the solution used there could be applied to footnotes, too.
<noelgrandin> vmiklos, thanks
Comment 47 Noel Grandin 2019-08-21 08:09:36 UTC
<vmiklos> noelgrandin: perhaps you can exploit the fact that document.xml refers to the references in footnotes.xml in the same order (refs in document.xml, data in footnotes.xml), so if you just make sure we stop reading after the reference we were looking for is found, and try to not start from 0 but try to continue, you would solve the perf problem for the majority of the cases
<noelgrandin> vmiklos, that is an interesting idea
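
The suggestion above can be sketched as a reader that remembers where the previous match was found. This is only an illustration under assumptions: `SequentialNoteReader` and `NoteStream` are hypothetical names, the vector of (id, text) pairs stands in for the notes substream, and the wrap-around fallback for out-of-order references is one possible choice, not something stated in the discussion.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the notes substream: (note id, note text).
using NoteStream = std::vector<std::pair<int, std::string>>;

class SequentialNoteReader
{
public:
    explicit SequentialNoteReader(NoteStream stream) : m_stream(std::move(stream)) {}

    // Searches for wantedId starting at the saved position and stops at the
    // first match. If the end is reached, the search wraps around once to
    // cover the entries before the saved position.
    // Returns true and fills text on success, false if the id is absent.
    bool resolve(int wantedId, std::string& text)
    {
        const std::size_t n = m_stream.size();
        for (std::size_t step = 0; step < n; ++step)
        {
            const std::size_t pos = (m_next + step) % n;
            ++m_entriesVisited;
            if (m_stream[pos].first == wantedId)
            {
                text = m_stream[pos].second;
                m_next = (pos + 1) % n; // continue after the match next time
                return true;
            }
        }
        return false;
    }

    // Total number of entries examined across all calls.
    std::size_t entriesVisited() const { return m_entriesVisited; }

private:
    NoteStream m_stream;
    std::size_t m_next = 0;
    std::size_t m_entriesVisited = 0;
};
```

When references arrive in stream order, each call examines exactly one entry, so N notes cost N visits in total. An out-of-order or missing id still works but falls back to a scan of up to N entries for that call.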
Comment 48 Justin L 2019-11-30 07:11:00 UTC
*** Bug 39179 has been marked as a duplicate of this bug. ***
Comment 49 paulystefan 2020-03-16 01:13:47 UTC
Created attachment 158704 [details]
docx saved in mso 2016 with more actual docx version

The docx is smaller when saved in MSO 2016.

The problem loading this file in LO 6.4.1.2 x64 on Win 10 x64 is reduced,
but it is still really slow in my opinion.
Comment 50 paulystefan 2020-06-20 20:00:50 UTC
In 7.0.0b2 x64 on Win 10 x64:

about 4 minutes to read the 312 KB docx file
about 2 minutes to read the current MSO 2016 docx 200 KB file
Comment 51 paulystefan 2020-06-20 20:01:09 UTC Comment hidden (obsolete)