Bug 93660 - FILEOPEN: Writer takes too long for DOCX with large table opening
Summary: FILEOPEN: Writer takes too long for DOCX with large table opening
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 5.1.0.0.alpha0+ Master (earliest affected)
Hardware: Other All
Importance: medium major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisectNotNeeded, filter:docx, perf, regression
Depends on:
Blocks: DOCX-Tables DOCX-Opening
Reported: 2015-08-25 15:40 UTC by Timur
Modified: 2023-03-07 19:24 UTC (History)
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
ProcExplorer stack during load (3.70 KB, text/plain)
2015-08-26 00:35 UTC, V Stuart Foote
Screenshot from Very Sleepy (113.96 KB, image/png)
2016-08-04 04:00 UTC, Aron Budea

Description Timur 2015-08-25 15:40:03 UTC
Attachment 68798 from Bug 56183 could be opened in LO 4.4.0 through the current 5.0.1, although very slowly.
But master 5.1+ can't open it: it either hangs on loading or is even slower now.

Here is the Linux log:
warn:ucb.ucp.gio:21748:1:ucb/source/ucp/gio/gio_content.cxx:400: ignoring GError "The specified location is not supported" for <vnd.sun.star.job:alias=UpdateCheck>
warn:sfx.dialog:21748:1:sfx2/source/dialog/filtergrouping.cxx:361: already have an element for WordPerfect
warn:sfx.dialog:21748:1:sfx2/source/dialog/filtergrouping.cxx:361: already have an element for writerweb8_writer_template
warn:sfx.dialog:21748:1:sfx2/source/dialog/filtergrouping.cxx:361: already have an element for writerglobal8
warn:writerfilter:21748:1:writerfilter/source/dmapper/FontTable.cxx:144: FontTable::lcl_sprm: unhandled token: 93128
warn:writerfilter:21748:1:writerfilter/source/dmapper/DomainMapper_Impl.cxx:558: no context of type 1 available
warn:writerfilter:21748:1:writerfilter/source/dmapper/DomainMapper_Impl.cxx:558: no context of type 1 available

The same happens with attachment 93986 from Bug 74916, which looks like a duplicate.
A proper fix that can open them in no(rmal) time would be nice.
Comment 1 V Stuart Foote 2015-08-26 00:35:07 UTC
Created attachment 118183 [details]
ProcExplorer stack during load

Loads in 5.1.0.0alpha0+, but very slowly. Attaching an MS Process Explorer stack trace of soffice.bin taken during import filtering.
Comment 2 V Stuart Foote 2015-08-26 01:16:44 UTC
So, on Windows 10 Pro 64-bit en-US Intel 7 920, 12GB Ram with 

Version: 5.1.0.0.alpha1+ (x64)
Build ID: d0489d0827fc6cef04d0f3602023d82ceda82480
TinderBox: Win-x86_64@62-TDF, Branch:MASTER, Time: 2015-08-21_22:27:16
Locale: en-US (en_US)

The .docx did eventually open--in just 33 minutes :O

CPU load was never above 15%, memory pretty steady at about 660,000K.

No idea what it was doing with the filter import of the table and cell paragraphs for all that time...
Comment 3 raal 2015-08-26 13:03:15 UTC
(In reply to V Stuart Foote from comment #2)
> So, on Windows 10 Pro 64-bit en-US Intel 7 920, 12GB Ram with 

> 
> The .docx did eventually open--in just 33 minutes :O

MS Word 2010 can open this file in ~20 sec on worse hardware; setting as NEW.
Comment 4 V Stuart Foote 2015-08-26 13:17:16 UTC
Setting as regression, since it is 10 times slower than in 4.4, when bug 56183 was "Resolved" as WFM.
Comment 5 Robinson Tryon (qubit) 2015-12-09 18:45:05 UTC Comment hidden (obsolete)
Comment 6 Aron Budea 2016-08-04 04:00:59 UTC
Created attachment 126565 [details]
Screenshot from Very Sleepy

Let's see if the Very Sleepy profiler is of any use. I ran it with a 120s time limit on an enable-symbols master build. I'm attaching a screenshot; most of it is useless, but the following function stands out:

SwIterator<SwCellFrame,SwFormat>::Next
http://opengrok.libreoffice.org/xref/core/sw/inc/calbck.hxx#306
(74s seems to be spent inside it, and 60s on dynamic casts; compared to the 120s limit, that is significant.)
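
To make the profiler finding above more concrete, here is a minimal Python sketch of a type-filtering iterator. It is not LibreOffice code: `Listener`, `CellFrame`, `OtherClient`, `CastCounter`, `typed_iter` and `count_checks` are hypothetical names, and `isinstance` merely stands in for a dynamic cast. It only shows that such an iterator has to test every registered client, including those of the wrong type.

```python
from typing import Iterator, List, Type, TypeVar


class Listener:
    """Hypothetical base class for anything registered on a format."""


class CellFrame(Listener):
    """Hypothetical stand-in for a table cell frame."""


class OtherClient(Listener):
    """Hypothetical stand-in for any other kind of registered client."""


T = TypeVar("T", bound=Listener)


class CastCounter:
    """Counts how many type checks the iteration performs."""

    def __init__(self) -> None:
        self.checks = 0


def typed_iter(clients: List[Listener], wanted: Type[T],
               counter: CastCounter) -> Iterator[T]:
    """Yield only the clients of type `wanted`, testing every client."""
    for client in clients:
        counter.checks += 1  # one type check per registered client
        if isinstance(client, wanted):
            yield client


def count_checks(num_cells: int, num_others: int) -> int:
    """Walk the whole client list once and return the number of type checks."""
    clients: List[Listener] = [CellFrame() for _ in range(num_cells)]
    clients += [OtherClient() for _ in range(num_others)]
    counter = CastCounter()
    found = list(typed_iter(clients, CellFrame, counter))
    assert len(found) == num_cells
    return counter.checks
```

If a walk like this were repeated once per cell, the number of checks would grow roughly with the square of the table size. Whether that is what happens during this import is an assumption; the screenshot alone does not prove it.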
Comment 7 Xisco Faulí 2016-09-12 12:49:08 UTC Comment hidden (obsolete)
Comment 8 Aron Budea 2016-10-30 06:29:42 UTC Comment hidden (obsolete)
Comment 9 Aron Budea 2016-10-30 09:39:04 UTC Comment hidden (no-value)
Comment 10 Oliver Specht (CIB) 2016-11-01 13:17:33 UTC
(In reply to Aron Budea from comment #8)
> Dynamic casts were added in this commit:
> https://cgit.freedesktop.org/libreoffice/core/commit/?id=fa91dd31f39a24329d288d4e1cda28db3a16af0d
> 
> "5th step to remove tools/rtti.hxx
> tools/rtti.hxx removed
> completed the interface of some Sdr.*  Items
> and removed pseudo items"

At the time the issue was reported (2015-08-25 15:40:03 UTC) the commit mentioned above was not even planned ;-)

The main problem in table import in writerfilter is the missing table API. 
Tables are imported as simple paragraphs and converted to a table after the last paragraph is finished (see SwXText::convertToTable()).

The 'normal' filters create the table by adding cells and rows.
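
The two strategies contrasted above can be sketched with a toy model in Python. This is only an illustration of the difference in approach, not the writerfilter or SwXText implementation: `convert_paragraphs_to_table`, `IncrementalTable` and `build_incrementally` are hypothetical names, and cells are reduced to plain strings.

```python
from typing import List

Table = List[List[str]]


def convert_paragraphs_to_table(paragraphs: List[str], columns: int) -> Table:
    """Deferred approach: collect flat paragraphs, then group them into rows."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    if len(paragraphs) % columns != 0:
        raise ValueError("paragraph count must be a multiple of columns")
    return [paragraphs[i:i + columns]
            for i in range(0, len(paragraphs), columns)]


class IncrementalTable:
    """Incremental approach: rows and cells are added as they are read."""

    def __init__(self) -> None:
        self.rows: Table = []

    def add_row(self) -> None:
        self.rows.append([])

    def add_cell(self, text: str) -> None:
        if not self.rows:
            raise RuntimeError("add_row() must be called before add_cell()")
        self.rows[-1].append(text)


def build_incrementally(cells: List[str], columns: int) -> Table:
    """Feed cells one by one, starting a new row every `columns` cells."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    table = IncrementalTable()
    for index, text in enumerate(cells):
        if index % columns == 0:
            table.add_row()
        table.add_cell(text)
    return table.rows
```

Both paths end with the same grid. The difference is when the table structure comes into existence: only after the last paragraph in the deferred case, or row by row in the incremental case.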
Comment 11 Timur 2016-12-14 13:11:44 UTC
Following Oliver's comment, is this a regression, then? Does bibisectRequest make sense? How should we change the title? "missing table API to create the table by adding cells and rows"?
Is this the same problem as with attachment 68798 from Bug 56183, attachment 93986 from Bug 74916 and attachment 95848 from Bug 76200?
Comment 12 Timur 2017-11-06 17:47:32 UTC Comment hidden (obsolete)
Comment 13 Buovjaga 2018-07-12 15:15:59 UTC
After thinking about this more (and no input from devs), setting as bibisectNotNeeded
Comment 14 Xisco Faulí 2019-05-30 16:13:02 UTC
it takes

real	8m48,341s
user	8m46,532s
sys	0m0,590s

in


Version: 6.3.0.0.alpha1+
Build ID: ad7dfdef5f9504dfcd600bf4d88a97c35b9d5d6d
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

@buovjaga, would you mind getting a perf graph?
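
For anyone wanting to reproduce timings like the ones above, a minimal Python sketch follows. It assumes `soffice` is on the PATH and that a headless conversion exercises the same DOCX import as opening the file in the UI, which is an assumption. The helper names and `NOOP_COMMAND` are hypothetical and exist only for illustration.

```python
import subprocess
import sys
import time
from typing import List

# Harmless command for checking the timer itself without LibreOffice.
NOOP_COMMAND: List[str] = [sys.executable, "-c", "pass"]


def soffice_convert_command(docx_path: str, out_dir: str) -> List[str]:
    """Build a headless conversion command (assumes `soffice` is on PATH)."""
    return ["soffice", "--headless", "--convert-to", "odt",
            "--outdir", out_dir, docx_path]


def time_command(command: List[str]) -> float:
    """Run `command` to completion and return the wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start
```

Calling `time_command(soffice_convert_command(path, out_dir))` with the attachment's local path would give a wall-clock figure comparable to the `real` line above, though not the `user`/`sys` split.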
Comment 15 Timur 2022-05-19 13:49:31 UTC
After this commit the file opens; previously it didn't open at all:
4.4 212,44s user 1,06s system 88% cpu 4:02,37 total
 b24f22b2d3666f13fea09b90fa78a0a69ed5cc64 is the first bad commit
Date:   Sat Mar 14 21:24:14 2015 +0800
    source-hash-6c7f0e8bfacac44493e44c4ea613d064c3fb5348
    pre source-hash-fb171e3886c0c28c61f1e1960f7b427644a501fe
author	Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
remove now redundant old implementation
Comment 16 Roman Kuznetsov 2023-03-07 19:23:05 UTC
The perf problem is still here

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 0484a9a3f5e2ecb678f6fb41bbb251529e89c00d
CPU threads: 16; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL threaded