Bug 155170 - LibreOffice stops responding when opening the attached PDF
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.2.0.4 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf, perf
Depends on:
Blocks: PDF-Import-Draw PDF-Import-Writer Memory Performance CPU-AT-100%
Reported: 2023-05-06 20:12 UTC by Anas
Modified: 2024-08-29 14:33 UTC (History)
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
this is a test for my daughter in Turkish (7.93 MB, application/pdf)
2023-05-06 20:12 UTC, Anas
Just page 2 of the full document (1.58 MB, application/pdf)
2024-06-26 01:31 UTC, Dave Gilbert
Just the problematic element (3.41 MB, application/pdf)
2024-07-10 01:37 UTC, Dave Gilbert

Description Anas 2023-05-06 20:12:05 UTC
Created attachment 187119 [details]
this is a test for my daughter in Turkish

LibreOffice Writer stops responding when opening the attached PDF. The PDF is in Turkish.

My version of LibreOffice Writer is 7.4.6.2 (x64)
My OS is Windows 10

My processor is 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 1690 MHz, 4 Core(s), 8 Logical Processor(s)
Comment 1 Robert Großkopf 2023-05-07 06:42:12 UTC
Tested with the attached document. LO hangs while importing the content. The document was created by Adobe InDesign 14.0. I don't know whether it uses a feature that the PDF import doesn't support.

Then I opened the document in a PDF viewer and printed it to *.pdf. The document grows from 7.9 MB to 16.5 MB. This printed document can be opened by LO without any problem.

I don't know if this is a bug, because I don't know whether InDesign uses features that LO doesn't support. But LO should respond with "not supported" rather than hang. I will set this one to NEW.
Comment 2 Roman Kuznetsov 2023-05-09 18:38:11 UTC
It does not matter whether Writer or Draw tries to open the PDF. LibreOffice hangs, takes 100% of the CPU and starts to eat memory.
Comment 3 csyu.279 2023-05-25 14:48:22 UTC
Couldn't open in:

Version: 4.2.0.0.alpha1+
Build ID: fc8f44e82de4ebdd50ac5fbb9207cd1a59a927e3

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 45826e60d5f1508d54b0f0a4d98b0e2ebe94a097
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 4 Eyal Rozenberg 2023-09-14 22:58:44 UTC
Unbelievable - still hangs LibreOffice Draw, and LibreOffice Writer with:

Version: 7.6.0.3 (X86_64) / LibreOffice Community
Build ID: 69edd8b8ebc41d00b4de3915dc82f8f0fc3b6265
CPU threads: 4; OS: Linux 6.4; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US

and

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: cc7d6211bc01e5ec84dbad542605d2e93dea925c
CPU threads: 4; OS: Linux 6.4; UI render: default; VCL: gtk3
Locale: he-IL (en_IL); UI: en-US


Also, the hang happens both with the PDF import filter and the Writer import filter. And I also seem to get high CPU usage during the hang.
Comment 5 Dave Gilbert 2024-01-14 12:36:01 UTC
Comment on attachment 187119 [details]
this is a test for my daughter in Turkish

This one is different from 113050, it'll need separate debug.
I'm not seeing any tiling patterns in the intermediate file; however that intermediate file is pretty huge, so something is going on.
Comment 6 Dave Gilbert 2024-06-26 00:58:54 UTC
This does eventually load for me.
On Fedora 40's 24.2.4.2.2.fc40 it took under a minute but ate 3.8G resident.
On my debug build it took a lot longer to load and used 2.8G.
There are artifacts on pages 14, 15, and 23 where something is wrong; but I can scroll through it once loaded.
Comment 7 Dave Gilbert 2024-06-26 01:31:48 UTC
Created attachment 194958 [details]
Just page 2 of the full document

I used pdfseparate to split the document into individual pages; while pages 6 and 22 are a little slow, all the rest except page 2 are fine.

Page 2 is 99% of the problem; visually it's unremarkable, although the 4 circles containing i., ii., iii., iv. are misrendered.
Comment 8 Dave Gilbert 2024-06-27 01:18:18 UTC
1) If you have the 'navigator' up it takes a lot longer - I've not successfully seen it load; I guess that's populating the tree in it.

2) libreoffice --convert-to fodg page2-uncomp.pdf
gets me a 73M fodg file which exhibits the same problem on loading.

So I can try and have a dig into that to see what is so big.
Comment 9 Dave Gilbert 2024-06-27 01:48:13 UTC
Looking at the fodg, there's ~46000 graphics and paragraph styles, all wired to svg paths; I think this is all the pretty graphic in question 7 at the upper right.
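A count like the one above can be reproduced with a short script. This is only a sketch, not the method used in this comment: it assumes the flat ODG uses the standard ODF namespaces for draw: and style:, and the function name count_fodg_elements is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard ODF namespace URIs (assumed to be what the fodg declares).
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def count_fodg_elements(source):
    """Count draw:* elements and style:style families in a flat ODG.

    `source` is a file name or a binary file object. The file is
    streamed with iterparse so the whole tree is never held with
    its full content at once.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag.startswith("{" + DRAW_NS + "}"):
            # e.g. "draw:path", "draw:g", "draw:frame"
            counts["draw:" + elem.tag.split("}", 1)[1]] += 1
        elif elem.tag == "{" + STYLE_NS + "}style":
            family = elem.get("{" + STYLE_NS + "}family", "?")
            counts["style:" + family] += 1
        # Drop text, attributes and children of elements already counted.
        elem.clear()
    return counts
```

Something like count_fodg_elements("page2-uncomp.fodg").most_common(10) would then list the dominant element types, assuming the converted file ends up under that name.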
Comment 10 Dave Gilbert 2024-06-28 01:36:07 UTC
looking at the output of xpdfimport, there are 222903 fillPath's and 223411 updateFillColors (most grouped one fillPath/one update) and I'm pretty sure most are in the section 7 of the page.
(There are also some HUGE clippaths, I think ~20k points??? but I think those are later).
Now I need to follow back into xpdfimport to see where they're coming from.
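For anyone wanting to repeat this kind of tally, here is a rough sketch. It assumes the xpdfimport output has been captured to text and that each line starts with the command name (fillPath, updateFillColor, ...) followed by its arguments; that layout is an assumption about the dump, and count_commands is a hypothetical helper, not part of LibreOffice.

```python
from collections import Counter


def count_commands(lines):
    """Tally the first whitespace-separated token of every non-empty line.

    `lines` is any iterable of strings, e.g. an open text file holding
    a captured dump (assumed format: "<command> <args...>" per line).
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)
        if parts:  # skip blank lines
            counts[parts[0]] += 1
    return counts
```

Opening the capture with open(path, errors="replace") avoids decode failures if the dump also carries binary image data.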
Comment 11 Eyal Rozenberg 2024-06-28 06:50:19 UTC
(In reply to Dave Gilbert from comment #10)

So, I'm not a LO developer, but I would like to point out that these high numbers you mentioned should not get LO stuck. That is, even if it was 2 million, or 20 million or 200 million filling and update tasks to perform - LO should have become responsive quickly after the file is opened, with at least the first page usable, with inter-page navigation possible, and with the second page showing some intermediary indication of it loading.

... but perhaps I should open a different bug about that?
Comment 12 Dave Gilbert 2024-06-28 10:53:25 UTC
Yes I agree; although I suspect there's actually many problems around that.
And if I find any specific issues I'll file bugs.

However, I'm concentrating on the PDF import part; it sometimes creates lots and lots of items for silly reasons; so my first step is to understand what's going on.
Comment 13 Dave Gilbert 2024-06-30 14:59:58 UTC
I think I understand what's going on; and I don't think it's the import's fault (but maybe there's a chance to optimise something).

Page 2, question 7 has a wood effect clip art; it's made up of about 5 layers, and the back one (which is mostly obscured!) is made up of ~175k draw:path's.
They are mostly very small (sub mm often) and often have different colours.
The colours are derived via a crazily complicated shading mesh in the pdf.

(I've not quite figured out why it's 178k draw:paths - I see 3600 'm' operators in the pdf, 44000 'c' and 46282 'l', but since it's derived from that shading it might make sense that it's split.)
Comment 14 Dave Gilbert 2024-06-30 23:58:10 UTC
Actually...

Our output device doesn't define useShadedFills, so it's falling back to poppler's internal shaded fill code; so there's the potential for us to define that and do something more efficient.
(I think this is a type '7' fill in the case of this particular example, which is the weird tensor type)
Comment 15 Dave Gilbert 2024-07-10 01:37:02 UTC
Created attachment 195196 [details]
Just the problematic element

This is just the problematic set of elements in the page 2; I've stripped it right down using qpdf.
Still takes ~3GB of ram trying to load it into LO
Comment 16 Dave Gilbert 2024-07-15 01:45:36 UTC
The fun fills come from 'type 7' - i.e. tensor type - mesh shadings that poppler has decomposed.
Even by those standards they are complex in this PDF; there are eight separate shadings, the worst with 965 separate patches.

These type 7 shadings are 'tensor-product' shadings; not actually that much more complex than the type 6 shadings, which are Coons patches - i.e. each patch is defined by a Bézier curve for each edge; the type 7's have an extra 4 control points.
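To make the geometry concrete, here is a minimal sketch of evaluating one tensor-product patch as a bicubic Bézier surface over a 4x4 grid of control points. It is not poppler's code; the grid indexing is a plain maths convention (not the order the points appear in the PDF stream), and the names bernstein3 and eval_patch are made up. The 12 outer grid points are the edge curves a type 6 patch also has; the inner four, points[1][1], points[1][2], points[2][1] and points[2][2], are the extra ones in type 7.

```python
from math import comb


def bernstein3(i, t):
    """Cubic Bernstein basis polynomial B_i(t), i in 0..3."""
    return comb(3, i) * t**i * (1 - t) ** (3 - i)


def eval_patch(points, u, v):
    """Evaluate a bicubic Bezier patch at (u, v), both in [0, 1].

    `points` is a 4x4 nested list of (x, y) control points, where
    points[i][j] is weighted by B_i(u) * B_j(v).
    """
    x = y = 0.0
    for i in range(4):
        for j in range(4):
            w = bernstein3(i, u) * bernstein3(j, v)
            x += w * points[i][j][0]
            y += w * points[i][j][1]
    return (x, y)
```

A renderer that cannot draw such a patch directly has to sample it into many small flat-coloured pieces, which is consistent with the large number of tiny paths seen earlier in this bug.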

Looking at other formats;
SVG doesn't have either of these yet, but they've got a Coons patch one in the works;
https://svgwg.org/svg-next/pservers.html#MeshGradients

Cairo apparently supports both 6 and 7.
Comment 17 Dave Gilbert 2024-08-24 00:32:17 UTC
I'm just keeping track of other things going on in this document as I work through it, so I think:

Very slow loading [Due to type 7 fill]

p.1 sec 1; big black surrounds to the 1..4 roundels
p.2 sec 5; again big black surrounds
p.3 sec 10 - possible black surround
p.6 sec 2 odd clipping on image?
p.14 big grey block top left
p.15 huge circle
p.23 sec 6 huge circles

I think I've got page 14 rendering fixed, and some of pages 6, 14, 15, and 23, in a tree where I have a clipping fix; I'll get that posted soon.
Comment 18 Dave Gilbert 2024-08-29 14:33:17 UTC
With https://git.libreoffice.org/core/commit/b416c5b8e32632a63e1e791c34896e17d89f7982
which I've just got in:

p.6 sec2's clipping is better; but the image still isn't right
p.14's big block has gone
p.15's circles are gone - but the fish are still odd.
p.23's big circles have gone - but still has odd image

Which leaves us with:

Very slow loading [Due to type 7 fill]

p.1 sec 1; big black surrounds to the 1..4 roundels
p.2 sec 5; again big black surrounds
p.3 sec 10 - possible black surround
p.6 sec 2 - image oddities
p.15 odd fish
p.23 Images not quite right

which I think means there are actually 3 bugs-ish:
 a) The slow loading of type 7 fills
 b) The big black surrounds
 c) Some image clipping problem on a few pages.