Created attachment 187119 [details] this is a test for my daughter in Turkish. LibreOffice Writer stops responding when opening the attached PDF. The language of the PDF file is Turkish. My version of LibreOffice Writer is 7.4.6.2 (x64). My OS is Windows 10. My processor is 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 1690 Mhz, 4 Core(s), 8 Logical Processor(s).
Tested it with the attached document. LO hangs while importing the content. The document was created by Adobe InDesign 14.0. I don't know whether it uses a feature that the PDF import doesn't support. I then opened the document in a PDF viewer and printed it to PDF; the file grew from 7.9 MB to 16.5 MB, and LO could open that version without any problem. I'm not sure this is a bug, because I don't know whether InDesign uses features that LO doesn't support, but in that case LO should respond with "not supported". I will set this one to NEW.
It does not matter whether Writer or Draw tries to open the PDF: LibreOffice hangs, takes 100% of CPU and starts to eat memory.
Couldn't open in: Version: 4.2.0.0.alpha1+ Build ID: fc8f44e82de4ebdd50ac5fbb9207cd1a59a927e3 Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community Build ID: 45826e60d5f1508d54b0f0a4d98b0e2ebe94a097 CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win Locale: en-US (en_US); UI: en-US Calc: threaded
Unbelievable - still hangs LibreOffice Draw, and LibreOffice Writer with: Version: 7.6.0.3 (X86_64) / LibreOffice Community Build ID: 69edd8b8ebc41d00b4de3915dc82f8f0fc3b6265 CPU threads: 4; OS: Linux 6.4; UI render: default; VCL: gtk3 Locale: en-IL (en_IL); UI: en-US and Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community Build ID: cc7d6211bc01e5ec84dbad542605d2e93dea925c CPU threads: 4; OS: Linux 6.4; UI render: default; VCL: gtk3 Locale: he-IL (en_IL); UI: en-US Also, the hang happens both with the PDF import filter and the Writer import filter. And I also seem to get high CPU usage during the hang.
Comment on attachment 187119 [details] this is a test for my daughter in Turkish This one is different from 113050, it'll need separate debug. I'm not seeing any tiling patterns in the intermediate file; however that intermediate file is pretty huge, so something is going on.
This does eventually load for me. On Fedora 40's 24.2.4.2.2.fc40 it took under a minute but ate 3.8G resident. On my debug build it took a lot longer to load and used 2.8G. There are artifacts on pages 14, 15, and 23 where something is wrong, but I can scroll through it once loaded.
Created attachment 194958 [details] Just page 2 of the full document. I used pdfseparate to split the document into individual pages; while pages 6 and 22 are a little slow, all the rest except page 2 are fine. Page 2 is 99% of the problem; visually it's unremarkable, although the 4 circles with i., ii., iii., iv. in them are misrendered.
1) If you have the 'navigator' up it takes a lot longer - I've not yet seen it load successfully; I guess that's populating the tree in it. 2) libreoffice --convert-to fodg page2-uncomp.pdf gets me a 73M fodg file which exhibits the same problem on loading. So I can try to dig into that to see what is so big.
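A minimal sketch of the split-then-convert workflow described in the comments above, for timing each page on its own. It assumes `pdfseparate` (from poppler-utils) and a `libreoffice` binary are on the PATH. The helper names, the `page-%d.pdf` pattern, the `--headless` and `--outdir` options and the timeout are illustrative choices, not taken from this report.

```python
import subprocess
import time
from pathlib import Path


def split_command(pdf, out_dir):
    """argv for pdfseparate; %d in the pattern becomes the page number."""
    return ["pdfseparate", str(pdf), str(Path(out_dir) / "page-%d.pdf")]


def convert_command(page_pdf, out_dir):
    """argv for converting one page to a flat ODF drawing (fodg)."""
    return ["libreoffice", "--headless", "--convert-to", "fodg",
            "--outdir", str(out_dir), str(page_pdf)]


def time_pages(pdf, out_dir, timeout=600):
    """Split the PDF, convert each page, return {page file name: seconds}.

    A page that exceeds the timeout is recorded as None.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(split_command(pdf, out_dir), check=True)
    timings = {}
    for page in sorted(out_dir.glob("page-*.pdf")):
        start = time.monotonic()
        try:
            subprocess.run(convert_command(page, out_dir),
                           check=True, timeout=timeout)
            timings[page.name] = time.monotonic() - start
        except subprocess.TimeoutExpired:
            timings[page.name] = None
    return timings
```

Calling `time_pages("full.pdf", "pages")` (file names hypothetical) would then point at the slow pages without opening the GUI.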
Looking at the fodg, there are ~46000 graphics and paragraph styles, all wired to svg paths; I think this is all the pretty graphic in question 7 at the upper right.
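One way to get counts like these out of a large fodg file is to stream through it and tally elements by tag. This is only a sketch under the assumption that the file is well-formed flat ODF XML; the function names are hypothetical, and the namespace URIs are the standard ODF ones for `draw:` and `style:`.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def count_elements(source):
    """Count elements by qualified tag; source is a path or a binary file object.

    Clearing each element once it has been counted keeps memory use lower
    than building the whole tree for a file of tens of megabytes.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()
    return counts


def count_elements_in_bytes(data):
    """Convenience wrapper for XML that is already in memory."""
    return count_elements(io.BytesIO(data))
```

The number of `draw:path` elements is then `counts["{%s}path" % DRAW_NS]`, and the number of style definitions is `counts["{%s}style" % STYLE_NS]`.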
Looking at the output of xpdfimport, there are 222903 fillPaths and 223411 updateFillColors (most grouped as one fillPath per update), and I'm pretty sure most are in section 7 of the page. (There are also some HUGE clip paths, I think ~20k points, but I think those come later.) Now I need to follow back into xpdfimport to see where they're coming from.
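A tally like the one above can be sketched as follows. It assumes the captured xpdfimport output is a line-oriented text stream in which each line starts with a command name such as `fillPath` or `updateFillColor`; the function name and the pairing heuristic are illustrative only.

```python
from collections import Counter


def count_commands(lines):
    """Return (per-command counts, number of fillPath lines that directly
    follow an updateFillColor line)."""
    counts = Counter()
    paired = 0
    previous = None
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines without disturbing the pairing
        command = parts[0]
        counts[command] += 1
        if command == "fillPath" and previous == "updateFillColor":
            paired += 1
        previous = command
    return counts, paired
```

Comparing `paired` with `counts["fillPath"]` gives a rough idea of how many fills carry their own colour change.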
(In reply to Dave Gilbert from comment #10) So, I'm not a LO developer, but I would like to point out that these high numbers you mentioned should not get LO stuck. That is, even if it was 2 million, or 20 million or 200 million filling and update tasks to perform - LO should have become responsive quickly after the file is opened, with at least the first page usable, with inter-page navigation possible, and with the second page showing some intermediary indication of it loading. ... but perhaps I should open a different bug about that?
Yes, I agree; although I suspect there are actually many problems around that, and if I find any specific issues I'll file bugs. However, I'm concentrating on the PDF import part; it sometimes creates lots and lots of items for silly reasons, so my first step is to understand what's going on.
I think I understand what's going on, and I don't think it's the import's fault (but maybe there's a chance to optimise something). Page 2, question 7 has a wood-effect clip art; it's made up of about 5 layers, and the back one (which is mostly obscured!) is made up of ~175k draw:path's. They are mostly very small (often sub-mm) and often have different colours. The colours are derived via a crazily complicated shading mesh in the PDF. (I've not quite figured out why it's 178k draw:paths - I see 3600 'm' operators in the PDF, 44000 'c' and 46282 'l' - but since it's derived from that shading it might make sense that it's split.)
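The operator counts quoted above can be approximated with a naive tokeniser over an uncompressed content stream (for example one extracted from a file rewritten with qpdf's `--qdf` mode). This is a rough sketch: it splits on whitespace only, so it ignores string literals and inline image data and can miscount on streams that contain them. The function name is hypothetical.

```python
from collections import Counter

# PDF path-construction operators: moveto, lineto, curveto.
PATH_OPERATORS = {b"m", b"l", b"c"}


def count_path_operators(content):
    """Count 'm', 'l' and 'c' tokens in raw content-stream bytes."""
    return Counter(tok for tok in content.split() if tok in PATH_OPERATORS)
```

For instance, `count_path_operators(b"0 0 m 10 0 l 10 10 l h f")` counts one `m` and two `l` operators.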
Actually... Our output device doesn't define useShadedFills, so it's falling back to poppler's internal shaded fill code; so there's the potential for us to define that and do something more efficient. (I think this is a type '7' fill in this particular example, which is the weird tensor type.)
Created attachment 195196 [details] Just the problematic element. This is just the problematic set of elements on page 2; I've stripped it right down using qpdf. It still takes ~3GB of RAM trying to load it into LO.
The fun fills come from 'type 7' - i.e. tensor type - mesh shadings that poppler has decomposed. Even by those standards they are complex in this PDF; there are eight separate shadings, the worst with 965 separate patches. These type 7 shadings are 'tensor-product' shadings; not actually that much more complex than the type 6 shadings, which are Coons patches - i.e. each patch is defined by a Bézier curve for each edge; the type 7s have an extra 4 control points. Looking at other formats: SVG doesn't have either of these yet, but they've got a Coons patch one in the works; https://svgwg.org/svg-next/pservers.html#MeshGradients Cairo apparently supports both 6 and 7.
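To make the "extra 4 control points" concrete: a type 6 (Coons) patch is described by 12 boundary control points, while a type 7 (tensor-product) patch has a full 4x4 grid of 16. The sketch below evaluates a point on a tensor-product patch as a bicubic Bézier surface, using Bernstein polynomials. It covers the geometry only, not colour interpolation or the way poppler subdivides patches, and the function names are hypothetical.

```python
from math import comb


def bernstein3(i, t):
    """Cubic Bernstein polynomial B_i(t) for i in 0..3."""
    return comb(3, i) * t**i * (1 - t) ** (3 - i)


def tensor_patch_point(control, u, v):
    """Point S(u, v) on a bicubic patch.

    control[i][j] is the (x, y) control point p_ij, i and j in 0..3;
    S(u, v) = sum_i sum_j p_ij * B_i(u) * B_j(v).
    """
    x = y = 0.0
    for i in range(4):
        for j in range(4):
            w = bernstein3(i, u) * bernstein3(j, v)
            x += w * control[i][j][0]
            y += w * control[i][j][1]
    return x, y
```

With an evenly spaced flat grid of control points, `tensor_patch_point` simply returns `(u, v)`; the four interior points `p_11`, `p_12`, `p_21`, `p_22` are the ones a Coons patch does not specify.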
I'm just keeping track of other things going on in this document as I work through it, so I think:
Very slow loading [Due to type 7 fill]
p.1 sec 1; big black surrounds to the 1..4 roundels
p.2 sec 5; again big black surrounds
p.3 sec 10 - possible black surround
p.6 sec 2 odd clipping on image?
p.14 big grey block top left
p.15 huge circle
p.23 sec 6 huge circles
I think I've got page 14 rendering fixed, and some of pages 6, 14, 15, and 23, in a clipping fix I have in progress; I'll get that posted soon.
With the https://git.libreoffice.org/core/commit/b416c5b8e32632a63e1e791c34896e17d89f7982 I've just got in:
p.6 sec 2's clipping is better, but the image still isn't right
p.14's big block has gone
p.15's circles are gone - but the fish are still odd
p.23's big circles have gone - but it still has an odd image
Which leaves us with:
Very slow loading [Due to type 7 fill]
p.1 sec 1; big black surrounds to the 1..4 roundels
p.2 sec 5; again big black surrounds
p.3 sec 10 - possible black surround
p.6 sec 2 - image oddities
p.15 odd fish
p.23 images not quite right
which I think means there are actually 3 bugs-ish:
a) The slow loading of type 7 fills
b) The big black surrounds
c) Some image clipping problem on a few pages.