88914 – PDF Import deadlock for an advertising presentation PDF with complex fill patterns

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	filters and storage
Version: (earliest affected)	4.2.7.2 release
Hardware:	Other All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:	filter:pdf, perf

Depends on:
Blocks:	PDF-Import-Draw

Reported:	2015-01-29 21:12 UTC by Philip
Modified:	2024-07-18 23:04 UTC
CC List:	5 users

See Also:
Crash report or crash signature:

Attachments
sample_document (1.37 MB, application/pdf) 2015-01-29 21:12 UTC, Philip
MS Stacktrace of mini-dump prior to abort (8.62 KB, text/plain) 2015-01-30 05:45 UTC, V Stuart Foote
pg14, pg22 extracted from problem PDF and then opened AND inserted to Draw ODG (1.69 MB, application/x-zip-compressed) 2024-07-02 10:51 UTC, V Stuart Foote

Description Philip 2015-01-29 21:12:11 UTC

Created attachment 112930
sample_document

Hi,

Calling 

libreoffice --headless --convert-to odt msa_bug.pdf

results in a deadlock. CPU usage jumps to 100% and the software no longer responds.

Best Regards
Philip
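
For anyone retesting: a minimal sketch of running the conversion above under a time limit, so that a hang can be told apart from a merely slow import. The helper names and the choice of time limit are assumptions made for illustration; only the command line itself comes from this report.

```python
import subprocess


def build_convert_command(pdf_path, target="odt", binary="libreoffice"):
    """Build the headless conversion command quoted in the report."""
    return [binary, "--headless", "--convert-to", target, pdf_path]


def run_with_timeout(command, timeout_s):
    """Run command; return (finished, returncode).

    finished is False when the time limit was hit, in which case
    subprocess.run has already killed the child process.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, None
    return True, completed.returncode


# Example use (needs LibreOffice on PATH and the attached PDF):
# finished, code = run_with_timeout(build_convert_command("msa_bug.pdf"), 600)
```

A result of (False, None) only shows that the limit was exceeded; it does not by itself prove a true deadlock.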

Comment 1 V Stuart Foote 2015-01-30 05:45:32 UTC

Created attachment 112938
MS Stacktrace of mini-dump prior to abort

Rather than headless, attempt to import the PDF into Draw

Windows 7 sp1, 64-bit en-US
Version: 4.4.0.3
Build ID: de093506bcdc5fafd9023ee680b8c60e3e0645d7
Locale: en_US

i7 920 CPU holds at ~13%, consuming 186,956K RAM, ~430 file handles, ~14 threads, ~87 user objects and ~161 GDI objects. I/O read is 104,518,857 bytes; I/O write grows to ~61,570,000 bytes in 45 minutes. Captured a mini-dump and aborted.

svtlo!GraphicManager::ImplCheckSizeOfSwappedInGraphics+b0 [c:\cygwin64\home\buildslave\source\libo-core\svtools\source\graphic\grfmgr2.cxx @ 223]

Attaching the Stacktrace.

Comment 2 V Stuart Foote 2015-01-30 05:51:11 UTC Comment hidden (obsolete)

@Philip, please note the release of LibreOffice you were working with if earlier than current 4.4.0.3 release.

Comment 3 Philip 2015-01-30 09:12:20 UTC

Hi Stuart,

I've seen the issue on the following versions:

LibreOffice 4.3.5.2 430m0(Build:2)
LibreOffice 4.2.7.2 420m0(Build:2)

Best Regards
Philip

Comment 4 vvort 2015-02-01 06:08:34 UTC

There are too many small images on page #14.
I have not investigated it in detail yet.

Comment 5 QA Administrators 2016-02-21 08:37:44 UTC Comment hidden (obsolete)

** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present on a currently supported version of LibreOffice
(5.0.5 or 5.1.0) https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the version of LibreOffice and
your operating system, and any changes you see in the bug behavior

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave
a short comment that includes your version of LibreOffice and Operating System

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not
appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3)

http://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to "inherited from OOo";
4b. If the bug was not present in 3.3 - add "regression" to keyword

Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for your help!

-- The LibreOffice QA Team
This NEW Message was generated on: 2016-02-21

Comment 7 Timur 2020-05-20 12:32:23 UTC Comment hidden (obsolete)

It's very slow but it opens in the end in 7.0+. Bug is valid for perf.

Comment 8 Timur 2022-02-28 15:40:30 UTC

Repro 7.4+. Very slow to open. 2:54 for me.

Comment 10 Dave Gilbert 2024-07-01 00:35:09 UTC

A little slow for me on a modern machine with 24.2.4.2-2, *as long as I have the Navigator closed*; with it open it's much, much slower.

The command-line convert is now not awful:

dg@dalek:~/bugs/libreoffice-88914-pdfhang$ time libreoffice --headless --convert-to odg msa_bug.pdf 
convert /home/dg/bugs/libreoffice-88914-pdfhang/msa_bug.pdf as a Draw document -> /home/dg/bugs/libreoffice-88914-pdfhang/msa_bug.odg using filter : draw8

real	0m55.382s
user	0m53.156s
sys	0m2.213s

As well as page 14 mentioned in comment 4, page 22 also has a load of small images.
(Although, curiously, a command-line image extract doesn't show them, so they must be getting created by a fill or something, but it doesn't look like a tiling fill.)

Comment 11 V Stuart Foote 2024-07-01 12:11:19 UTC

Also, no issues with filter opening to Draw with a 24.2.4.2 build on Win10.

=> WFM


Version: 24.2.4.2 (X86_64) / LibreOffice Community
Build ID: 51a6219feb6075d9a4c46691dcfe0cd9c4fff3c2
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

Comment 12 Dave Gilbert 2024-07-02 00:08:55 UTC

Whatever encoded this PDF is just depressing; while some of the problems we have come from the use of hyper-clever fill patterns in PDFs, this one is just silly.
Page 14 has 3025 copies of each of 6 different stipple patterns, all individually embedded in the PDF rather than referencing a single instance or using a tiled fill.
Page 22 has 1715 copies of each of the stipple patterns; although depressingly they seem to be used to make a stippled white-on-white background, so are triply pointless.
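
To put rough numbers on that, here is a small worked calculation. It assumes page 22 uses the same 6 patterns as page 14, which the wording above suggests but does not state outright.

```python
# Counts quoted in the comment above.
PATTERN_COUNT = 6                          # distinct stipple patterns
COPIES_PER_PATTERN = {14: 3025, 22: 1715}  # page number -> copies of each pattern


def embedded_images(copies_per_pattern, pattern_count):
    """Number of separately embedded images per page."""
    return {page: copies * pattern_count
            for page, copies in copies_per_pattern.items()}


per_page = embedded_images(COPIES_PER_PATTERN, PATTERN_COUNT)
total = sum(per_page.values())

print(per_page)  # {14: 18150, 22: 10290}
print(total)     # 28440
```

Under that assumption the two pages carry 28440 embedded images, where 6 shared instances would have been enough.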

Comment 13 V Stuart Foote 2024-07-02 10:51:50 UTC

Created attachment 195082
pg14, pg22 extracted from problem PDF and then opened AND inserted to Draw ODG

The PDF was generated 2012-11-06 with the Ghostscript-based "PDF Creator 1.2.3".

When pages 14 and 22 are extracted with PDFtk, the individual PDF pages are both slow to "Open" into the Draw canvas, but they do load with the poppler/cairo-based filter.

Additionally, if the individual PDF pages are "Inserted" into a document page, and so use the pdfium-based filter path, they open reasonably fast with good fidelity to the original layout.

As they are inserted as bitmaps, the resolution of the filter action can be adjusted by setting the environment variable 'PDFIMPORT_RESOLUTION_DPI'; otherwise it gets a default appropriate to the display device (so ~96-120 dpi for non-HiDPI).

And of course, performing a "break" of an inserted image will have performance and fidelity issues.
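
A minimal sketch of starting Draw with that variable set, for anyone who wants to try a different resolution. The helper names and the "soffice" binary name are assumptions for illustration; the variable name is the one given above, and the PDF page still has to be inserted by hand once Draw is running.

```python
import os
import subprocess


def env_with_pdf_dpi(dpi, base_env=None):
    """Return a copy of the environment with PDFIMPORT_RESOLUTION_DPI set."""
    if int(dpi) <= 0:
        raise ValueError("dpi must be a positive integer")
    env = dict(os.environ if base_env is None else base_env)
    env["PDFIMPORT_RESOLUTION_DPI"] = str(int(dpi))
    return env


def launch_draw(dpi, binary="soffice"):
    """Start Draw with the chosen DPI (binary name is an assumption)."""
    return subprocess.Popen([binary, "--draw"], env=env_with_pdf_dpi(dpi))


# Example use: launch_draw(300), then insert the PDF page from within Draw.
```

The variable is set only for the launched process, so the surrounding shell environment is left untouched.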

Comment 14 V Stuart Foote 2024-07-02 10:59:35 UTC

@Miklos, Tomaž -- anything further to be said or done about the pdfio filter handling for this and similar PDFs? The pdfium-based filter does a good job with it, while poppler/cairo chokes just a bit.

Any movement on convenience bug 114234 to not have to split out PDF pages?

Comment 15 Dave Gilbert 2024-07-18 17:29:36 UTC

I think it might be possible to combine the duplicated images during the import; on the poppler import path it looks fairly easy to me (everything goes through tree/imagecontainer.cxx, which is a std::vector; I'm thinking of trying to turn it into a hash of some type).
However, it does mean we have to figure out how to represent that shared image.
(I'm about to post a question to the list about that).

But there is a 2nd problem; if the Navigator is open the current code will still apparently hang - the Navigator really doesn't handle huge flat documents well.
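
The deduplication idea in the first paragraph of this comment can be sketched in a few lines. This is purely illustrative Python, not the LibreOffice code (the real container is C++), and the class and method names are hypothetical: each image is keyed by a hash of its bytes, so repeated payloads are stored once and callers get back a shared id.

```python
import hashlib


class DedupImageContainer:
    """Stores each distinct image payload once (hypothetical sketch)."""

    def __init__(self):
        self._images = []        # unique payloads, indexed by image id
        self._id_by_digest = {}  # SHA-256 digest -> image id

    def add_image(self, data: bytes) -> int:
        """Return the id for data, reusing the id of an identical payload."""
        digest = hashlib.sha256(data).digest()
        image_id = self._id_by_digest.get(digest)
        if image_id is None:
            image_id = len(self._images)
            self._images.append(data)
            self._id_by_digest[digest] = image_id
        return image_id

    def get(self, image_id: int) -> bytes:
        return self._images[image_id]

    def __len__(self) -> int:
        return len(self._images)


# Example: 3025 copies of each of 6 patterns collapse to 6 stored images.
container = DedupImageContainer()
patterns = [bytes([i]) * 64 for i in range(6)]
ids = [container.add_image(p) for _ in range(3025) for p in patterns]
print(len(ids), len(container))  # 18150 6
```

The sketch leaves open the question raised above, namely how the shared image would be represented in the output document.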

Comment 16 Dave Gilbert 2024-07-18 23:04:50 UTC

(In reply to Dave Gilbert from comment #15)
> I think it might be possible to combine the duplicated images during the
> import; on the poppler import path it looks fairly easy to me (everything
> goes through tree/imagecontainer.cxx which is a std::vector - I'm thinking of
> trying to turn it into a Hash of some type.
> However, it does mean we have to figure out how to represent that shared
> image.
> (I'm about to post a question to the list about that).

Oh, Regina explained that's actually only in the flat format - there's already dedupe going on in the un-flat versions.

> But there is a 2nd problem; if the Navigator is open the current code will
> still apparently hang - the Navigator really doesn't handle huge flat
> documents well.

Actually, for this one, navigator is kind of surviving OK.
So yeh, this one seems OK now.