Bug 108411 - Very slow import of PDF files !!
Summary: Very slow import of PDF files !!
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): 4.1 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks: PDF-Import-Draw CPU-AT-100% Memory
 
Reported: 2017-06-08 10:24 UTC by yousifjkadom
Modified: 2023-04-27 11:37 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
BNF (8.95 MB, application/pdf)
2017-06-08 10:28 UTC, yousifjkadom
freepdf print first 220 pages of about 1100 (20%) (2.44 MB, application/pdf)
2017-09-01 19:34 UTC, paulystefan
Details
odg of 220 pages in LOO Draw 5.3.5.2 win10 64bit (1.76 MB, application/vnd.oasis.opendocument.graphics)
2017-09-01 19:35 UTC, paulystefan
save of import pdf with 532 pages in LOO Draw 5.3.5.2 (3.95 MB, application/vnd.oasis.opendocument.graphics)
2017-09-01 23:32 UTC, paulystefan
part of first 532 pages printed with freepdf (5.28 MB, application/pdf)
2017-09-01 23:34 UTC, paulystefan
Flamegraph (366.16 KB, application/x-bzip)
2023-04-26 13:39 UTC, Julien Nabet

Description yousifjkadom 2017-06-08 10:24:16 UTC
Hi. LibreOffice Draw currently imports PDF files very slowly.

Take this example: try to open the attached PDF. On Fedora 24 with LO 5.1.6 the import never finishes for me, so the document never appears and I have to force LO to quit. I also tried LO 5.3.3 on Windows 7, which is very slow to import as well.

The attached file is just an example. It seems that LO's PDF import mechanism is currently very slow, which makes it practically useless for manipulating PDF files.
Comment 1 yousifjkadom 2017-06-08 10:28:20 UTC
Created attachment 133912
BNF
Comment 2 Xisco Faulí 2017-06-09 09:17:23 UTC
Confirmed in the following build (I killed the process):

Version: 5.5.0.0.alpha0+
Build ID: 6ab249ea6aecef5d3f35d624622a368061cad9c3
CPU Threads: 4; OS Version: Linux 4.8; UI Render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

real	10m40.203s
user	10m30.988s
sys	0m3.448s
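
The real/user/sys figures above have the shape of output from the shell's `time` builtin, although the thread does not say how they were taken. As a minimal sketch for collecting comparable numbers from a script, the helper below wraps any command line. The name `time_command` and the placeholder command are assumptions made for illustration, not part of the report.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return wall-clock and child CPU seconds."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=False)  # the exit status is ignored on purpose
    real = time.perf_counter() - start
    after = os.times()
    return {
        "real": real,
        # CPU time of child processes that have been waited for
        # (these two fields stay at zero on Windows)
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
    }


if __name__ == "__main__":
    # Placeholder command: replace it with whatever launches the import under test.
    print(time_command([sys.executable, "-c", "pass"]))
```

A "user" value close to "real", as in the figures above, would point to an import that keeps roughly one CPU core busy for the whole run.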
Comment 3 Xisco Faulí 2017-06-09 09:37:45 UTC
Also confirmed in the following build (I killed the process as well):

Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)

real	18m43.877s
user	17m7.592s
sys	1m59.576s
Comment 4 paulystefan 2017-09-01 19:04:56 UTC
Tested on Windows 10 with 5.3.5.2 64-bit.

The file has 1069 pages in PDF 1.6 (Acrobat 7), with more than 150,000 words, more than 1 million characters, and several fonts.

The import needs 3.5 GB of my 8 GB of RAM after two passes of the green progress bar, and then nothing more happens.

The PDF import function needs too much memory and runs into resource limits.

Acrobat Reader needs only 75 MB of RAM according to Task Manager.

Copying everything from Acrobat Reader and pasting it into LO Writer needs only 350 MB of RAM.

The LO Draw PDF importer needs 10 times as much RAM or more.
Comment 5 paulystefan 2017-09-01 19:34:22 UTC
Created attachment 135942
freepdf print first 220 pages of about 1100 (20%)

Importing this file takes about 5 minutes.
Comment 6 paulystefan 2017-09-01 19:35:52 UTC
Created attachment 135943
odg of 220 pages in LOO Draw 5.3.5.2 win10 64bit
Comment 7 paulystefan 2017-09-01 19:44:37 UTC
Too many pages are imported in one run.

The import function should work in parts, for example 128 pages, then the next 128 pages, and so on.

At the end, concatenate all the parts.
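
The batching idea in this comment can be sketched as follows. This is only an illustration of the proposal, not how the LibreOffice importer is structured, and the thread does not establish that batching would actually lower memory use. The names `page_batches` and `import_in_batches`, and the `import_range` callback, are hypothetical.

```python
def page_batches(total_pages, batch_size=128):
    """Yield inclusive 1-based (first, last) page ranges covering the document."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def import_in_batches(total_pages, import_range, batch_size=128):
    """Import one batch at a time and concatenate the per-batch results.

    import_range is a hypothetical callback: (first, last) -> list of pages.
    """
    pages = []
    for first, last in page_batches(total_pages, batch_size):
        pages.extend(import_range(first, last))
    return pages
```

For the 1069-page document from comment 4, `page_batches(1069)` gives nine batches, the last one covering pages 1025 to 1069.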
Comment 8 paulystefan 2017-09-01 20:08:07 UTC
Importing 220 pages takes about 5 minutes and 700 MB of RAM.
Comment 9 paulystefan 2017-09-01 20:16:24 UTC
The user needs a warning message for PDFs with many pages.
Comment 10 paulystefan 2017-09-01 23:32:34 UTC
Created attachment 135956
save of import pdf with 532 pages in LOO Draw 5.3.5.2

About 2 GB of RAM and about 2 hours of runtime were needed for the import and save.
Comment 11 paulystefan 2017-09-01 23:34:43 UTC
Created attachment 135957
part of first 532 pages printed with freepdf
Comment 12 yousifjkadom 2017-09-02 12:38:39 UTC
Hi.

Could you please test it on version 5.4.1? I am unable to, because it is not available in my Fedora 26 repositories.

I noticed that the PDF import mechanism in LibreOffice changed radically in version 5.4. The importer is now a different tool from the one used in version 5.3.x.

So it is better to test this issue on version 5.4.1 to see the real situation.
Comment 13 Xavier Van Wijmeersch 2017-09-03 12:35:20 UTC
Confirmed with

Version: 6.0.0.0.alpha0+
Build ID: 9c165fe3084b7c054f9f04f3b065897abcbe2162
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-09-02_23:12:31
Locale: nl-BE (en_US.UTF-8); Calc: group

and with this morning's 5.4.2 update.

It is very slow and eats a lot of RAM, more than 3.5 GB.
I have to kill the process that is loading the document.
With Okular, a PDF reader, it takes 2 seconds and 100 MB of RAM.
With GIMP it takes about 15 seconds and 1.2 GB of RAM.

System OS: Linux Slackware64 current
Comment 14 paulystefan 2017-09-20 20:20:54 UTC
Tested with 5.4.2.1 64-bit on Windows 10 64-bit.

An Intel Ivy Bridge machine needs more than 1 GB for the 20% file;
an AMD Phenom II with 5.3.5 needed only about 700 MB.

Task Manager shows only 6 to 7 percent CPU use in the last phase of the import (25% is the maximum for one process).

So a speedup by a factor of 3 to 4 is possible from better CPU use alone.
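
The factor quoted here follows from dividing the single-process ceiling by the observed CPU share. A minimal sketch of that arithmetic, assuming the import is limited only by the CPU use of one process and that memory and disk access play no part; the function name `single_process_headroom` is made up for illustration.

```python
def single_process_headroom(observed_percent, ceiling_percent=25.0):
    """Return how many times the observed CPU share fits into the ceiling."""
    if observed_percent <= 0:
        raise ValueError("observed_percent must be positive")
    return ceiling_percent / observed_percent


for observed in (6.0, 7.0):
    factor = single_process_headroom(observed)
    print(f"{observed:.0f}% of 25% -> factor {factor:.1f}")
```

This prints factors of 4.2 and 3.6, which matches the "3 to 4" estimate in the comment.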
Comment 15 paulystefan 2017-11-19 14:29:59 UTC
The user needs a warning for PDFs with many pages (more than 100).

For the current LO import, PDFs with many pages need to be split into parts with other tools such as FreePDF.
Comment 16 paulystefan 2018-08-25 13:48:56 UTC
Same behaviour in 6.1.0.3 x64 on Windows 10 64-bit.
Comment 17 QA Administrators 2019-09-02 09:29:30 UTC Comment hidden (obsolete)
Comment 18 yousifjkadom 2019-09-02 19:21:42 UTC
Hi. This bug still exists without any improvement, which makes LibreOffice useless for managing PDF files.

It needs a radical change to the PDF engine (and/or to the way PDF files are handled).
Comment 19 paulystefan 2020-04-07 20:23:49 UTC
A possible workaround, for the programmers:

Eating the elephant in several steps is better than in one step.

A solution is a master document (ODM) with parts of 10 or 20 pages each, with a preview window for the old way and a new way with options.

That way the work is cut into many documents of a few pages each inside the ODM.

This is also a solution for different paper sizes and orientations.
Comment 20 yousifjkadom 2020-04-08 07:23:56 UTC
Hi. As I see it, the best way to fix this bug is to radically change how LO handles PDF files by turning it into a native PDF editor. Currently LO edits a PDF indirectly: it converts the PDF into an intermediate format and then converts that intermediate format back to PDF. The best way is to make LO a real PDF editor that deals with PDF natively, even if that requires creating a new sub-program specifically for PDF files. In the latter case, if you accept it, LibreOffice Draw would deal with vector graphics (but not with PDF) and the new sub-program (let us call it "Libre PDF") would deal with PDF only. I recommend this because this bug has gone unfixed for a very long time, and it really makes LO unsuitable for editing PDF books, because most of them have more than 100 pages.
Comment 21 yousifjkadom 2020-04-08 07:30:25 UTC
By the way, I forgot to say that there is a good cross-platform program called "Okular". Could the LO team fork it and develop it further into the proposed new sub-program?

If the LO team adds its existing editing capabilities to Okular, we will really have a good free and open source PDF editor.
Comment 22 Roman Kuznetsov 2023-04-26 11:58:40 UTC
Created a separate bug 155030 because LO 7.6 now can't open the example PDF at all.
Comment 23 Julien Nabet 2023-04-26 13:39:40 UTC
Created attachment 186938
Flamegraph

On a Debian x86-64 PC with master sources updated today, I captured a Flamegraph. (I had to kill LO because the import never finished.)
Comment 24 Julien Nabet 2023-04-26 13:47:28 UTC
Miklos: Reading a bit of https://conference.libreoffice.org/assets/Conference/Tirana/PDFium-updated.pdf (which is from 2018), I understand that pdfium is a faster library for rendering PDF and has been used in LO with the aim of replacing the poppler library in the long run.
However, it seems poppler is still used today.

Any thoughts here? Could the Flamegraph trace help to find hot spots in the LO code?
Comment 25 Miklos Vajna 2023-04-26 14:47:44 UTC
pdfium is great to view a PDF. But it only gives us a bitmap. The (default) poppler-based PDF import rather focuses on giving you an editable Draw document.

I would love to see poppler go away (with its horrible external process), but I guess these will stay with us for a long time. Each is good for its own use-case. Does that help?
Comment 26 Julien Nabet 2023-04-26 14:51:47 UTC
(In reply to Miklos Vajna from comment #25)
> pdfium is great to view a PDF. But it only gives us a bitmap. The (default)
> poppler-based PDF import rather focuses on giving you an editable Draw
> document.
> 
I tried "insert image" with the pdf, I got the first page of the PDF quickly, perhaps it's a case where pdfium is used?

> I would love to see poppler go away (with its horrible external process),
> but I guess these will stay with us for a long time. Each is good for its
> own use-case. Does that help?
Do you think the slowness is in Poppler or in LO? (I don't know if the Flamegraph trace allows to tell)
Comment 27 Miklos Vajna 2023-04-27 06:56:05 UTC
(In reply to Julien Nabet from comment #26)
> I tried "insert image" with the pdf, I got the first page of the PDF
> quickly, perhaps it's a case where pdfium is used?

Yes.

> Do you think the slowness is in Poppler or in LO? (I don't know if the
> Flamegraph trace allows to tell)

No idea about this, sorry. :-)
Comment 28 Roman Kuznetsov 2023-04-27 11:25:20 UTC
Opening the PDF file still takes a very long time, uses over 4.5 GB of memory, and keeps the CPU at 100% in:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 5cd9de202765e243e41416802f3e4486b8a96f16
CPU threads: 16; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL threaded
Comment 29 Julien Nabet 2023-04-27 11:37:19 UTC
(In reply to Miklos Vajna from comment #27)
> (In reply to Julien Nabet from comment #26)
> > I tried "insert image" with the pdf, I got the first page of the PDF
> > quickly, perhaps it's a case where pdfium is used?
> 
> Yes.
pdfium seems very fast then! (Even if it was just for the first page.)
> 
> > Do you think the slowness is in Poppler or in LO? (I don't know if the
> > Flamegraph trace allows to tell)
> 
> No idea about this, sorry. :-)

It seems we need to use pdfium more (instead of Poppler), which requires someone with some expertise and some time. I've got a bit of the second but none of the first => uncc myself.