Description:
Hello,

We have an application where we store Microsoft Office documents. As part of release management, we convert the Office documents to PDF with watermarks using LibreOffice.

Scenario: LibreOffice takes a long time to convert documents with many pages (for example, around 600 to 1000 pages) to PDF. I have tested both the CLI and the GUI; the result is the same.

Technical stack: Red Hat Enterprise Linux 7.9 (3.10.0-1160.83.1.el7.x86_64)

Versions tested (the problem occurs in all of them):
LibreOffice 7.1.5.2
LibreOffice 7.4.7
LibreOffice 7.5.2

CLI command:
/opt/libreoffice75/program/soffice --headless --convert-to "pdf:writer_pdf_Export" --outdir /tmp timeout.docx

Size of docx: 5 MB

I would appreciate some pointers to solve this problem.

Steps to Reproduce:
CLI command:
/opt/libreoffice75/program/soffice --headless --convert-to "pdf:writer_pdf_Export" --outdir /tmp timeout.docx

Actual Results:
Generating the PDF takes around 8 to 15 minutes.

Expected Results:
The PDF is generated faster.

Reproducible: Always

User Profile Reset: Yes

Additional Info:
Sample document attached.
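To compare versions with concrete numbers, the conversion command from the report can be wrapped in a small timing harness. This is only a minimal sketch and not part of the original report: the helper names `build_convert_command` and `time_command` are hypothetical, and the soffice path, input file, and output directory are whatever you pass in.

```python
import subprocess
import time


def build_convert_command(soffice, input_file, outdir):
    """Build the headless PDF conversion command from the report as an argv list."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf:writer_pdf_Export",
        "--outdir", outdir,
        input_file,
    ]


def time_command(argv):
    """Run argv and return (exit_code, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.monotonic() - start
```

For example, `time_command(build_convert_command("/opt/libreoffice75/program/soffice", "timeout.docx", "/tmp"))` would run the conversion from the report and return its exit code together with the wall-clock time in seconds. Passing the arguments as a list avoids shell quoting problems around the filter name.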
Created attachment 187146 [details] sample document with 662 pages
I tested opening the file with Word, and it also takes a long time to produce the PDF; I didn't wait for it to finish. There are 663 tables and a nine-page index. After opening, it takes a long time until the file is fully formatted with the right number of pages. There also seems to be a lot of direct formatting.
With LibreOffice 4.4.7.2
File opening -> 120 seconds
Saving PDF -> 90 seconds

With
Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: c4a58634753a84b09f20f7271d6525a6656522d3
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: nl-NL (nl_NL); UI: en-US
Calc: CL threaded
File opening -> 300 seconds until something appears on screen, but still processing in the background; 720 seconds and still not finished
Save to PDF -> unable to measure, because the background process keeps going

Lots of time is spent in SwFieldType::GetXObject (called by Python code)

PyType_Ready
PyEval_EvalFrameDefault
PyObject_Call
PyFunction_Vectorcall
PyCell_Set
PyMethod_Self
PyObject_CallMethodId_SizeT
PyObject_CallFunctionObjArgs
PyType_Ready
PyType_Ready
PyEval_EvalFrameDefault
PyObject_Call
PyFunction_Vectorcall
PyCell_Set
PyMethod_Self
PyVectorcall_Call
PyInit_pyuno
[00007FFF8F4D281C]
[00007FFF8F4D2CB7]
[00007FFF4C8B3E9A]
[00007FFF4C8B4311]
uno_ext_getMapping
uno_ext_getMapping
uno_ext_getMapping
linguistic_DicList_get_implementation
osl_getTempDirURL
Disk I/O shows a decent loading speed (in my case 1 MB/s), with the document on screen after 120 seconds, with
Version: 6.1.6.3
Build ID: 5896ab1714085361c45cf540f76f60673dd96a72
CPU threads: 4; OS: Windows 6.3; UI render: default;
Locale: nl-NL (nl_NL); Calc: CL

For 7.6.0.0 it's 100 kb/s.

----

There is, however, a lot of (or endless) background processing by grammar checking after a regular file open (though this is probably not relevant for command-line export).

Everything is fine, both the loading speed until the document is on screen, based on disk I/O (120 seconds), and grammar checking (120 seconds), with
Version: 5.2.5.0.0+
Build ID: a4d4fbeb623013f6377b30711ceedb38ea4b49f8
CPU Threads: 4; OS Version: Windows 6.2; UI Render: GL;
TinderBox: Win-x86@62-merge-TDF, Branch:libreoffice-5-2, Time: 2016-12-24_14:43:55
Locale: nl-NL (nl_NL); Calc: CL

So there are actually two perf issues, if you ask me.
Created attachment 187150 [details]
Bibisect log

Bibisected based on loading I/O speed to:

author Michael Stahl <Michael.Stahl@cib.de> 2019-09-06 19:36:48 +0200
committer Michael Stahl <Michael.Stahl@cib.de> 2019-09-17 10:45:40 +0200
commit 5ba30f588d6e41a13d68b1461345fca7a7ca61ac
tree 6f098ffd0fb2c75a2c1cbda4e7b82bd65fb8e7dd
parent 6e1cb2e9dd406fb2883460cefaa4660622996005

tdf#64222 sw: better DOCX import/export of paragraph marker formatting

The problem here is that Word allows formatting the paragraph end marker, and applies the same formatting to the generated numbering string; Writer has no such marker thing.

This is currently represented by an empty AUTOFMT hint at the end of the paragraph, which is created almost by accident in SwXText::finishParagraph(), because the paragraph properties are set on a SwPaM that doesn't select the whole paragraph but sits at the end.

This is a bit fragile and the hint may have unfortunate accidents such as being merged into a preceding AUTOFMT hint if it happens to have the same items in it.

It ought to work better to have an item in SwTextNode's SwAttrSet to store these special items; has the advantage that the items will also be copied when you split the paragraph, like in Word.

Add a RES_PARATR_LIST_AUTOFMT and UNO property "ListAutoFormat" (which should be considered a first draft...) and use it in preference (where possible) or in addition to (where necessary due to other missing pieces) the empty hint.

Also revert the change in checkApplyParagraphMarkFormatToNumbering() to consider hints that start before the end of the paragraph, as it has unintended side effects as pointed out by Mike Kaganski.
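The bibisect above was judged by loading I/O speed. A wall-clock threshold could in principle be automated instead with `git bisect run`, which reads exit code 0 as good, 1 to 127 (except 125) as bad, and 125 as "skip this commit". The following is only a sketch under assumptions: the helper name `bisect_verdict`, the 200-second threshold, and the idea of classifying builds by elapsed time are illustrative and are not what was used for the attached log.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_verdict(returncode, elapsed, threshold=200.0):
    """Map one timed run to a `git bisect run` exit code.

    returncode: exit code of the timed soffice run
    elapsed:    wall-clock seconds the run took
    threshold:  hypothetical cut-off between a fast (good) and slow (bad) build
    """
    if returncode != 0:
        # The run itself failed, so this build cannot be judged.
        return SKIP
    return GOOD if elapsed <= threshold else BAD
```

A wrapper script would time one conversion with the build under test, then call `sys.exit(bisect_verdict(returncode, elapsed))`. The threshold needs to sit well between the known fast and slow timings on the same machine, otherwise the bisect can converge on the wrong commit.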
@Noel, you might be interested in this one, assuming the design of the commit "better DOCX import/export of paragraph marker formatting" is fine by itself and simply requires some optimizations to perform better.
On my machine, using current master, this is already below 45 seconds, so I think we can consider this fixed (probably by various other patches I have done to Writer).
About 50 seconds for me with
Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 99a88c9e55872214ce01d89447d18708e47e956b
CPU threads: 16; OS: Windows 10.0 Build 22621; UI render: default; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded

That is with the accessibility option disabled. With it enabled I was having issues; I'll retest and, if needed, report that in a new bug.
Hello,

Is it possible to push the fix to the 7.4 or 7.5 versions?

Thanks,
Naresh
(In reply to Naresh from comment #9)
> Is it possible to push the fix to 7.4 or 7.5 versions ?

If you want that kind of service, I suggest you contract with a company like Collabora Productivity to do it for you.