Created attachment 189995: PDF for testing purposes. Command used: --convert-to “docx:MS Word 2007 XML” test.pdf --infilter=“writer_pdf_import” --headless. This command works reasonably well in version 6.4 but not in 7.6, where the resulting DOCX file collapses to a single page. Version 6.4, on the other hand, seems to ignore tables.
libreoffice --convert-to 'docx:MS Word 2007 XML' test.pdf --infilter='writer_pdf_import' --headless (the same command, with the quotes changed)
Repro using Version: 7.6.2.1 (X86_64) / LibreOffice Community Build ID: 56f7684011345957bbf33a7ee678afaf4d2ba333 CPU threads: 12; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win Locale: ru-RU (ru_RU); UI: en-US Calc: CL threaded
and also using current master, with e.g. this command:
> soffice --convert-to docx test.pdf --infilter=writer_pdf_import
It is not specific to DOCX; the result is the same when "docx" is replaced with e.g. "doc". Interestingly, the single-page result is visible *in MS Word*, but not in Writer. Version 7.4.0.3 generated a file that opened normally in Word. The problem was already present in version 7.5.0.3.
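For anyone reproducing this, converting the same PDF to both DOCX and DOC can be scripted so the two outputs can be compared in MS Word. This is a minimal sketch, assuming a POSIX shell: `convert_pdf`, `SOFFICE` and `OUTDIR` are hypothetical names chosen for illustration, while `--headless`, `--infilter`, `--convert-to` and `--outdir` are regular soffice command-line options.

```shell
# Hypothetical variables (not LibreOffice settings): which binary to run and
# where to put the converted files. Both can be overridden from the environment.
SOFFICE="${SOFFICE:-soffice}"
OUTDIR="${OUTDIR:-converted}"

# convert_pdf INPUT FORMAT
# Load INPUT through Writer's PDF import filter and export it as FORMAT
# (e.g. docx or doc) into $OUTDIR.
convert_pdf() {
    "$SOFFICE" --headless --infilter=writer_pdf_import \
        --convert-to "$2" --outdir "$OUTDIR" "$1"
}

# Example (not run here):
#   convert_pdf test.pdf docx && convert_pdf test.pdf doc
```

Pointing `SOFFICE` at a specific installation (for example a 7.4 and a 7.6 build, with a different `OUTDIR` for each) should make it easier to compare versions side by side.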
There also seems to be a bug with the xlsx format:
libreoffice7.6 --headless --convert-to xlsx:"Calc MS Excel 2007 XML" test.pdf
convert /home/tmp/test.pdf as a Draw document -> /home/tmp/test.xlsx using filter : Calc MS Excel 2007 XML
Unspecified Application Error
If I use the filter specified in the documentation instead:
libreoffice7.6 --headless --infilter="calc_pdf_addstream_import" --convert-to xlsx:"Calc MS Excel 2007 XML" test.pdf
Error: source file could not be loaded
(In reply to ruslanik55 from comment #3) Please file another bug for that; it is completely unrelated to the issue that you filed here. The command line without infilter fails correctly, because without it the PDF is loaded into Draw, and Draw can't save to a spreadsheet format. The same happens if you open it interactively. The second failure might mean that the PDF has nothing that could be interpreted as spreadsheet content, which might be OK, depending on the specific PDF.
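The point above, that the import filter decides which module loads the PDF and therefore which export formats are reachable, can be illustrated with a small lookup. This is only a sketch in POSIX shell: `pick_infilter` is a hypothetical helper, and the mapping covers just the filter names that appear in this report.

```shell
# Hypothetical helper: print the import filter to pass to --infilter for a
# given target format. Prints nothing when no filter is known; without
# --infilter the PDF is loaded into Draw.
pick_infilter() {
    case "$1" in
        docx|doc) echo writer_pdf_import ;;          # PDF loaded as a Writer document
        xlsx)     echo calc_pdf_addstream_import ;;  # filter tried in comment #3; it could not load this PDF
        *)        ;;                                 # no filter: Draw handles the PDF
    esac
}
```

It could be used as `soffice --headless --infilter="$(pick_infilter docx)" --convert-to docx test.pdf`. As comment #3 shows, naming a Calc filter does not guarantee that a given PDF can actually be loaded as a spreadsheet.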
The resulting DOCX is impossibly slow to open in LO and hangs for me. But even opening the original PDF with Writer's PDF filter results in LO hanging (document displayed but impossible to work on it). Tested recent trunk build and 6.0.0.3.
In any case, even with the long loading times in both LO and MSO, I can see the collapsed contents in MSO, which I bibisected with linux-64-7.4 to first bad build commit [b77a5408177cf0db37ca5aa3d9cf106c0157ab9b] which points to core commit 588e59cc36475ded243ce4fd9062473cddd2c016 which is a cherrypick of:
commit fc2fb95fdb4262792e94afe61b784c8ae71d171e
author Kevin Suo Sun Oct 23 19:10:29 2022 +0800
committer Kevin Suo Sun Oct 23 20:10:18 2022 +0800
sdext.pdfimport Writer: Do not visit DrawElement twice in WriterXmlEmitter
https://gerrit.libreoffice.org/c/core/+/142313
Kevin, can you please have a look?
(In reply to Stéphane Guillou (stragu) from comment #5) Stéphane: There may be more than one issue here. Would you please clarify: does the source commit fc2fb95fdb4262792e94afe61b784c8ae71d171e you identified cause the .docx file content to be on one page (when you open it in MSO or some other office software), or does it cause the slow loading time when you open the docx in LibreOffice Writer? Also, would you please attach your bibisect log? I tried with a bibisect version of: 2021-11-25 20:07:24 source-hash-bd0fb2d95 bump product version to 7.4.0.0.alpha0+ It does not cause the content to be on one page (when opened with MSO), but when opening the generated docx in LibreOffice Writer, loading is already slow and it is not possible to work on that docx in Writer.
I am reverting that commit: https://gerrit.libreoffice.org/c/core/+/159811
(In reply to Kevin Suo from comment #6)
> (In reply to Stéphane Guillou (stragu) from comment #5)
> Would you please clarify:
> Is the source commit fc2fb95fdb4262792e94afe61b784c8ae71d171e you identified
> causes the .docx file content to be on one page (when you open it in MSO or
> some other office software), or does it cause the slow loading time when you
> open the docx in LibreOffice Writer?

I tested opening the resulting DOCX with online MS Office 365. While neither ever finishes loading, these are the differences:
* Before commit: a few seconds to show canvas, objects like images actually rendered, more than one page (at least 5). 794 kB.
* Since commit: more than 30 seconds to show canvas; all elements overlapped on one single page, loads forever. Slightly bigger file (801 kB).

Definitely already problematic before your commit, but I understood we were focusing on the 1-page issue here.

On the other hand, converting to DOC makes the regression more obvious as there is no loading problem:
* Before commit: correct number of pages (20)
* After commit: all content on single page
Kevin Suo committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/5589659829f8a1cef8ca1c8a468732105bbe231b tdf#157589 tdf#153969: Revert "sdext.pdfimport Writer: Do not visit... It will be available in 24.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Kevin Suo committed a patch related to this issue. It has been pushed to "libreoffice-7-6": https://git.libreoffice.org/core/commit/f52d8f004f7d70f89ee805c6f71f1791cac70c0f tdf#157589 tdf#153969: Revert "sdext.pdfimport Writer: Do not visit... It will be available in 7.6.4. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Verified opening on MS office.com a doc exported with: Version: 24.2.0.0.alpha1+ (X86_64) / LibreOffice Community Build ID: 619500d6919c227e734b119481a4b334972e0b7b CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3 Locale: en-AU (en_AU.UTF-8); UI: en-US Calc: threaded Thank you!