Bug 165347 - conversion from pdf to docx takes too much and there's crashing
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 25.2.0.3 release
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx, filter:pdf, haveBacktrace, perf
Depends on:
Blocks:
 
Reported: 2025-02-20 08:01 UTC by moemensaadeh936
Modified: 2025-11-05 12:09 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
this is the file that i was trying to convert it to docx (20.03 MB, application/pdf)
2025-02-20 08:10 UTC, moemensaadeh936
Perf flamegraph (1.11 MB, image/svg+xml)
2025-11-05 09:45 UTC, Buovjaga

Description moemensaadeh936 2025-02-20 08:01:46 UTC
Description:
I converted a 1.5 MB PDF file to DOCX from the command line and it worked without any problem: it took around 3 s and I was satisfied with the result. I then wanted to test the application with a larger file of about 20 MB. I waited for maybe 10 minutes and the conversion still had not finished, and the terminal crashed when I tried to terminate the process.

Actual Results:
I waited for about 10 minutes, then closed the terminal because nothing had happened, and the application crashed.

Expected Results:
Converting a large PDF to DOCX finishes within a reasonable time.


Reproducible: Always


User Profile Reset: No

Additional Info:
I have Windows 11 Pro, 16 GB RAM, 13th Gen Intel(R) Core(TM) i7-13620H 2.40 GHz.
I don't know whether this is related to memory or why the file was not converted within a reasonable time. I tried to convert this dummy file for testing:

https://examplefile.com/document/pdf/20-mb-pdf#google_vignette

I don't know whether I can change anything under Options->Advanced->...
or what the reason for the crash is.
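
Because the conversion can run for many minutes, one way to avoid terminating it by hand is to start it with a timeout. The following is a minimal Python sketch, assuming the same soffice options as the command given in comment 1; the function names and the 600-second default are made up for illustration. It only stops the process it started itself; whether any helper processes exit as well has not been verified here.

```python
import subprocess


def build_convert_command(soffice, pdf_path, out_dir):
    """Build the soffice invocation used in comment 1 as an argument list."""
    return [
        str(soffice),
        "--headless",
        "--convert-to", "docx",
        "--infilter=writer_pdf_import",
        str(pdf_path),
        "--outdir", str(out_dir),
    ]


def convert_with_timeout(soffice, pdf_path, out_dir, timeout_s=600):
    """Run the conversion; return True on success, False on timeout or failure.

    timeout_s is a hypothetical default. subprocess.run kills the process
    it started when the timeout expires and then raises TimeoutExpired.
    """
    cmd = build_convert_command(soffice, pdf_path, out_dir)
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, such as "C:\Program Files".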
Comment 1 moemensaadeh936 2025-02-20 08:10:31 UTC
Created attachment 199335 [details]
this is the file that i was trying to convert it to docx

This is the command that I ran:

"C:\\Program Files\\LibreOffice\\program\\soffice.exe" --headless --convert-to docx --infilter="writer_pdf_import" "C:\\Users\\user\\Downloads\\20mb.pdf" --outdir "C:\\Users\\user\\Desktop\\file-convertor\\uploads"
Comment 2 m_a_riosv 2025-02-20 22:46:45 UTC
Tested with other apps, e.g. GIMP or Word; it also takes a lot of time there.

It is a long PDF with a lot of tables.
Comment 3 moemensaadeh936 2025-02-23 17:07:10 UTC
Here is some more information, found in C:\Users\user\AppData\Roaming\LibreOffice\4\crash:

ProductName=LibreOffice
Version=25.2.0.3
BuildID=e1cf4a87eb02d755bce1a01209907ea5ddc8f069
URL=https://crashreport.libreoffice.org/submit/
UseSkia=true
Language=en-US
CPUModelName=13th Gen Intel(R) Core(TM) i7-13620H
CPUFlags=sse3 pclmulqdq monitor ssse3 fma cpmxch16b sse41 sse42 movbe popcnt aes xsave osxsave avx f16c rdrand msr cx8 sep cmov clfsh mmx fxsr sse sse2 ht fsgsbase bmi1 avx2 bmi2 erms invpcid rdseed adx sha lahf abm syscall rdtscp
MemoryTotal=16462712 kB
ShutDown=true
Comment 4 Buovjaga 2025-10-30 13:18:17 UTC
I shrunk it down to 1000 pages with:

qpdf --empty --pages ./20mb.pdf 1-1000 -- ./20mb_1000p.pdf

Conversion time on my machine:

real    4m59,879s
user    5m2,320s
sys     0m0,538s

Arch Linux 64-bit
Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 38121c6f0208f9db0a6d69e33efc7d1eec0aae31
CPU threads: 8; OS: Linux 6.17; UI render: default; VCL: gtk3
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: CL threaded
Built on 30 October 2025
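
To see how the conversion time grows with the page count, the qpdf cut-down from this comment can be repeated for several sizes. This is a rough Python sketch under stated assumptions: the soffice options are taken from comment 1 (this comment does not say which exact command was timed), and the helper names and the subset file names are hypothetical.

```python
import subprocess
import time


def build_qpdf_subset_command(src_pdf, dst_pdf, last_page):
    """qpdf invocation that keeps pages 1..last_page of src_pdf."""
    return [
        "qpdf", "--empty",
        "--pages", str(src_pdf), f"1-{last_page}", "--",
        str(dst_pdf),
    ]


def time_command(cmd):
    """Run cmd and return the elapsed wall-clock seconds ('real' in `time`)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def measure_scaling(src_pdf, page_counts, soffice="soffice"):
    """Cut src_pdf down to each page count and time the DOCX conversion.

    ASSUMPTION: the conversion uses the soffice flags from comment 1.
    """
    timings = {}
    for pages in page_counts:
        subset = f"subset_{pages}p.pdf"  # hypothetical file name
        subprocess.run(
            build_qpdf_subset_command(src_pdf, subset, pages), check=True
        )
        timings[pages] = time_command([
            soffice, "--headless", "--convert-to", "docx",
            "--infilter=writer_pdf_import", subset,
        ])
    return timings
```

For example, measure_scaling("./20mb.pdf", [250, 500, 1000]) would return a dict of seconds per page count. If doubling the pages much more than doubles the time, the slowdown is worse than linear.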
Comment 5 Buovjaga 2025-11-05 09:45:10 UTC
Created attachment 203740 [details]
Perf flamegraph

Recorded a perf trace for a version cut down to the first 500 pages.

Lots of work being done in lcl_GetUniqueFlyName()

Arch Linux 64-bit
Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: d0d81540a7f1e4b2c0b7a305f9f64c518edec8c3
CPU threads: 8; OS: Linux 6.17; UI render: default; VCL: gtk3
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: CL threaded
Built on 5 November 2025
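
A function that generates unique names can dominate a profile when every call scans all existing names, because creating n objects then costs on the order of n^2 steps in total. The Python sketch below is a generic illustration of that pattern and of a counter-based alternative. It is NOT LibreOffice's code: the actual implementation of lcl_GetUniqueFlyName() has not been checked here, and all names in the sketch are hypothetical.

```python
def unique_name_by_scan(existing_names, prefix="Frame"):
    """Return the first '<prefix><n>' not yet used, scanning every name.

    Hypothetical illustration: each call walks the whole list, so the
    total cost of naming n objects one after another grows like n^2.
    """
    used = set()
    for name in existing_names:  # full scan on every call
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdecimal():
            used.add(int(suffix))
    n = 1
    while n in used:
        n += 1
    return f"{prefix}{n}"


class NameCounter:
    """Alternative: remember the last number per prefix, constant time per call.

    Simplification: assumes all names for a prefix come from this counter.
    """

    def __init__(self):
        self._last = {}

    def unique_name(self, prefix="Frame"):
        n = self._last.get(prefix, 0) + 1
        self._last[prefix] = n
        return f"{prefix}{n}"
```

Both variants hand out Frame1, Frame2, Frame3, ... for a fresh document; only the cost per call differs.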
Comment 6 Buovjaga 2025-11-05 12:09:39 UTC
This has actually gotten much better recently. In the oldest commit of the 7.6 Linux bibisect repo, with the 500-page version, I get:
real    3m38,753s
user    3m35,717s
sys     0m0,745s

In the master commit of the 25.2 Linux bibisect repo, I get:
real    1m37,381s
user    1m38,233s
sys     0m0,343s
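
The `time` values in these comments use a comma as the decimal separator, so they need a small conversion before they can be compared. A minimal Python sketch; the function names are made up for illustration:

```python
import re


def parse_time_value(text):
    """Convert a `time` value such as '3m38,753s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:[.,]\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    # Accept both ',' and '.' as decimal separator.
    return int(minutes) * 60 + float(seconds.replace(",", "."))


def speedup(old, new):
    """How many times faster the new run is compared with the old one."""
    return parse_time_value(old) / parse_time_value(new)
```

Applied to the two "real" values above, speedup("3m38,753s", "1m37,381s") comes out at roughly 2.2, i.e. the 25.2 master build converts the 500-page file a bit more than twice as fast as the oldest 7.6 build.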