Description:
I'm trying to convert a bunch of files from ODT to PDF using headless mode, e.g.:

lowriter --headless --convert-to pdf --outdir PDF *.odt

This works great as long as there are not too many files to convert. As soon as the folder contains a few hundred files, the conversion stops after a certain number of files (247 in my case) and the libreoffice command returns no error.

Steps to Reproduce:
1. Using the command line, navigate to a folder containing several hundred ODT files
$ ls -lc *.odt | wc -l
339
2. Convert the files to PDF
$ libreoffice --headless --convert-to pdf --outdir PDF *.odt
3. Confirm that no error was raised
$ echo $?
0
4. Count the generated PDF files
$ ls -lc PDF/*.pdf | wc -l
247

Actual Results:
247 PDF files have been generated out of the 339 ODT files

Expected Results:
339 PDF files should have been generated, one for each ODT file

Reproducible: Always

User Profile Reset: Yes

Additional Info:
There's nothing wrong with the ODT files: the 247 limit can be reached using different sets of ODT files, each of which can successfully be converted to PDF as long as the number of files to convert remains below 247.

I found a few instances of people experiencing the same issue with roughly similar limits:
https://ask.libreoffice.org/t/how-to-avoid-convert-to-to-stop-after-249-files/25960
https://ask.libreoffice.org/t/error-converting-thousands-of-documents-with-libreoffice/1341

Some answers suggested this might be a shell limitation (e.g. when expanding "*.odt" into the actual list of files), but I can successfully convert those files using unoconv, e.g.:

$ unoconv -f pdf -o PDF *.odt
$ ls -lc PDF/*.pdf | wc -l
339

A workaround is to pass the list of ODT files using xargs, but this creates one instance of libreoffice for each file, which defeats the purpose of using the headless mode.
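On the xargs workaround mentioned above: xargs only starts one process per file when told to (e.g. with `-n 1`), and without `-n` it would pass as many files as fit into one call and so hit the same limit. A minimal sketch, assuming an xargs that supports `-0` (GNU and BSD xargs do), that passes the files in batches of 200 instead. The function name `convert_with_xargs` and the batch size are illustrative and not taken from the report; that 200 stays below the limit is an assumption based on the numbers reported here.

```sh
# Convert all ODT files in src_dir, starting one converter process per batch
# of 200 files. Any arguments after dest_dir replace the default converter
# command (assumed here to be "libreoffice").
convert_with_xargs() {
    local src_dir="$1" dest_dir="$2"
    shift 2
    [ "$#" -gt 0 ] || set -- libreoffice
    # Nothing to do if the glob matches no files.
    ls "$src_dir"/*.odt > /dev/null 2>&1 || return 0
    # NUL-separated names survive spaces; -n 200 caps the files per call.
    printf '%s\0' "$src_dir"/*.odt |
        xargs -0 -n 200 "$@" --headless --convert-to pdf --outdir "$dest_dir"
}
```

For the folder in this report the call would be `convert_with_xargs . PDF`, which should start two libreoffice processes for 339 files rather than 339.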
I found another workaround here: https://stackoverflow.com/a/59918729/4657755 It simply consists of splitting the folder into sub-folders of 200 files. Not very convenient/suitable either.
Just in case this is relevant, this issue still occurs with the latest libreoffice shipped with Ubuntu 25.10:

Version: 25.8.1.1 (X86_64) / LibreOffice Community
Build ID: 580(Build:1)
CPU threads: 8; OS: Linux 6.17; UI render: default; VCL: gtk3
Locale: fr-FR (en_US.UTF-8); UI: en-US
Ubuntu package version: 4:25.8.1~rc1-0ubuntu1
Calc: threaded
Created attachment 203470
2 pages of lorem ipsum
Repro with 300 copies of the attached 2-page ODT:

perl -e 'system "cp lorme.odt $_.odt" foreach ("001".."300")'

The conversion stops after:

.../248.odt as a Writer document -> .../PDF/248.pdf using filter : writer_pdf_Export

Version: 25.2.6.2 (X86_64) / LibreOffice Community
Build ID: 520(Build:2)
CPU threads: 4; OS: Linux 6.14; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: fr-FR
Ubuntu package version: 4:25.2.6-0ubuntu0.25.04.1
Calc: threaded
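Since the command exits with status 0 even when it stops early, it can help to list exactly which inputs have no output. A small sketch for that; the function name `list_unconverted` is made up for illustration, and it assumes each `NAME.odt` is converted to `NAME.pdf` in the output folder, as the log lines in this report suggest.

```sh
# Print every ODT file in src_dir that has no matching PDF in out_dir.
list_unconverted() {
    local src_dir="$1" out_dir="$2" f base
    for f in "$src_dir"/*.odt; do
        [ -e "$f" ] || continue          # the glob matched nothing
        base=${f##*/}                    # strip the directory part
        [ -e "$out_dir/${base%.odt}.pdf" ] || printf '%s\n' "$f"
    done
}
```

After a run of the repro above, `list_unconverted . PDF` would print the ODT files that were skipped.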
As a workaround, you can first start

libreoffice --headless &

and then the conversion goes all the way through:

libreoffice --headless --convert-to pdf --outdir PDF ???.odt
...
convert .../300.odt as a Writer document -> .../PDF/300.pdf using filter : writer_pdf_Export
@fpy thanks for looking into it and for the tip!

However, I tried starting libreoffice in the background as you suggested and unfortunately it doesn't really solve the problem: the background process seems to stop at a random point while the conversion is still happening.

Note that I run the conversion(s) inside a bash script. Here's a sample script, let's call it `odt2pdf.sh`:
```
generate_pdfs() {
    src_dir="$1"
    dest_dir="$2"
    # Ditch the output
    libreoffice --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt > /dev/null
}

generate_pdfs dir1 PDF
generate_pdfs dir2 PDF
generate_pdfs dir3 PDF
generate_pdfs dir4 PDF
generate_pdfs dir5 PDF
```
Each folder contains several hundred ODT files.

Based on your suggestion, I tried starting libreoffice in the background in 3 different ways (see below). In each case, the background process stopped running before the end of the script.

1. Start it before running the script, e.g.:
```
$ libreoffice --headless > /dev/null & pid=$!
[1] 2538114
$ bash odt2pdf.sh
$ kill "$pid"
bash: kill: (2538114) - No such process
```

2. Start it at the beginning of the script, e.g.:
```
libreoffice --headless > /dev/null &
pid=$!
generate_pdfs dir1 PDF
generate_pdfs dir2 PDF
generate_pdfs dir3 PDF
generate_pdfs dir4 PDF
generate_pdfs dir5 PDF
kill "$pid"
```
and then:
```
$ bash odt2pdf.sh
odt2pdf.sh: line 16: kill: (2534784) - No such process
```

3. Start it for each conversion, e.g.:
```
generate_pdfs() {
    src_dir="$1"
    dest_dir="$2"
    libreoffice --headless > /dev/null &
    pid=$!
    # No need to ditch the output since it's the background process that takes care of the output
    libreoffice --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt
    kill "$pid"
}
```
and then:
```
$ bash odt2pdf.sh
odt2pdf.sh: line 8: kill: (2534784) - No such process
odt2pdf.sh: line 8: kill: (2534789) - No such process
odt2pdf.sh: line 8: kill: (2535123) - No such process
odt2pdf.sh: line 8: kill: (2535234) - No such process
odt2pdf.sh: line 8: kill: (2535345) - No such process
# Plus a bunch of lines like these since the output of the foreground process is not ditched
convert dir1/file123.odt as a Writer document -> dir1/PDF/file123.pdf using filter : writer_pdf_Export
```

FYI, the workaround I've been using since I reported this bug is to split each folder into batches of 200 documents. The downside is that more instances of libreoffice are started than necessary, but the overhead is minimal:
```
generate_pdfs() {
    src_dir="$1"
    dest_dir="$2"
    total=$(ls -lc "$src_dir"/*.odt | wc -l)
    i=0
    while (( i < total )); do
        max=200
        ((i = i + max))
        # Shrink the last batch so that it only covers the remaining files
        if (( i > total )); then ((max = total + max - i)); fi
        # Take the first $i files, keep the last $max of them, and pass them on as arguments
        ls "$src_dir"/*.odt | head -n "$i" | tail -n "$max" | \
            bash -c "IFS=$'\n' read -d '' -ra x; lowriter --headless --convert-to pdf \
                --outdir \"$dest_dir\" \"\${x[@]}\" > /dev/null"
    done
}
```
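The batching workaround above could also be written without `ls`, `head` and `tail`, by collecting file names in the positional parameters and flushing them every 200 files. This is only a sketch of the same idea: `generate_pdfs_batched` and the `CONVERTER` override are hypothetical names introduced here, and the default batch size of 200 is taken from the workaround above.

```sh
# Convert src_dir/*.odt to PDF in batches (default: 200 files per call).
# CONVERTER may be set to override the converter command (default: lowriter).
generate_pdfs_batched() {
    local src_dir="$1" dest_dir="$2" batch_size="${3:-200}"
    local converter="${CONVERTER:-lowriter}" f
    set --                                  # reuse "$@" as the current batch
    for f in "$src_dir"/*.odt; do
        [ -e "$f" ] || continue             # the glob matched nothing
        set -- "$@" "$f"
        if [ "$#" -ge "$batch_size" ]; then
            "$converter" --headless --convert-to pdf --outdir "$dest_dir" "$@" > /dev/null
            set --
        fi
    done
    # Convert the final, possibly partial batch.
    if [ "$#" -gt 0 ]; then
        "$converter" --headless --convert-to pdf --outdir "$dest_dir" "$@" > /dev/null
    fi
}
```

It would be called the same way as the original function, e.g. `generate_pdfs_batched dir1 PDF`. File names never pass through a pipe, so names containing spaces or newlines are handled as-is.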
I would suggest going step by step.

- Can you confirm that my workaround works for you on the command line (i.e. you can convert more than 248 (simple) files in 1 call)?
- "libreoffice" in your PATH should be a script that, at some point, calls the binary soffice.bin:

/usr/lib/libreoffice/program/soffice.bin: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7fe5541f2c75a4190186f4879a3cf9e2bba8a9f9, for GNU/Linux 3.2.0, stripped

so be cautious with "$!".

> ... the background process seem to stop at a random point

Not so. It stops at the next call:

$ libreoffice --headless &
[3] 828096
$ libreoffice --headless --convert-to pdf --outdir PDF 001.odt
convert .../001.odt as a Writer document -> .../001.pdf using filter : writer_pdf_Export
Overwriting: /home/xpy/Downloads/PDF/001.pdf
$
[3]+ Done libreoffice --headless
Good call about the PID!

So indeed, I'm able to generate all the files when I execute the conversion command interactively after starting the process in the background.

I think I have a clue about what goes wrong when applying this workaround to the script: when running the conversion interactively, the background process doesn't necessarily stop right away after the conversion is over, e.g.:
```
$ soffice --headless > /dev/null &
[1] 2611881
$ soffice --headless --convert-to pdf --outdir PDF *.odt
$
[1]+ Done soffice --headless > /dev/null
$ soffice --headless > /dev/null &
[1] 2612077
$ soffice --headless --convert-to pdf --outdir PDF *.odt
$
[1]+ Done soffice --headless > /dev/null
```

In the script, the conversions of some folders fail while others succeed. Basically:
```
generate_pdfs() {
    src_dir="$1"
    dest_dir="$2"
    echo "Converting $1..."
    soffice --headless > /dev/null
    # No need to ditch the output since it's the background process that takes care of the output
    soffice --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt
    kill "$pid"
    echo "Done."
}
```
and then:
```
$ bash odt2pdf.sh
Converting dir1...
Done.
Converting dir2...
convert dir2/file001.odt as a Writer document -> dir2/PDF/file001.pdf using filter : writer_pdf_Export
convert dir2/file002.odt as a Writer document -> dir2/PDF/file002.pdf using filter : writer_pdf_Export
... # -> Bunch more lines until the conversion silently fails before reaching the end of the folder
Done.
Converting dir3...
Done.
Converting dir4...
Done.
Converting dir5...
convert dir5/file001.odt as a Writer document -> dir5/PDF/file001.pdf using filter : writer_pdf_Export
convert dir5/file002.odt as a Writer document -> dir5/PDF/file002.pdf using filter : writer_pdf_Export
...
Done.
```
The conversions for folders dir2 and dir5 were unsuccessful.
Note that which folders fail is quite random (it could be 2 and 4 or 2, 3 and 5 or 2, 4 and 5, etc.), but at least it seems that the first folder always goes through and that the second one always fails.
There are probably different bugs:
- the 248 limitation (which would be nice to focus on in this very report)
- then probably some files or file sequences causing an actual crash. If you can narrow that down to something reproducible, please report it separately ...

> Basically:

Please make sure to report accurately, since details matter.

> soffice --headless > /dev/null

Not in the background?

> kill "$pid"

Still? Where is $pid defined/assigned?
To clarify the limitation: it's actually in oosplash.

Launching "libreoffice" actually runs:

836291 271955 0 18:24 pts/1 00:00:00 /usr/lib/libreoffice/program/oosplash --
836326 836291 17 18:24 pts/1 00:00:02 /usr/lib/libreoffice/program/soffice.bin

oosplash receives the full list of args:

$ cat /proc/836291/cmdline
/usr/lib/libreoffice/program/oosplash--headless--convert-topdf--outdirPDF001.odt002.odt003.odt [...] 299.odt300.odt

whereas soffice.bin gets a shortened list:

usr/lib/libreoffice/program/soffice.bin--headless--convert-topdf--outdirPDF001.odt002.odt003.odt [...] 247.odt248.odt
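The arguments in `/proc/PID/cmdline` are separated by NUL bytes, which is why `cat` shows them run together. To compare how many ODT arguments each process received, they can be counted instead of read by eye. A minimal sketch, assuming Linux's `/proc` layout; `count_odt_args` is a name made up for this example, and file names containing newlines would be miscounted.

```sh
# Count the NUL-separated arguments ending in ".odt" in a cmdline-style file,
# e.g. /proc/PID/cmdline on Linux.
count_odt_args() {
    tr '\0' '\n' < "$1" | grep -c '\.odt$'
}
```

With the PIDs from the listing above, `count_odt_args /proc/836291/cmdline` (oosplash) and `count_odt_args /proc/836326/cmdline` (soffice.bin) would be compared while the conversion is still running.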
>> soffice --headless > /dev/null
> not in background ?
>> kill "$pid"
> still ?
> where is $pid defined/assigned ?

Sorry, these were typos introduced when preparing my previous comment: I had simplified my script to its essential components for readability.

> please make sure to report accurately, since details matter.

You're right. I therefore re-created a script from scratch and managed to simplify it further. Please find it below with its actual output. Note that `dir` contains 300 files.

odt2pdf.sh:
```
soffice --headless > /dev/null &
soffice --headless --convert-to pdf --outdir PDF dir/*.odt
```
```
$ bash odt2pdf.sh
$ ls PDF/*.pdf | wc -l
248
$
```

However, if I start the background soffice interactively, it works, e.g.

odt2pdf.sh:
```
soffice --headless --convert-to pdf --outdir PDF dir/*.odt
```
```
$ soffice --headless > /dev/null &
[1] 45460
$ bash odt2pdf.sh
Converting...
Done.
$
[1]+ Done soffice --headless > /dev/null
$ ls PDF/*.pdf | wc -l
300
$
```

Basically, it seems that when `soffice` is started in the background from a script, the workaround you mentioned doesn't work.

> when launching "libreoffice", it actually calls :
> 836291 271955 0 18:24 pts/1 00:00:00 /usr/lib/libreoffice/program/oosplash --
> 836326 836291 17 18:24 pts/1 00:00:02 /usr/lib/libreoffice/program/soffice.bin
> ...
> whereas soffice.bin just gets it shortened :
> usr/lib/libreoffice/program/soffice.bin--headless--convert-topdf--outdirPDF001.odt002.odt003.odt [...] 247.odt248.odt

Nice catch! I updated my script to invoke `/usr/lib/libreoffice/program/soffice.bin` directly instead of `soffice` and it worked! It can now generate the PDF files for all the ODT files :)))

Until the actual issue is fixed, that's definitely a super workaround. Thanks so much for your help!
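For reference, the direct-invocation workaround described above might look like the following sketch. The path is the one quoted in this thread for the Ubuntu package and may differ on other systems; `convert_direct` and the `SOFFICE_BIN` override are illustrative names. Bypassing the wrapper script may also skip environment setup it performs, so this is a stopgap rather than a recommendation.

```sh
# Path to the real binary as reported in this thread (Ubuntu package layout).
SOFFICE_BIN="${SOFFICE_BIN:-/usr/lib/libreoffice/program/soffice.bin}"

# Convert every ODT file in src_dir with a single call to soffice.bin,
# so that oosplash never gets a chance to shorten the argument list.
convert_direct() {
    local src_dir="$1" dest_dir="$2"
    "$SOFFICE_BIN" --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt
}
```

For the simplified script above this would be `convert_direct dir PDF`.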
> Converting...
> Done.

Please ignore this output in my previous comment: I removed the debug messages from the script and forgot to update the output accordingly.
(In reply to yarma22 from comment #11)
> ... when `soffice` is started in the background from a
> script, the workaround you mentioned doesn't work.

Add `sleep 2` so that the first soffice is already active when the second soffice starts.
Indeed, that seems to be the reason why the workaround you suggested wasn't working inside a script: the foreground `soffice` instance would sometimes start before the background one gets activated.
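Putting the last two comments together, the scripted version of the background-instance workaround could be sketched as follows. The 2-second delay comes from the suggestion above and is a heuristic, so it may be too short on a slow or busy machine; `convert_with_listener` and the `SOFFICE` override are hypothetical names.

```sh
# Start a background soffice, give it time to become active, then run the
# conversion so that the already-running instance handles it.
# SOFFICE may be set to override the command (default: soffice).
convert_with_listener() {
    local src_dir="$1" dest_dir="$2" delay="${3:-2}"
    local soffice="${SOFFICE:-soffice}"
    "$soffice" --headless > /dev/null &
    sleep "$delay"      # heuristic: let the first instance get activated
    "$soffice" --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt
}
```

For the simplified script in this thread the call would be `convert_with_listener dir PDF`. The background instance is not killed explicitly because, in the interactive sessions shown earlier, it ended on its own after the conversion.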