Bug 163863 - Headless conversion silently stops after reaching ~250 files [oosplash argv; see comment 10]
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: framework
Version (earliest affected): 24.2.6.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: QA:needsComment
Keywords:
Depends on:
Blocks:
 
Reported: 2024-11-12 15:35 UTC by yarma22
Modified: 2025-10-22 07:29 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments
2 pages of lorem ipsum (69.29 KB, application/vnd.oasis.opendocument.text)
2025-10-21 13:05 UTC, fpy

Description yarma22 2024-11-12 15:35:32 UTC
Description:
I'm trying to convert a bunch of files from ODT to PDF using the headless mode, e.g.:
lowriter --headless --convert-to pdf --outdir PDF *.odt

This works great as long as there are not too many files to convert. As soon as the folder contains a few hundred files, the conversion stops after reaching a certain number of files (247 in my case) and the libreoffice command returns no error.

Steps to Reproduce:
1. Using command line, navigate to a folder containing several hundred ODT files
$ ls -lc *.odt | wc -l
339

2. Convert the files to PDF
$ libreoffice --headless --convert-to pdf --outdir PDF *.odt

3. Confirm that no error was raised
$ echo $?
0

4. Fetch the number of generated PDF files
$ ls -lc PDF/*.pdf | wc -l
247

Actual Results:
247 PDF files have been generated out of the 339 ODT files

Expected Results:
339 PDF files should have been generated, one for each ODT file
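
Because the command exits with status 0 even when files are skipped, the result has to be checked by comparing inputs against outputs. A minimal sketch, assuming each PDF keeps the base name of its ODT source; `list_missing_pdfs` is a hypothetical helper name, not part of LibreOffice:

```shell
#!/usr/bin/env bash
# list_missing_pdfs SRC_DIR DEST_DIR
# Prints every SRC_DIR/*.odt that has no matching DEST_DIR/<name>.pdf.
# Returns 1 if at least one PDF is missing, 0 otherwise.
list_missing_pdfs() {
  local src_dir="$1" dest_dir="$2" odt base missing=0
  for odt in "$src_dir"/*.odt; do
    [ -e "$odt" ] || continue          # the glob matched nothing
    base="$(basename "$odt" .odt)"
    if [ ! -f "$dest_dir/$base.pdf" ]; then
      printf '%s\n' "$odt"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `list_missing_pdfs . PDF || echo "conversion incomplete" >&2` could be run right after the conversion.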


Reproducible: Always


User Profile Reset: Yes

Additional Info:
There's nothing wrong with the ODT files: the 247 limit can be reached using different sets of ODT files, each of which can successfully be converted to PDF, as long as the number of files to convert remains below 247.

I found a few instances of people experiencing the same issue with roughly similar limits:
https://ask.libreoffice.org/t/how-to-avoid-convert-to-to-stop-after-249-files/25960
https://ask.libreoffice.org/t/error-converting-thousands-of-documents-with-libreoffice/1341

Some answers suggested this might be a shell limitation (e.g. when translating "*.odt" into the actual list of files), but I can successfully convert those files using unoconv, e.g.:
$ unoconv -f pdf -o PDF *.odt
$ ls -lc PDF/*.pdf | wc -l
339

A workaround is to pass the list of ODT files using xargs, but this creates one instance of libreoffice for each file, which defeats the purpose of using the headless mode.
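
Note that `xargs` does not have to start one process per file: with `-n` it passes a fixed maximum number of arguments per invocation. A sketch, assuming GNU `find`, `sort` and `xargs`, and assuming that 200 files per call stays below the limit observed here; `convert_with_xargs` and the `SOFFICE` variable are hypothetical names, the latter being a hook for substituting another binary:

```shell
#!/usr/bin/env bash
# convert_with_xargs SRC_DIR DEST_DIR [BATCH_SIZE]
# Runs one soffice call per batch of at most BATCH_SIZE files (default 200).
convert_with_xargs() {
  local src_dir="$1" dest_dir="$2" batch_size="${3:-200}"
  # NUL-separated names keep paths with spaces or newlines intact.
  find "$src_dir" -maxdepth 1 -type f -name '*.odt' -print0 |
    sort -z |
    xargs -0 -r -n "$batch_size" \
      "${SOFFICE:-soffice}" --headless --convert-to pdf --outdir "$dest_dir"
}
```

With 339 files and the default batch size this would start two instances instead of 339.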
Comment 1 yarma22 2024-11-12 15:42:59 UTC
I found another workaround here:
https://stackoverflow.com/a/59918729/4657755

It simply consists of splitting the folder into sub-folders of 200 files. Not very convenient/suitable either.
Comment 2 yarma22 2025-10-20 09:38:49 UTC
Just in case this is relevant, this issue still occurs with the latest libreoffice shipped with Ubuntu 25.10

Version: 25.8.1.1 (X86_64) / LibreOffice Community
Build ID: 580(Build:1)
CPU threads: 8; OS: Linux 6.17; UI render: default; VCL: gtk3
Locale: fr-FR (en_US.UTF-8); UI: en-US
Ubuntu package version: 4:25.8.1~rc1-0ubuntu1
Calc: threaded
Comment 3 fpy 2025-10-21 13:05:41 UTC
Created attachment 203470 [details]
2 pages of lorem ipsum
Comment 4 fpy 2025-10-21 13:09:30 UTC
repro with 300 copies of attached 2 pages ODT
  perl -e 'system "cp lorme.odt $_.odt" foreach ("001".."300")'

stops after :
  .../248.odt as a Writer document -> .../PDF/248.pdf using filter : writer_pdf_Export


Version: 25.2.6.2 (X86_64) / LibreOffice Community
Build ID: 520(Build:2)
CPU threads: 4; OS: Linux 6.14; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: fr-FR
Ubuntu package version: 4:25.2.6-0ubuntu0.25.04.1
Calc: threaded
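
For reference, a pure-shell equivalent of the Perl one-liner above, as a sketch; `make_copies` is a hypothetical helper, and `seq -w` from GNU coreutils is assumed for the zero-padded names:

```shell
#!/usr/bin/env bash
# make_copies TEMPLATE COUNT [DEST_DIR]
# Creates COUNT zero-padded copies of TEMPLATE in DEST_DIR
# (001.odt, 002.odt, ... when COUNT is 300).
make_copies() {
  local template="$1" count="$2" dest_dir="${3:-.}" n
  for n in $(seq -w 1 "$count"); do
    cp -- "$template" "$dest_dir/$n.odt"
  done
}
```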
Comment 5 fpy 2025-10-21 13:13:53 UTC
as a workaround, you can first start 
 libreoffice --headless &

then it goes through :
libreoffice --headless --convert-to pdf --outdir PDF ???.odt

...
convert .../300.odt as a Writer document -> .../PDF/300.pdf using filter : writer_pdf_Export
Comment 6 yarma22 2025-10-21 14:15:26 UTC
@fpy thanks for looking into it and for the tip!

However, I tried starting libreoffice in the background as you suggested and unfortunately it doesn't really solve the problem. Indeed, the background process seems to stop at a random point in time while the conversion is still happening.

To be noted that I run the conversion(s) inside a bash script. Here's a sample script, let's call it `odt2pdf.sh`:

```
generate_pdfs() {
  src_dir="$1"
  dest_dir="$2"

  # Ditch the output
  libreoffice --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt > /dev/null
}

generate_pdfs dir1 PDF
generate_pdfs dir2 PDF
generate_pdfs dir3 PDF
generate_pdfs dir4 PDF
generate_pdfs dir5 PDF
```

Each folder contains several hundred ODT files. Based on your suggestion, I tried starting libreoffice in the background in 3 different ways (see below). In each case, the background process stopped running before the end of the script.

1. Start it before running the script, e.g.:

```
$ libreoffice --headless > /dev/null & pid=$!
[1] 2538114
$ bash odt2pdf.sh
$ kill "$pid"
bash: kill: (2538114) - No such process
```

2. Start it at the beginning of the script, e.g.:
```
libreoffice --headless > /dev/null & pid=$!
generate_pdfs dir1 PDF
generate_pdfs dir2 PDF
generate_pdfs dir3 PDF
generate_pdfs dir4 PDF
generate_pdfs dir5 PDF
kill "$pid"
```
and then:
```
$ bash odt2pdf.sh
odt2pdf.sh: line 16: kill: (2534784) - No such process
```

3. Start it for each conversion, e.g.:
```
generate_pdfs() {
  src_dir="$1"
  dest_dir="$2"

  libreoffice --headless > /dev/null & pid=$!
  # No need to ditch the output since it's the background process that takes care of the output
  libreoffice --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt
  kill "$pid"
}
```
and then:
```
$ bash odt2pdf.sh
odt2pdf.sh: line 8: kill: (2534784) - No such process
odt2pdf.sh: line 8: kill: (2534789) - No such process
odt2pdf.sh: line 8: kill: (2535123) - No such process
odt2pdf.sh: line 8: kill: (2535234) - No such process
odt2pdf.sh: line 8: kill: (2535345) - No such process
# Plus a bunch of lines like these since the output of the foreground process is not ditched
convert dir1/file123.odt as a Writer document -> dir1/PDF/file123.pdf using filter : writer_pdf_Export
```

FYI, the workaround I've been using since I reported this bug is to split each folder in batches of 200 documents. The downside is that more instances of libreoffice than necessary are instantiated, but the overhead is minimal:

```
generate_pdfs() {
  src_dir="$1"
  dest_dir="$2"

  total=$(ls -lc "$src_dir"/*.odt | wc -l)
  i=0
  while (( i < total )); do
    max=200
    ((i = i + max))
    if (( i > total )); then ((max = total + max - i)); fi
    ls "$src_dir"/*.odt | head -n "$i" | tail -n "$max" | \
      bash -c "IFS=$'\n' read -d '' -ra x; lowriter --headless --convert-to pdf \
        --outdir "$dest_dir" \"\${x[@]}\" > /dev/null"
  done
}
```
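
The same batching can be written without parsing `ls` output, by slicing a bash array. A sketch under the same assumption that 200 files per call is safe; `generate_pdfs_batched` is a hypothetical name, and `SOFFICE` is a hypothetical override used only to make dry runs possible:

```shell
#!/usr/bin/env bash
# generate_pdfs_batched SRC_DIR DEST_DIR [BATCH_SIZE]
# Converts SRC_DIR/*.odt in slices of at most BATCH_SIZE files (default 200).
generate_pdfs_batched() {
  local src_dir="$1" dest_dir="$2" batch_size="${3:-200}"
  local i
  local files=("$src_dir"/*.odt)
  # An unmatched glob stays literal, so check that the first entry exists.
  [ -e "${files[0]}" ] || return 0
  for (( i = 0; i < ${#files[@]}; i += batch_size )); do
    # "${files[@]:i:batch_size}" expands to at most batch_size paths.
    "${SOFFICE:-soffice}" --headless --convert-to pdf \
      --outdir "$dest_dir" "${files[@]:i:batch_size}" > /dev/null
  done
}
```

Array slicing keeps file names with spaces intact and avoids the nested quoting needed by the `bash -c` variant.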
Comment 7 fpy 2025-10-21 14:35:39 UTC
I would suggest going step by step.
- Can you confirm that my workaround works for you on the command line (i.e. you can convert more than 248 simple files in one call)?

- "libreoffice" in your PATH should be a script which at some point calls the binary soffice.bin:
/usr/lib/libreoffice/program/soffice.bin: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7fe5541f2c75a4190186f4879a3cf9e2bba8a9f9, for GNU/Linux 3.2.0, stripped
so be cautious with "$!"


> ... the background process seem to stop at a random point

Not so.
It stops at the next call:

$ libreoffice --headless &
[3] 828096

$ libreoffice --headless --convert-to pdf --outdir PDF 001.odt
convert .../001.odt as a Writer document -> .../001.pdf using filter : writer_pdf_Export
Overwriting: /home/xpy/Downloads/PDF/001.pdf

$ 
[3]+  Done                    libreoffice --headless
Comment 8 yarma22 2025-10-21 15:20:47 UTC
Good call about the PID!

So indeed, I'm able to generate all the files when I execute the conversion command interactively after starting the process in the background.

I think I have a clue of what goes wrong when applying this workaround to the script: actually, when running the conversion interactively, the background process doesn't necessarily stop right away after the conversion is over, e.g.:
```
$ soffice --headless > /dev/null &
[1] 2611881
$ soffice --headless --convert-to pdf --outdir PDF *.odt
$ 
[1]+  Done                    soffice --headless > /dev/null
$ soffice --headless > /dev/null &
[1] 2612077
$ soffice --headless --convert-to pdf --outdir PDF *.odt
$ 
[1]+  Done                    soffice --headless > /dev/null
```

In the script, the conversions of some folders fail while some others are successful. Basically:
```
generate_pdfs() {
  src_dir="$1"
  dest_dir="$2"

  echo "Converting $1..."
  soffice --headless > /dev/null
  # No need to ditch the output since it's the background process that takes care of the output
  soffice --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt
  kill "$pid"
  echo "Done."
}
```
and then:
```
$ bash odt2pdf.sh
Converting dir1...
Done.
Converting dir2...
convert dir2/file001.odt as a Writer document -> dir2/PDF/file001.pdf using filter : writer_pdf_Export
convert dir2/file002.odt as a Writer document -> dir2/PDF/file002.pdf using filter : writer_pdf_Export
... # -> Bunch more lines until the conversion silently fails before reaching the end of the folder
Done.
Converting dir3...
Done.
Converting dir4...
Done.
Converting dir5...
convert dir5/file001.odt as a Writer document -> dir5/PDF/file001.pdf using filter : writer_pdf_Export
convert dir5/file002.odt as a Writer document -> dir5/PDF/file002.pdf using filter : writer_pdf_Export
...
Done.
```

Conversions for folders dir2 and dir5 were unsuccessful. Note that which folders fail is quite random (it could be 2 and 4, or 2, 3 and 5, or 2, 4 and 5, etc.), but at least it seems that the first folder always goes through and that the second one always fails.
Comment 9 fpy 2025-10-21 16:04:53 UTC
there are probably different bugs.

- the 248 limitation (which could be nice to focus on in this very report)

- then probably some files or files sequence causing an actual crash.
if you can narrow down to reproduce and report separately ... 


> Basically:

please make sure to report accurately, since details matter.

> soffice --headless > /dev/null

not in background ?

> kill "$pid"

still ? 
where is $pid defined/assigned ?
Comment 10 fpy 2025-10-21 16:30:06 UTC
to clarify the limitation, it's actually in  oosplash.

when launching "libreoffice", it actually calls :

836291  271955  0 18:24 pts/1    00:00:00 /usr/lib/libreoffice/program/oosplash --
836326  836291 17 18:24 pts/1    00:00:02 /usr/lib/libreoffice/program/soffice.bin


oosplash actually receives the full list of args:
$ cat /proc/836291/cmdline 
/usr/lib/libreoffice/program/oosplash--headless--convert-topdf--outdirPDF001.odt002.odt003.odt [...] 299.odt300.odt

whereas soffice.bin just gets it shortened : 
usr/lib/libreoffice/program/soffice.bin--headless--convert-topdf--outdirPDF001.odt002.odt003.odt  [...] 247.odt248.odt
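
The arguments look run together because `/proc/<pid>/cmdline` separates them with NUL bytes. A sketch for counting how many `.odt` arguments a process actually received; `count_odt_args` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# count_odt_args CMDLINE_FILE
# Counts the arguments ending in .odt in a NUL-separated
# /proc/<pid>/cmdline file.
count_odt_args() {
  # Turn each NUL separator into a newline, then count matching lines.
  tr '\0' '\n' < "$1" | grep -c '\.odt$'
}
```

Running it on `/proc/836291/cmdline` (oosplash) and `/proc/836326/cmdline` (soffice.bin), using the PIDs from the listing above, would make the difference between the two argument lists directly comparable.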
Comment 11 yarma22 2025-10-21 17:27:23 UTC
>> soffice --headless > /dev/null

> not in background ?

>> kill "$pid"

> still ? 
> where is $pid defined/assigned ?

Sorry, these were typos when preparing my previous comment, since I simplified my script to its essential components for the purpose of readability.

> please make sure to report accurately, since details matter.

You're right. Therefore, I re-created a script from scratch. I managed to simplify it further. Please find it below with its actual output. Note that `dir` contains 300 files.

odt2pdf.sh:
```
soffice --headless > /dev/null &
soffice --headless --convert-to pdf --outdir PDF dir/*.odt
```

```
$ bash odt2pdf.sh 
$ ls PDF/*.pdf | wc -l
248
$ 
```

However, if I start the background soffice interactively, it works, e.g.

odt2pdf.sh:
```
soffice --headless --convert-to pdf --outdir PDF dir/*.odt
```

```
$ soffice --headless > /dev/null &
[1] 45460
$ bash odt2pdf.sh 
Converting...
Done.
$ 
[1]+  Done                    soffice --headless > /dev/null
$ ls PDF/*.pdf | wc -l
300
$
```

Basically, it seems that when `soffice` is started in the background from a script, the workaround you mentioned doesn't work.


> when launching "libreoffice", it actually calls :

> 836291  271955  0 18:24 pts/1    00:00:00 /usr/lib/libreoffice/program/oosplash --
> 836326  836291 17 18:24 pts/1    00:00:02 /usr/lib/libreoffice/program/soffice.bin
> ...
> whereas soffice.bin just gets it shortened : 
> usr/lib/libreoffice/program/soffice.bin--headless--convert-topdf--outdirPDF001.odt002.odt003.odt  [...] 247.odt248.odt

Nice catch! I updated my script to invoke `/usr/lib/libreoffice/program/soffice.bin` directly instead of `soffice` and it worked! It can now generate the PDF files for all the ODT files :)))

Until the actual issue is fixed, that's definitely a super workaround. Thanks so much for your help!
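
A sketch of this workaround as a function. The default path is the one from the process listing quoted above and may differ on other distributions; `convert_direct` and the `SOFFICE_BIN` override are hypothetical names. Whether bypassing the launcher has side effects beyond what was observed here is not established in this report:

```shell
#!/usr/bin/env bash
# convert_direct SRC_DIR DEST_DIR
# Calls soffice.bin itself, skipping the soffice/oosplash launcher.
convert_direct() {
  local src_dir="$1" dest_dir="$2"
  local bin="${SOFFICE_BIN:-/usr/lib/libreoffice/program/soffice.bin}"
  if [ ! -x "$bin" ]; then
    echo "soffice.bin not found or not executable: $bin" >&2
    return 1
  fi
  "$bin" --headless --convert-to pdf --outdir "$dest_dir" "$src_dir"/*.odt
}
```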
Comment 12 yarma22 2025-10-21 17:30:23 UTC
> Converting...
> Done.

Please ignore this output in my previous comment, I removed the debug messages in the script and forgot to update the output accordingly.
Comment 13 fpy 2025-10-21 17:46:18 UTC
(In reply to yarma22 from comment #11)

>  ... when `soffice` is started in the background from a
> script, the workaround you mentioned doesn't work.

add
 sleep 2
so that the first soffice is already active when the second soffice starts.
Comment 14 yarma22 2025-10-22 07:29:39 UTC
Indeed, that seems to be the reason why the workaround you suggested wasn't working inside a script: the foreground `soffice` instance would sometimes start before the background one gets activated.
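
Instead of a fixed `sleep 2`, a script could poll until the background instance shows up. A sketch, assuming `pgrep` (procps) is available; `wait_for_process` is a hypothetical helper, and the existence of a `soffice.bin` process is only a proxy for it being ready to accept requests, so a short extra delay may still be needed:

```shell
#!/usr/bin/env bash
# wait_for_process NAME [TIMEOUT_SECONDS]
# Polls once per second until a process named exactly NAME exists.
# Returns 0 once found, 1 if TIMEOUT_SECONDS (default 10) elapse first.
wait_for_process() {
  local name="$1" timeout="${2:-10}" waited=0
  until pgrep -x "$name" > /dev/null; do
    (( waited >= timeout )) && return 1
    sleep 1
    (( waited += 1 ))
  done
  return 0
}

# Possible use in odt2pdf.sh:
#   soffice --headless > /dev/null &
#   wait_for_process soffice.bin 30 &&
#     soffice --headless --convert-to pdf --outdir PDF dir/*.odt
```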