Bug 65454 - HTML > PDF Conversion Hangs When HTML Table Contains Large Number of Rows with Formatting Info
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 3.4.0 release (earliest affected)
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
Whiteboard: BSA
Keywords: filter:pdf
Depends on:
Blocks: PDF-Export
Reported: 2013-06-06 11:49 UTC by James
Modified: 2018-12-09 20:31 UTC
CC List: 3 users

See Also:
Crash report or crash signature:

Test file to reproduce (592.86 KB, text/html)
2013-06-06 11:49 UTC, James
Smaller version of repro file that doesn't cause the issue. (277.20 KB, text/html)
2013-06-06 11:50 UTC, James
Strace.log from a reproduction of the issue (724.97 KB, text/x-c)
2013-06-06 11:50 UTC, James
Strace.log from a run of the file that works (792.77 KB, application/octet-stream)
2013-06-06 11:51 UTC, James
Logs (2.19 MB, application/x-zip-compressed)
2013-09-17 13:20 UTC, James
strace.log of tableTestBroken.html with LO3.6 (1.79 MB, text/plain)
2014-06-15 15:02 UTC, Luuk
PDF from LO 6.2 beta 1 (115.99 KB, application/pdf)
2018-12-09 20:30 UTC, Roman Kuznetsov

Description James 2013-06-06 11:49:21 UTC
Created attachment 80391
Test file to reproduce

Problem description: 

We have had an issue on an application server running LibreOffice to convert documents to PDF, where certain documents caused the soffice process to run at near 100% CPU without ever completing, hanging the associated system. The files being converted were relatively small HTML files, and similar files from the same source converted without issue.

I carried out some experimentation and was able to determine that the problem appeared to be that the failing HTML documents contained table elements with a large number (~300) of rows that included style information. Working on the assumption that the problem might be caused by an odd character, I tried to narrow down the failure by deleting half of the rows and retrying, but I found that it didn't matter which half of the rows I deleted: deleting either half fixed the issue!
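The halving experiment suggests the hang depends on how many styled rows a table has rather than on any particular row. A minimal sketch for probing that threshold on one's own system: the Python below generates synthetic HTML files with a chosen number of styled table rows. The markup (inline `style` attributes on `<tr>` and `<td>`), the file names, and the helper names `make_table_html` and `write_test_files` are hypothetical; they are not taken from tableTestBroken.html, so a synthetic file may or may not trigger the same hang.

```python
"""Hypothetical generator for synthetic repro files: one HTML table whose rows
all carry inline style information. The markup is an assumption and is not
copied from tableTestBroken.html."""
from pathlib import Path
from typing import Iterable, List

# Assumed styling; the real files' formatting info is not reproduced here.
ROW_STYLE = "background-color: #f0f0f0"
CELL_STYLE = "font-family: Arial; font-size: 9pt; border: 1px solid #000000"


def make_table_html(n_rows: int, n_cols: int = 4) -> str:
    """Return an HTML document containing one table with n_rows styled rows."""
    rows = []
    for r in range(n_rows):
        cells = "".join(
            f'<td style="{CELL_STYLE}">row {r} col {c}</td>' for c in range(n_cols)
        )
        rows.append(f'<tr style="{ROW_STYLE}">{cells}</tr>')
    body = "\n".join(rows)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n'
        f"<table>\n{body}\n</table>\n</body></html>\n"
    )


def write_test_files(out_dir: str, sizes: Iterable[int]) -> List[Path]:
    """Write one file per row count (table_<n>.html) and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in sizes:
        path = out / f"table_{n}.html"
        path.write_text(make_table_html(n), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_test_files("Test", [150, 300, 600])` writes table_150.html, table_300.html and table_600.html, each of which could then be converted with the command from the steps to reproduce.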

I've managed to create a pair of test files that show this behavior independently of the original documents, and I've attached them to this issue. On the test system I'm using, "tableTestWorks.html" converts without issue, while "tableTestBroken.html" hangs as described. I've also attached strace outputs for the successful run of tableTestWorks and the failed run of tableTestBroken (CTRL-C was used to break after some time).

I've downloaded 4.0.3 for Windows and cannot reproduce the issue in that environment. However, when I manually installed 4.0.3 on Ubuntu, the issue appeared to remain.

Steps to reproduce:
1. Download the attached file tableTestBroken.html
2. Create a folder named Test in the ubuntu user's home directory on the server and place the file there.
3. Run /usr/bin/soffice --headless --convert-to pdf --strace --outdir /home/ubuntu/Test /home/ubuntu/Test/tableTestBroken.html

Current behavior:
The soffice process hangs forever at high CPU usage

Expected behavior:
The conversion completes

Operating System: Ubuntu
Version: 3.4.0 release
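Until the underlying bug is fixed, a conversion service like the one described can at least stop a hang from tying up the server by enforcing a time limit. A minimal Python sketch, assuming a POSIX system: it runs the command from step 3 above (without `--strace`) and kills it if it exceeds a limit. The helper names `run_with_timeout` and `convert_to_pdf` and the 120-second default are assumptions, not values from this report. The child is started in its own session so the whole process group can be killed, because on Linux `soffice` is commonly a launcher script that starts `soffice.bin`, and killing only the launcher might leave the hung process running.

```python
"""Hypothetical watchdog: run a headless conversion with a time limit so a
hang like the one in this report cannot block the server indefinitely."""
import os
import signal
import subprocess
from typing import List


def run_with_timeout(cmd: List[str], limit_s: float) -> str:
    """Run cmd and return "ok", "failed", or "timeout" (POSIX only)."""
    # start_new_session=True makes the child the leader of a new process
    # group, so killpg can also reach helpers it spawns (e.g. soffice.bin).
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    try:
        rc = proc.wait(timeout=limit_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the whole process group
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return "timeout"
    return "ok" if rc == 0 else "failed"


def convert_to_pdf(html_path: str, out_dir: str, limit_s: float = 120.0) -> str:
    """Same flags as step 3 above, minus --strace; the default limit is assumed."""
    cmd = [
        "soffice", "--headless", "--convert-to", "pdf",
        "--outdir", out_dir, html_path,
    ]
    return run_with_timeout(cmd, limit_s)
```

A timeout treats only the symptom: a document that hits this bug still produces no PDF, but the server is freed to process the next one.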
Comment 1 James 2013-06-06 11:50:07 UTC
Created attachment 80392
Smaller version of repro file that doesn't cause the issue.
Comment 2 James 2013-06-06 11:50:34 UTC
Created attachment 80393
Strace.log from a reproduction of the issue
Comment 3 James 2013-06-06 11:51:02 UTC
Created attachment 80394
Strace.log from a run of the file that works
Comment 4 tommy27 2013-07-16 19:42:37 UTC
No hang during HTML to PDF conversion using LibO 4.0.4 on Win7 64-bit.
The bug must be Linux-specific, as reported by the user.
Comment 5 tommy27 2013-09-14 06:18:38 UTC
Do you still see this bug in 4.1.1?
Comment 6 James 2013-09-16 08:47:14 UTC
I will re-test it when I get a chance, but it might be a few days before I have the time.
Comment 7 James 2013-09-17 13:20:16 UTC
Created attachment 85973

I've installed the latest 4.1 build from https://launchpad.net/~libreoffice/+archive/libreoffice-4-1.

The behaviour remains the same. I've attached a zip file containing logs of the failure and the success (strace.log for failure, stracegood.log for success). I tried running the failure case for significantly longer and the behaviour did not change, but I couldn't use that log file because log generation appears to be massively more space-intensive in this version (the failure case was generating >1MB of logs per second).
Comment 8 James 2013-09-17 13:41:16 UTC
FWIW, my Linux install is Ubuntu 12.04.3.
Comment 9 tommy27 2014-06-15 13:56:58 UTC
What about LibO 4.2.x?
Is the issue still there?
Comment 10 Luuk 2014-06-15 14:59:30 UTC
No problems converting this with 3.6 on Linux.

luuk@opensuse:~> time soffice --headless --convert-to pdf --strace --outdir /home/luuk/temp /home/luuk/temp/tableTestBroken.html
convert /home/luuk/temp/tableTestBroken.html -> /home/luuk/temp/tableTestBroken.pdf using writer_web_pdf_Export

real    0m11.793s
user    0m10.520s
sys     0m0.932s
luuk@opensuse:~> ll temp/tableTestBroken.*
-rwxr--r-- 1 luuk users 605081 jun 15 16:47 temp/tableTestBroken.html
-rw-r--r-- 1 luuk users 326368 jun 15 16:58 temp/tableTestBroken.pdf
luuk@opensuse:~> soffice --version
LibreOffice 3.6
Comment 11 Luuk 2014-06-15 15:02:22 UTC
Created attachment 101099
strace.log of tableTestBroken.html with LO3.6
Comment 12 James 2014-06-16 13:46:09 UTC
I've just re-tested with LibreOffice 420m0(Build:3), on Ubuntu 12.04.04, and the issue remains.
Comment 13 retired 2014-07-25 09:52:27 UTC
Does 4.3RC3 improve anything?
4.3RC3: http://www.libreoffice.org/download/pre-releases/

Also Ubuntu 14.04 LTS is out and might be worth being tested.
Comment 14 Owen Genat (retired) 2014-07-28 08:19:27 UTC
(In reply to comment #13)
> Does 4.3RC3 improve anything?

I have tested attachment 80391 under GNU/Linux x86_64 (Debian 7 / Crunchbang 11) using:

- v3.5.7.2 Build ID: 3215f89-f603614-ab984f2-7348103-1225a5b
- v4.1.6.2 Build ID: 40ff705089295be5be0aae9b15123f687c05b0a
- v4.2.5.2 Build ID: 61cb170a04bb1f12e77c884eab9192be736ec5f5
- v4.3.0.3 Build ID: 08ebe52789a201dd7d38ef653ef7a48925e7f9f7
- v4.4.0.0.alpha0+ Build ID: 4aa9b041de3129f19b48e66d349f48657b73f33e (2014-07-19)

All versions apart from v4.1.6.2 hang as described; oddly, v4.1.6.2 produces valid PDF output. I can provide straces if anyone feels they would add anything further to this report. Status set to NEW.
Comment 15 QA Administrators 2015-09-04 02:50:02 UTC Comment hidden (obsolete)
Comment 16 QA Administrators 2016-09-20 10:30:01 UTC Comment hidden (obsolete)
Comment 17 Roman Kuznetsov 2018-12-09 20:30:47 UTC
Created attachment 147407
PDF from LO 6.2 beta 1
Comment 18 Roman Kuznetsov 2018-12-09 20:31:58 UTC
In LO 6.2 beta 1 the PDF is created quickly and the result is fine.