Bug 154789 - EMF+ graphic causes hang / catastrophic memory leak
Summary: EMF+ graphic causes hang / catastrophic memory leak
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.4.0.3 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Bartosz
URL:
Whiteboard: target:7.6.0 target:7.5.3.2 target:7....
Keywords: bibisected, bisected, regression
Depends on:
Blocks: EMF-WMF
Reported: 2023-04-13 12:57 UTC by Naresh
Modified: 2023-05-08 13:51 UTC

See Also:
Crash report or crash signature:


Attachments
gdb trace (15.35 KB, text/plain)
2023-04-13 12:58 UTC, Naresh
problematic document (10.69 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-04-17 12:37 UTC, Naresh
Extracted EMF+ image which is causing performance issues (7.34 MB, application/vnd.rar)
2023-04-21 18:01 UTC, Bartosz
Screenshot of problematic docx after opening (101.26 KB, image/png)
2023-04-21 19:54 UTC, Bartosz
to improve the performance (3.50 MB, application/x-zip-compressed)
2023-05-08 07:48 UTC, Naresh

Description Naresh 2023-04-13 12:57:27 UTC
Description:
Hello,

We have an application in which we store Microsoft Office documents. As part of release management, we convert these documents to PDF with watermarks using LibreOffice.

Scenario:

Almost all documents are converted to PDF without any problem.
However, some conversions fail with very high memory consumption.
LibreOffice crashes when a document contains heavy images and many pages (e.g. around 600-1000 pages).
I have tested both the CLI and the GUI; the result is the same.

Technical Stack:

Redhat Enterprise 7.9 (3.10.0-1160.83.1.el7.x86_64)

Tested multiple versions; all of them fail:
LibreOffice 7.1.5.2
LibreOffice 7.4.6.2
LibreOffice 7.5.1.2

CLI Command:
/opt/libreoffice75/program/soffice --headless --convert-to "pdf:writer_pdf_Export" --outdir /tmp 7090190.docx

Size of the docx: 100 MB

I would appreciate some pointers to solve this problem.

Steps to Reproduce:
CLI Command:
/opt/libreoffice75/program/soffice --headless --convert-to "pdf:writer_pdf_Export" --outdir /tmp 7090190.docx

Actual Results:
Crashed with high memory consumption

Expected Results:
The PDF is generated.


Reproducible: Always


User Profile Reset: No

Additional Info:
gdb trace attached
Comment 1 Naresh 2023-04-13 12:58:03 UTC
Created attachment 186638 [details]
gdb trace
Comment 2 Stéphane Guillou (stragu) 2023-04-13 21:33:45 UTC
I wasn't able to crash LO using that filter with:

Version: 7.5.2.2 (X86_64) / LibreOffice Community
Build ID: 53bb9681a964705cf672590721dbc85eb4d0c3a2
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Please provide an example document to test, preferably smaller than 100 MB and sanitised if needed.
Comment 3 Stéphane Guillou (stragu) 2023-04-13 21:37:27 UTC
To go page-by-page to pinpoint the offending data, you could use a bash script with extra filter options, e.g. this inside a loop that changes the page number:

libreoffice7.5 --headless --convert-to 'pdf:writer_pdf_Export:{"PageRange":{"type":"string","value":"2"}}' large_file.docx
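
The page-by-page loop suggested above could be sketched as follows, here in Python rather than bash. This is only a sketch: the helper names, the timeout, and the idea of treating a non-zero exit code or a timeout as a failure are assumptions, while the binary name and the filter options are copied from the command above.

```python
import json
import subprocess

def page_filter(page):
    """Build the --convert-to argument that restricts the PDF export to one page."""
    options = {"PageRange": {"type": "string", "value": str(page)}}
    return "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))

def build_command(page, document, soffice="libreoffice7.5"):
    """Return the argument list for converting a single page of `document`."""
    return [soffice, "--headless", "--convert-to", page_filter(page), document]

def find_bad_pages(document, last_page, timeout_s=120):
    """Convert pages 1..last_page one at a time and collect those that fail.

    Assumption: a failure shows up as a non-zero exit code or as a
    conversion that runs longer than `timeout_s` seconds.
    """
    bad = []
    for page in range(1, last_page + 1):
        try:
            result = subprocess.run(build_command(page, document), timeout=timeout_s)
            if result.returncode != 0:
                bad.append(page)
        except subprocess.TimeoutExpired:
            bad.append(page)
    return bad
```

For example, `find_bad_pages("large_file.docx", 700)` would try each of the first 700 pages in turn. Every run reloads the whole document, so this is slow for large files.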
Comment 4 Naresh 2023-04-17 12:34:32 UTC
Hi Stéphane,

Thanks for your reply.

I was able to narrow the problem down to a single page containing one image that causes it. I am attaching that page here for your reference and testing.

Please let me know if you need more information on this. Thank you.
Comment 5 Naresh 2023-04-17 12:37:00 UTC
Created attachment 186725 [details]
problematic document
Comment 6 Buovjaga 2023-04-21 13:36:03 UTC
The issue is about opening the file. When using a debug build, I see lots of EMF+ warnings.

I bibisected with linux-64-5.4 repo and got a range of four commits. These stand out:
 2e7c94f5054dec4ab19c44209136c886793f0acb
tdf#107034 EMF+ Add support for import EmfPlusDrawPie record
 9b693d896bf9a08cd8987e483f5269d6f2be1fd3
tdf#107019 EMF+ Add support for import EmfPlusRecordTypeDrawBeziers record
Comment 7 V Stuart Foote 2023-04-21 15:19:21 UTC
The EMF+ extracted from the media folder of attachment 186725 [details] is a rather large 47 MB image of some complexity. With a WinDbg session attached, LO will eventually open it and render the image to canvas, but LO's memory use grows to about 6.5 GB as the EMF is parsed. Once open, the Draw UI is rather sluggish.

Version: 7.5.2.2 (X86_64) / LibreOffice Community
Build ID: 53bb9681a964705cf672590721dbc85eb4d0c3a2
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Opening with Draw crashed with Skia output device rendering, which seems to be a different issue.

With GDI-only default rendering, it slowly parses to completion and renders a rather pixelated image to the document canvas. GDI object counts remain low, so there is no leakage there.

Is it just a bad EMF?
Comment 8 Bartosz 2023-04-21 18:01:37 UTC
Created attachment 186845 [details]
Extracted EMF+ image which is causing performance issues
Comment 9 Bartosz 2023-04-21 19:54:23 UTC
Created attachment 186847 [details]
Screenshot of problematic docx after opening

The image contains a total of 597009 records, most of which are Bezier curves (EMF+ EmfPlusRecordTypeDrawBeziers (0x4019)).

Each of these curves is defined by only a few points, but it is then translated into hundreds of individual lines:
https://en.wikipedia.org/wiki/B%C3%A9zier_curve
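
To illustrate why this gets expensive, here is a small Python sketch that flattens one cubic Bezier curve into straight line segments by uniform sampling. It is not LibreOffice's implementation; the function names and the segment count are assumptions chosen for illustration.

```python
def cubic_bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve with control points p0..p3 at parameter t."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u * u * t * p1[0] + 3 * u * t * t * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u * u * t * p1[1] + 3 * u * t * t * p2[1] + t**3 * p3[1]
    return (x, y)

def flatten_cubic_bezier(p0, p1, p2, p3, segments):
    """Approximate the curve by `segments` straight lines.

    Returns segments + 1 points; consecutive points form one line each.
    """
    return [cubic_bezier_point(p0, p1, p2, p3, i / segments)
            for i in range(segments + 1)]

# Four control points in, 101 polyline points (100 lines) out.
polyline = flatten_cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), 100)
```

Four control points become 101 points here, so a file with hundreds of thousands of curve records can expand to a very large number of line segments if each curve is flattened at a similar resolution (an illustrative figure, not a measurement).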

As a workaround, we could try disabling EMF+ drawing (leaving only EMF). This can be done by setting an environment variable (unfortunately I don't remember its name).
Comment 10 Bartosz 2023-04-21 20:01:38 UTC
To disable EMF+, you could try setting this environment variable:

export EMF_PLUS_DISABLE=true

It is much faster with this variable set.
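
For an automated conversion pipeline, the same variable could be set only for the soffice child process. The sketch below assumes the soffice path and filter name from the original report; the helper names are hypothetical, and whether headless conversion honours EMF_PLUS_DISABLE is not confirmed in this report.

```python
import os
import subprocess

def env_without_emf_plus(base_env=None):
    """Return a copy of the environment with EMF_PLUS_DISABLE=true added."""
    env = dict(os.environ if base_env is None else base_env)
    env["EMF_PLUS_DISABLE"] = "true"
    return env

def convert_to_pdf(document, outdir="/tmp",
                   soffice="/opt/libreoffice75/program/soffice"):
    """Run a headless PDF conversion with EMF+ rendering disabled.

    The soffice path is the one used in the original report and is an
    assumption about the local installation.
    """
    cmd = [soffice, "--headless", "--convert-to", "pdf:writer_pdf_Export",
           "--outdir", outdir, document]
    return subprocess.run(cmd, env=env_without_emf_plus())
```

Passing a modified copy of the environment keeps the setting local to the conversion, so other processes started by the same service are unaffected.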
Comment 11 Bartosz 2023-04-21 20:08:51 UTC
The main difference between EMF and EMF+ is that EMF is operating on integers (sal_Int32) and EMF+ is operating on Floating-point numbers (double, float)
Comment 12 V Stuart Foote 2023-04-22 17:17:33 UTC
I confirm that setting the EMF_PLUS_DISABLE environment variable to true tames the behavior in the UI. Memory use drops from ~6.5 GB to ~400 MB, and the document canvas can actually be worked with.

I suppose the same can be set for command-line conversion, as the OP needs.

The question, though, is whether we can read the image metadata (get a count of curves before processing) and avoid EMF+ parsing when the count is over some threshold. Or maybe use some sort of timer-based release and fall back to simple EMF when counts grow too high?
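
A pre-scan of this kind could look roughly like the Python sketch below. It walks the EMF records, looks inside EMR_COMMENT records that carry the "EMF+" signature, and counts EmfPlusRecordTypeDrawBeziers (0x4019) records without decoding them. The record layouts follow my reading of the MS-EMF and MS-EMFPLUS specifications; the function names and the threshold are hypothetical, and none of this is LibreOffice code.

```python
import struct

EMR_COMMENT = 0x46             # EMF record type that can embed EMF+ data
EMFPLUS_SIGNATURE = b"EMF+"    # comment identifier of embedded EMF+ records
EMFPLUS_DRAW_BEZIERS = 0x4019  # EmfPlusRecordTypeDrawBeziers

def _count_in_comment(data, start, end, wanted_type):
    """Count EMF+ records of `wanted_type` in data[start:end]."""
    count = 0
    pos = start
    while pos + 12 <= end:
        # EMF+ record header: Type (u16), Flags (u16), Size (u32), DataSize (u32)
        rec_type, _flags, rec_size, _data_size = struct.unpack_from("<HHII", data, pos)
        if rec_size < 12 or pos + rec_size > end:
            break  # malformed record, stop scanning this comment
        if rec_type == wanted_type:
            count += 1
        pos += rec_size
    return count

def count_emfplus_records(data, wanted_type=EMFPLUS_DRAW_BEZIERS):
    """Count EMF+ records of `wanted_type` in the bytes of an EMF file."""
    count = 0
    pos = 0
    while pos + 8 <= len(data):
        # EMF record header: Type (u32), Size (u32, whole record in bytes)
        rec_type, rec_size = struct.unpack_from("<II", data, pos)
        if rec_size < 8 or pos + rec_size > len(data):
            break  # malformed record, stop scanning
        if (rec_type == EMR_COMMENT and rec_size >= 16
                and data[pos + 12:pos + 16] == EMFPLUS_SIGNATURE):
            data_size = struct.unpack_from("<I", data, pos + 8)[0]
            end = min(pos + 12 + data_size, pos + rec_size)
            count += _count_in_comment(data, pos + 16, end, wanted_type)
        pos += rec_size
    return count

def too_many_beziers(data, threshold=100_000):
    """Hypothetical policy: report True when the curve count exceeds `threshold`."""
    return count_emfplus_records(data) > threshold
```

The scan still has to walk every record header, but it skips the payloads, so it should cost far less than a full parse.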
Comment 13 Bartosz 2023-04-23 01:05:59 UTC
A performance improvement for Bezier curves is already available here:
https://gerrit.libreoffice.org/c/core/+/150821

Please take a look and check how it works for you.
Comment 14 Commit Notification 2023-04-23 15:43:24 UTC
Bartosz Kosiorek committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ce008fa9d8f2752bdfeaeff763aafc774a4b4fb2

tdf#154789 EMF+ Performance boost of the EmfPlusRecordTypeDrawBeziers

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Commit Notification 2023-04-24 07:59:58 UTC
Bartosz Kosiorek committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/1328e2b7eb5251162834d7c0f953c6334686e95e

tdf#154789 EMF+ Performance boost of the EmfPlusRecordTypeDrawBeziers

It will be available in 7.5.4.
Comment 16 Naresh 2023-04-24 13:29:34 UTC
Thank you very much for your support and effort in fixing this problem quickly.

Is it possible to add this fix to the 7.4 version as well?
Comment 17 Buovjaga 2023-04-24 13:38:59 UTC
(In reply to Naresh from comment #16)
> Thank you very much for your support and effort in fixing this problem
> quickly.
> 
> Is it possible to add this fix to the 7.4 version as well?

The timeline is a bit tight: https://wiki.documentfoundation.org/ReleasePlan/7.4#7.4.7_release
Comment 18 Commit Notification 2023-04-26 11:25:10 UTC
Bartosz Kosiorek committed a patch related to this issue.
It has been pushed to "libreoffice-7-5-3":

https://git.libreoffice.org/core/commit/b1ed265975407aea9eda568049be4d68301276af

tdf#154789 EMF+ Performance boost of the EmfPlusRecordTypeDrawBeziers

It will be available in 7.5.3.
Comment 19 Commit Notification 2023-04-27 07:23:27 UTC
Bartosz Kosiorek committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/168dc9075d7be4d7da5f5e1ee602751f84dbd254

tdf#154789 EMF+ Performance boost of the EmfPlusRecordTypeDrawBeziers

It will be available in 7.4.8.
Comment 20 Commit Notification 2023-04-27 19:41:24 UTC
Bartosz Kosiorek committed a patch related to this issue.
It has been pushed to "libreoffice-7-4-7":

https://git.libreoffice.org/core/commit/cd94594b24c48602a1eef6af8d98cbf5a6467e3a

tdf#154789 EMF+ Performance boost of the EmfPlusRecordTypeDrawBeziers

It will be available in 7.4.7.
Comment 21 Naresh 2023-04-28 14:07:34 UTC
Hello,

I have verified the fix in both the 7.4.7 and 7.5.3 dev builds. Both work well.

Conversion takes a while (around 8 minutes) for a document of 700 pages, but that is still much better for us than a crash.

Thanks a lot for your support and time.
Comment 22 Bartosz 2023-04-28 14:51:22 UTC
Thanks, Naresh. Could you share some more documents so that I can try to improve performance?
Comment 23 Naresh 2023-05-08 07:48:43 UTC
Created attachment 187133 [details]
to improve the performance

Hi,

Attaching a new test document to improve the performance. 

It has 663 pages and takes around 8-10 minutes to convert to PDF. It would be great if you could improve the performance.
Comment 24 V Stuart Foote 2023-05-08 13:51:57 UTC
(In reply to Naresh from comment #23)
> Created attachment 187133 [details]
> to improve the performance
> 
> Hi,
> 
> Attaching a new test document to improve the performance. 
> 
> It has 663 pages and takes around 8-10 minutes to convert to PDF. It would
> be great if you could improve the performance.

That MS Word binary .doc document does not contain an EMF+ image, so it is not the issue resolved here.

Please submit a new BZ ticket and reattach the document (a ZIP archive is fine), but its source needs to be native ODF .odt or, at a minimum, OOXML .docx.