Bug 81484 - Draw pdf import: bold font shown as "outline", if Chinese (CJK?) font is applied
Summary: Draw pdf import: bold font shown as "outline", if Chinese (CJK?) font is applied
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium major
Assignee: Kevin Suo
URL:
Whiteboard: target:7.3.0
Keywords:
Duplicates: 104725 104748
Depends on:
Blocks: CJK PDF-Import-Draw
Reported: 2014-07-18 07:47 UTC by Kevin Suo
Modified: 2021-07-16 05:25 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
test odt file, to be exported as PDF (15.43 KB, application/vnd.oasis.opendocument.text)
2014-07-18 07:48 UTC, Kevin Suo
pdf file, exported from the above odt file (33.31 KB, application/pdf)
2014-07-18 07:48 UTC, Kevin Suo
screenshot showing the difference (202.60 KB, application/pdf)
2014-07-18 07:56 UTC, Kevin Suo
Screenshot from libreofficechina forum, which can confirm this issue (199.56 KB, image/jpeg)
2014-07-18 08:04 UTC, Kevin Suo
Fill and stroke properties of the text (29.90 KB, application/pdf)
2014-09-14 12:32 UTC, vvort
Sample pdf file with the same problem (423.29 KB, application/pdf)
2018-02-09 00:15 UTC, Franklin Weng
text render modes from pdf specs (308.87 KB, image/jpeg)
2021-07-07 15:36 UTC, Kevin Suo

Description Kevin Suo 2014-07-18 07:47:40 UTC
Steps to observe the problem:

1. Export the attached odt file as PDF;
2. Open with LibreOffice Draw

Current behaviour:
Characters that are bold and have a Chinese font applied are shown with an "outline" effect.
English bold characters are shown correctly.
Non-bold characters are shown correctly, no matter which font is applied.

See the attached file for better understanding.

Version:
4.2.6.1
Build ID: 5fdddf655fba363e34f755715238d0943a44857e
Windows XP SP3

Also reproducible in
Version: 4.4.0.0.alpha0+
Build ID: 461e46904ffa29820be1ccb64cdb9cb6d4469b6c
TinderBox: Win-x86@39, Branch:master, Time: 2014-07-17_08:23:11
Comment 1 Kevin Suo 2014-07-18 07:48:15 UTC
Created attachment 103017 [details]
test odt file, to be exported as PDF
Comment 2 Kevin Suo 2014-07-18 07:48:43 UTC
Created attachment 103018 [details]
pdf file, exported from the above odt file
Comment 3 Kevin Suo 2014-07-18 07:56:35 UTC
Created attachment 103020 [details]
screenshot showing the difference
Comment 4 Kevin Suo 2014-07-18 08:04:36 UTC
Created attachment 103021 [details]
Screenshot from libreofficechina forum, which can confirm this issue

This bug was originally reported here:
http://libreofficechina.org/thread-163-1-1.html

I am attaching the original bug reporter's screenshot.
Comment 5 Kevin Suo 2014-07-18 08:12:32 UTC
Set to NEW, as it has already been confirmed by a few people in the libreofficechina forum.
Comment 6 Kevin Suo 2014-07-20 09:53:10 UTC
Also reproduced with 3.6.7.2, ubuntu 14.04 x86.
Comment 7 yanjingtao 2014-07-21 01:19:29 UTC
Reproduced with Fedora 20 x86_64 LO 4.2.5.2
Comment 8 vvort 2014-09-14 12:32:33 UTC
Created attachment 106256 [details]
Fill and stroke properties of the text

To correctly fix this problem, separate control over the fill and stroke properties of the text is needed.
For now, it is impossible to replicate the first three samples of 'example_026.pdf' in LO Draw without hacks (such as placing one text on top of another, or converting text into polygons).
Comment 9 QA Administrators 2015-10-14 19:58:06 UTC Comment hidden (obsolete)
Comment 10 Kevin Suo 2015-10-15 01:19:49 UTC
(In reply to QA Administrators from comment #9)

Bug still exists
Version: 5.0.3.1
Build ID: fd8cfc22f7f58033351fcb8a83b92acbadb0749e
Locale: zh-CN (zh_CN)
Win10 X86
Comment 11 Heiko Tietze 2016-05-10 08:48:11 UTC
Confirmed (outline is checked after reading the PDF, but only for the SimSun font and not for Times New Roman, where the style is bold, as expected)

Version: 5.2.0.0.alpha0+
Build ID: 6b232aeecc55f1715bc111e636e36a8e24827efb
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@39, Branch:master, Time: 2016-01-26_07:40:04
Locale: de-DE (de_DE)
Comment 12 Buovjaga 2016-12-18 19:06:04 UTC
*** Bug 104748 has been marked as a duplicate of this bug. ***
Comment 13 Buovjaga 2016-12-18 19:07:04 UTC
*** Bug 104725 has been marked as a duplicate of this bug. ***
Comment 14 Buovjaga 2016-12-18 19:15:15 UTC
Not sure, if we should mark this as CJK, as I am able to repro with random fonts such as Lilita One (part of Google fonts pack). http://www.1001fonts.com/lilita-one-font.html
Comment 15 Telesto 2016-12-19 15:36:17 UTC
Also found in:
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 16 QA Administrators 2018-02-08 03:36:01 UTC Comment hidden (obsolete)
Comment 17 Franklin Weng 2018-02-09 00:15:45 UTC
Created attachment 139712 [details]
Sample pdf file with the same problem

1. The problem still exists in 6.0.0.2.

Version: 6.0.0.2
Build ID: 06b618bb6f431d27fd2def25aa19c833e29b61cd
CPU threads: 4; OS: Linux 4.11; UI render: default; VCL: kde4;
Locale: zh-TW (zh_TW.UTF-8); Calc: group

2. The problem exists in 4.0.0.1, which is the earliest version I can test.
Version 4.0.0.1 (Build ID: 527dba6f6e0cfbbc71bd6e7b88a52699bb48799)

3. If I open it with Draw, save as ODG, and use zip to extract this ODG file into a folder, then use a text editor to modify *all* the 

style:text-outline="true"

to

style:text-outline="false"

and zip the folder back into an odg file, the problem is resolved.

It seems that some fonts are misjudged as text-outline=true.
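
The manual text-editor step in point 3 can be sketched as a small string replacement. This is only an illustration of the workaround, not part of LibreOffice: the function name disableTextOutline is hypothetical, and it assumes the XML text from the extracted ODG has already been read into a string (unzipping and re-zipping are left out).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace every style:text-outline="true" with
// style:text-outline="false" in XML text extracted from the ODG file.
std::string disableTextOutline(std::string xml)
{
    const std::string from = "style:text-outline=\"true\"";
    const std::string to = "style:text-outline=\"false\"";
    std::size_t pos = 0;
    while ((pos = xml.find(from, pos)) != std::string::npos)
    {
        xml.replace(pos, from.size(), to);
        pos += to.size(); // continue after the replaced attribute
    }
    return xml;
}
```

Note that this flips the attribute everywhere, so text that was intentionally outlined would lose the effect as well.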
Comment 18 QA Administrators 2019-02-10 03:44:16 UTC Comment hidden (obsolete)
Comment 19 Kevin Suo 2019-02-10 11:10:56 UTC
The problem still exists in the most recent version.
Comment 20 Franklin Weng 2019-02-10 13:47:14 UTC
(In reply to QA Administrators from comment #18)

Still reproducible in

Version: 6.2.0.3
Build ID: 98c6a8a1c6c7b144ce3cc729e34964b47ce25d62
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: zh-TW (zh_TW.UTF-8); UI language: zh-TW
Calc: threaded
Comment 21 himajin100000 2020-07-25 23:56:00 UTC
cross reference

confirmed in Japanese forum.
https://ask.libreoffice.org/ja/question/256437/
Comment 22 Kevin Suo 2021-06-23 18:02:20 UTC Comment hidden (obsolete)
Comment 23 Kevin Suo 2021-06-26 15:12:39 UTC
After some debugging in gdb, I find that the related code seems to be in:
sdext/source/pdfimport/tree/pdfiprocessor.cxx

where the line:
> aChangedFont.isOutline = ( (rGC.TextRenderMode == 1) || (rGC.TextRenderMode == 2) );
seems to be wrong.

I am not sure what TextRenderMode == 1 and TextRenderMode == 2 mean, but:

when I set this line to:
> aChangedFont.isOutline = ( rGC.TextRenderMode == 2 );
then the imported font is shown as "outline" effect

and when I set this line to:
> aChangedFont.isOutline = ( rGC.TextRenderMode == 1 );
then the outline effect is gone, but the font is not shown as bold.

As a result, I guess:
rGC.TextRenderMode == 1 means the text should be outlined, and
rGC.TextRenderMode == 2 means the text should be rendered as "bold".

So, with
aChangedFont.isOutline = ( rGC.TextRenderMode == 2 )
the expression evaluates to true because the font is bold, and thus the outline effect is wrongly set.
The correct code should treat mode 1 as outline, and separately test whether the text is bold and set the font weight accordingly.
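
A minimal sketch of that idea, under the guess stated in this comment (mode 1 is outline, mode 2 is bold). The two structs are simplified stand-ins, not the real importer types; only the names TextRenderMode and isOutline appear in the code quoted above, while isBold and applyRenderMode are hypothetical names used for illustration.

```cpp
// Simplified, hypothetical stand-ins for the importer's types.
struct GraphicsContext
{
    int TextRenderMode = 0;
};

struct FontAttributes
{
    bool isOutline = false;
    bool isBold = false; // assumed field, for illustration only
};

// Treat mode 1 as outline and, separately, mode 2 as bold.
inline void applyRenderMode(const GraphicsContext& rGC, FontAttributes& rFont)
{
    rFont.isOutline = (rGC.TextRenderMode == 1);
    if (rGC.TextRenderMode == 2)
        rFont.isBold = true; // a font that is already bold stays bold otherwise
}
```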

Could someone have a look? It is out of my ability to fix this.
Comment 24 Kevin Suo 2021-06-27 09:07:57 UTC
Further investigation shows that the text rendering mode is already 2 when I process the pdf using the instdir/program/xpdfimport binary. My understanding is that this binary uses poppler to parse the pdf, and then LibreOffice uses the parsed result to do further rendering line by line (in https://opengrok.libreoffice.org/s?refs=parseLine&project=core).

The test pdf file uses the SimSun font. This font does not have a "bold" font name. I guess the pdf may have used the text render mode for "fake bold".
Comment 25 Kevin Suo 2021-06-27 13:19:30 UTC
Text Render Mode codes are explained in section 5.2.5 here:
https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf
Comment 26 Kevin Suo 2021-06-30 10:25:11 UTC
I have submitted a patch in gerrit:
https://gerrit.libreoffice.org/c/core/+/118156

Please review and test and provide any feedback with this patch.
Comment 27 vvort 2021-06-30 13:56:37 UTC
So in case of really outlined text in PDF it will be converted to bold?
Comment 28 Kevin Suo 2021-07-01 01:03:33 UTC
(In reply to vvort from comment #27)
Yes, I think that is the only way to make fake bold show as bold. Do you have any other suggestion?

SimSun is a Chinese font (on Windows), and based on my 20 years of experience with computers, no one uses the "outline" effect for Chinese characters. In contrast, "bold" Chinese characters in the SimSun font are very common; almost every Chinese user uses this style every day in every pdf document.
Comment 29 Kevin Suo 2021-07-07 15:31:42 UTC
It seems that the "fill+stroke" is set by the following code:
https://opengrok.libreoffice.org/s?defs=mbFauxBold&project=core

The member variable is called mbFauxBold. This variable is set to true in the following situation:
https://opengrok.libreoffice.org/xref/core/vcl/quartz/ctfonts.cxx?r=7a83d0a2#85

In pdf, fill+stroke is text render mode "2". At the time of PDF import, Draw detects this text render mode, but it may be difficult to know whether it is real "fill+stroke" or Faux Bold. Anyway, it is wrong to treat Text Render Mode = 2 as "outline" text because they look quite different.

Text Render Mode 2 strokes the text with a thin line, and then fills the text with the font color. It makes more sense to treat it as bold in pdf import.

Text Render Mode 1 strokes the text but does not fill it, which looks more like "outline" text.

As a result, I think a better solution is to map all Text Render Mode 2 text to bold in Draw PDF import.
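
For reference, the text rendering modes from the PDF Reference section cited in comment 25 can be written down as an enum. This is only a sketch: the enumerator and function names are made up for illustration and are not LibreOffice or poppler identifiers; the numeric values and their meanings follow the PDF specification.

```cpp
// PDF text rendering modes 0-7 (names are illustrative).
enum class TextRenderMode : int
{
    Fill = 0,               // fill text
    Stroke = 1,             // stroke text (no fill)
    FillThenStroke = 2,     // fill, then stroke text
    Invisible = 3,          // neither fill nor stroke
    FillClip = 4,           // fill and add to clipping path
    StrokeClip = 5,         // stroke and add to clipping path
    FillThenStrokeClip = 6, // fill, then stroke, and add to clipping path
    Clip = 7                // add to clipping path only
};

// True if the mode paints the glyph interiors.
constexpr bool fillsText(TextRenderMode m)
{
    return m == TextRenderMode::Fill || m == TextRenderMode::FillThenStroke
        || m == TextRenderMode::FillClip || m == TextRenderMode::FillThenStrokeClip;
}

// True if the mode paints the glyph outlines.
constexpr bool strokesText(TextRenderMode m)
{
    return m == TextRenderMode::Stroke || m == TextRenderMode::FillThenStroke
        || m == TextRenderMode::StrokeClip || m == TextRenderMode::FillThenStrokeClip;
}
```

With these helpers, mode 1 strokes without filling (outline-like), while mode 2 both fills and strokes (the faux bold candidate).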
Comment 30 Kevin Suo 2021-07-07 15:36:52 UTC
Created attachment 173418 [details]
text render modes from pdf specs

In this screenshot the illustration uses different colors for the stroke line and the fill color. For Faux Bold, these two colors are the same (e.g. black).
Comment 31 Kevin Suo 2021-07-09 04:28:39 UTC
(In reply to vvort from comment #27)
> So in case of really outlined text in PDF it will be converted to bold?

I have updated the patch, which now preserves the real "outline" character formats. Would you please review and test? Thanks.

The logic is, for faux bold (fake bold), the Fill Color and the Stroke Color for the text are the same (e.g., black), while for real "outline" characters the Fill Color and the Stroke Color are different.
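
The logic described here could look roughly like the sketch below. It is not the code from the gerrit patch: Color, TextEffect, sameColor and classifyText are hypothetical names, the color tolerance is an arbitrary assumption, and treating mode 1 as outline follows comment 29 rather than any confirmed behaviour of the fix.

```cpp
#include <cmath>

// Hypothetical RGB color with components in [0, 1].
struct Color
{
    double r, g, b;
};

enum class TextEffect
{
    None,
    Outline,
    Bold
};

// Compare colors with a small tolerance (the tolerance is an assumption).
inline bool sameColor(const Color& a, const Color& b, double eps = 1e-6)
{
    return std::fabs(a.r - b.r) < eps && std::fabs(a.g - b.g) < eps
        && std::fabs(a.b - b.b) < eps;
}

// Mode 2 with identical fill and stroke colors is taken as faux bold;
// mode 2 with different colors, and mode 1, are taken as outline.
inline TextEffect classifyText(int nRenderMode, const Color& fill, const Color& stroke)
{
    if (nRenderMode == 1)
        return TextEffect::Outline;
    if (nRenderMode == 2)
        return sameColor(fill, stroke) ? TextEffect::Bold : TextEffect::Outline;
    return TextEffect::None;
}
```

For example, black fill with black stroke in mode 2 is classified as bold, while black fill with red stroke in mode 2 is classified as outline.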
Comment 32 Commit Notification 2021-07-14 07:08:38 UTC
Kevin Suo committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/fe28633ee6edc5986220c934dfb04aa7b0d065ad

tdf81484 Draw and Writer pdf import: SimSun bold font is shown as "outline"

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 33 Kevin Suo 2021-07-14 17:10:00 UTC
Kevin Suo committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/12b57e43563a643dd653d78f3e2877ef75998d82

tdf#78427 tdf#81484 sdext.pdfimport: added unittest

It will be available in 7.3.0.

Comment 34 Kevin Suo 2021-07-16 05:25:13 UTC
Fixed on master. Please test.