Bug 143095 - Font handling for PDF import filters does not do enough to match the font PS names in the PDF against what often are locally installed fonts
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 104264 113124 151247
Depends on:
Blocks: PDF-Import-Draw
 
Reported: 2021-06-27 13:16 UTC by Kevin Suo
Modified: 2022-09-30 10:31 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
test odt file (8.75 KB, application/vnd.oasis.opendocument.text)
2021-06-27 13:16 UTC, Kevin Suo
pdf file exported from the test odt file (21.46 KB, application/pdf)
2021-06-27 13:17 UTC, Kevin Suo

Description Kevin Suo 2021-06-27 13:16:30 UTC
Created attachment 173232 [details]
test odt file

Steps to Reproduce:

1. Type "English Normal Liberation Serif" in Writer. 
The default font used should be "Liberation Serif" (note there is a space). Manually set to this font if needed.
2. Export to PDF.
3. Open the saved PDF file with Draw.

Expected Result:
The font for the characters "English Normal Liberation Serif" should be "Liberation Serif", with a space in the font name, so that the correct system font is used to render the content.

Current Result:
The font is "LiberationSerif", without a space. As there is no such font, a fallback font is used in Draw for rendering.

Version: 7.1.5.0.0+ / LibreOffice Community
Build ID: 8619e743564a241eb951866616aec82e1ab3965f
CPU threads: 4; OS: Linux 5.12; UI render: default; VCL: gtk3
Locale: zh-CN (zh_CN.UTF-8); UI: zh-CN
Calc: threaded

Fedora 33 x64.
Comment 1 Kevin Suo 2021-06-27 13:17:03 UTC
Created attachment 173233 [details]
pdf file exported from the test odt file
Comment 2 V Stuart Foote 2021-06-27 18:54:30 UTC
Observation is valid

Version: 7.3.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 1db375e06516d0532f01f9585986617aa3079866
CPU threads: 8; OS: Windows 10.0 Build 19042; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

But is this a bug? I don't think so...

LibreOffice is not a PDF "editor"; we filter-import a PDF (to Draw by default, but also to Writer or Impress).

Font names held within a PDF (including those written by our PDF export filter) are never the installed system font names. Rather, they are the PDF's parsed names for fonts that may or may not be embedded, either as a subset or in full.

We should handle the subsetting better (bug 82163 or bug 101220, needed for touchup of a PDF), as other 'viewers' do. But since we do not 'edit' a PDF, the font names are irrelevant, and they correctly receive a fallback font assignment.

Within the LO filter import, the fallback font assignment could be improved, to better match what we exported into the PDF against the internal PDF subset fonts. But that is an enhancement.

Our other import filter is pdfium based. Used for the Insert -> Image dialog, it recomposes the PDF page(s) using the internal font glyph paths to render the text runs described. If the fonts are not embedded it just uses the paths. Fidelity is high (though the bitmap rendering remains low resolution, bug 115811).
Comment 3 V Stuart Foote 2021-06-27 19:30:39 UTC
*** Bug 104264 has been marked as a duplicate of this bug. ***
Comment 4 Kevin Suo 2021-06-27 23:44:55 UTC
(In reply to V Stuart Foote from comment #2)
We changed the font to LiberationSerif on PDF export, so why not change it back to Liberation Serif on PDF import? I think a pdfToDrawFontMap can do this. We can convert the commonly known fonts back to the correct names, and leave unknown fonts "as is".

I understand Draw is not a pdf editor, but since we have the pdf import filter, we surely should make this filter work better.
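
A minimal sketch of what such a pdfToDrawFontMap might look like. This is not code from the LibreOffice tree: the function names are hypothetical, and the table holds only the examples mentioned in this report. Known names are mapped back; anything else is returned unchanged.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical lookup table: font names as they appear after PDF import
// mapped to the family names installed on the system. The entries are only
// the examples mentioned in this report, not a complete list.
inline const std::unordered_map<std::string_view, std::string_view>& pdfToDrawFontMap()
{
    static const std::unordered_map<std::string_view, std::string_view> aMap = {
        { "LiberationSerif", "Liberation Serif" },
        { "TimesNewRoman", "Times New Roman" },
    };
    return aMap;
}

// Returns the mapped family name for a known font, otherwise the input
// unchanged ("as is").
inline std::string mapPdfFontName(std::string_view aPdfName)
{
    const auto& rMap = pdfToDrawFontMap();
    const auto it = rMap.find(aPdfName);
    return std::string(it != rMap.end() ? it->second : aPdfName);
}
```

With this sketch, mapPdfFontName("LiberationSerif") yields "Liberation Serif", while an unlisted name comes back untouched.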
Comment 5 Callegar 2021-06-28 07:56:50 UTC
At least on Linux, it should be possible to query the system font machinery for the font that actually uses that postscript name.
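
A rough sketch of the matching step, under simplifying assumptions. The list of installed family names is passed in as a plain vector standing in for whatever the system font machinery reports, and the comparison strips spaces from family names instead of reading each font's real PostScript name, which a proper implementation would want to use. Both helper names are hypothetical.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Remove all spaces, so that "Liberation Serif" becomes "LiberationSerif".
inline std::string stripSpaces(std::string aName)
{
    aName.erase(std::remove(aName.begin(), aName.end(), ' '), aName.end());
    return aName;
}

// Hypothetical helper: rInstalled stands in for the family names reported by
// the system. Returns the first installed family whose space-stripped form
// equals the PDF name, or nothing if there is no match.
inline std::optional<std::string> findInstalledFamily(
    const std::string& rPdfName, const std::vector<std::string>& rInstalled)
{
    for (const std::string& rFamily : rInstalled)
        if (stripSpaces(rFamily) == rPdfName)
            return rFamily;
    return std::nullopt;
}
```

The advantage over a fixed table is that any installed font can match without being listed in advance; the cost is that a font whose PostScript name is not simply its family name without spaces will be missed by this approximation.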
Comment 6 Kevin Suo 2021-06-29 12:57:52 UTC
See the function familyNameOverride in the code prior to the following commit:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=abe4d7bd0a1ec4b5a31cc5622080952e4cd53ebf

This map was removed in that commit, but it would be well suited to resolving this bug.

The related code that sets the font family name is at the following line:
https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/tree/drawtreevisiting.cxx?r=e6dfaf9f#830

At this point, the family name, as represented by:
aFontProps[ "fo:font-family" ] = rFont.familyName;
is the wrong name, without the space.

This family name is passed here after being processed by LineParser::parseFontFamilyName in:
https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/wrapper/wrapper.cxx?r=12362fc4#511

After this line, the PDF font names are interpreted as follows and then shown in Draw:
LiberationSerif (should be "Liberation Serif" in ODF XML stream)
TimesNewRomanBold (should be "Times New Roman", with isBold set to true)
SimSun (should be Chinese name "宋体")
...
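
For the "TimesNewRomanBold" case, a sketch of separating a trailing style word from the family name. The FontInfo struct and both functions are hypothetical stand-ins, not the importer's actual types, and only the suffixes "Bold" and "Italic" are handled; separators such as "-" or "," are ignored here. The remaining family name would still need to be mapped to the installed name afterwards.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for the importer's font attributes; only the fields
// needed for this sketch.
struct FontInfo
{
    std::string familyName;
    bool isBold = false;
    bool isItalic = false;
};

// If rName ends with aSuffix (and is longer than it), drop the suffix and
// return true; otherwise leave rName alone and return false.
inline bool stripSuffix(std::string& rName, std::string_view aSuffix)
{
    if (rName.size() <= aSuffix.size())
        return false;
    if (std::string_view(rName).substr(rName.size() - aSuffix.size()) != aSuffix)
        return false;
    rName.erase(rName.size() - aSuffix.size());
    return true;
}

// Split trailing "Bold" / "Italic" words off the name and record them as flags.
inline FontInfo splitStyleFromName(std::string aName)
{
    FontInfo aInfo;
    // Repeat so that a combination such as "...BoldItalic" yields both flags.
    bool bChanged = true;
    while (bChanged)
    {
        bChanged = false;
        if (stripSuffix(aName, "Bold")) { aInfo.isBold = true; bChanged = true; }
        if (stripSuffix(aName, "Italic")) { aInfo.isItalic = true; bChanged = true; }
    }
    aInfo.familyName = aName;
    return aInfo;
}
```

Called with "TimesNewRomanBold", this leaves "TimesNewRoman" as the family name and sets isBold; a name without a style suffix, such as "LiberationSerif", passes through unchanged.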
Comment 7 Kevin Suo 2021-06-29 15:34:35 UTC
*** Bug 113124 has been marked as a duplicate of this bug. ***
Comment 8 Kevin Suo 2021-06-29 15:37:14 UTC
Marking as NEW since there is at least one duplicate.

I have been debugging this issue over the past few days. I have worked out how the related code works, but I don't think I have the ability to resolve this issue.
Comment 10 Kevin Suo 2021-07-12 09:55:03 UTC
Just for those who are interested:

rResult.familyName, as returned by:
LineParser::parseFontFamilyName( FontAttributes& rResult )
in: sdext/source/pdfimport/wrapper/wrapper.cxx

is eventually passed to:

void DrawXmlFinalizer::visit( TextElement& elem, const std::list< std::unique_ptr<Element> >::const_iterator& )
in: sdext/source/pdfimport/tree/drawtreevisiting.cxx
and:
void WriterXmlFinalizer::visit( TextElement& elem, const std::list< std::unique_ptr<Element> >::const_iterator& )
in: sdext/source/pdfimport/tree/writertreevisiting.cxx

and is finally written to an XML stream in ODF XML format, which is then imported into Draw or Writer by:
bool xpdf_ImportFromStream(...)
in: sdext/source/pdfimport/wrapper/wrapper.cxx

So, before it is written to the xml stream:
* If rResult.familyName is "TimesNewRoman", it should be converted to "Times New Roman";
* If rResult.familyName is "SimSun", it should be converted to the localized name "宋体" if the user's current locale is Chinese, and should stay "SimSun" if the current locale is not Chinese.
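
The two conversions above could be sketched as follows. This is only an illustration: the function name is hypothetical, the locale is assumed to arrive as a tag such as "zh-CN" or "en-US" (how the importer would obtain it is not shown), and only the two fonts named above are covered.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: convert the parsed family name before it is written
// to the XML stream. aLocale is assumed to be a tag like "zh", "zh-CN" or
// "zh_CN"; anything else is treated as non-Chinese.
inline std::string convertFamilyName(std::string_view aFamily, std::string_view aLocale)
{
    const bool bChinese = aLocale == "zh"
                          || aLocale.substr(0, 3) == "zh-"
                          || aLocale.substr(0, 3) == "zh_";

    if (aFamily == "TimesNewRoman")
        return "Times New Roman";
    if (aFamily == "SimSun" && bChinese)
        return "宋体"; // localized family name, used only for Chinese locales
    return std::string(aFamily); // unknown names, and SimSun elsewhere, stay as is
}
```

In this sketch the locale check is applied only to fonts that have a localized name, so Latin fonts are converted the same way regardless of locale.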
Comment 11 Kevin Suo 2022-09-30 10:31:50 UTC
*** Bug 151247 has been marked as a duplicate of this bug. ***