Bug 113124 - Writer's PDF importer doesn't import correct font name
Summary: Writer's PDF importer doesn't import correct font name
Status: RESOLVED DUPLICATE of bug 143095
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.7.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks: PDF-Import-Writer
Reported: 2017-10-14 19:47 UTC by Yousuf Philips (jay) (retired)
Modified: 2021-06-29 15:34 UTC

See Also:
Crash report or crash signature:


Description Yousuf Philips (jay) (retired) 2017-10-14 19:47:34 UTC
Steps:
1. Open attachment 136975 with the 'pdf (writer)' filter in the file open dialog.
2. Notice that all text is formatted in the default CTL font (mine is Lucida Sans).
3. Close the PDF.
4. Open the PDF with its default filter, in Draw.
5. Notice that the first line is in Traditional Arabic, the second is in Frank Ruehl CLM and the third is in Simplified Arabic.

The original ODT is attachment 136974.

Version: 6.0.0.0.alpha0+
Build ID: 3672cdd35985201ea87463cf032fedd02c052f4d
CPU threads: 2; OS: Linux 4.4; UI render: default; VCL: gtk2; 
Locale: en-US (en_US.UTF-8); Calc: group
Comment 1 Dieter 2017-10-15 05:29:37 UTC
Reproducible for me

Version: 6.0.0.0.alpha0+ (x64)
Build ID: 465092047d5fa6ec6dd369372e712d76554570ff
CPU threads: 4; OS: Windows 6.19; UI render: GL; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2017-09-26_23:16:01
Locale: de-DE (de_DE); Calc: group
Comment 2 QA Administrators 2018-10-16 02:50:55 UTC Comment hidden (obsolete)
Comment 3 Dieter 2018-10-20 06:52:14 UTC
Still reproducible with

Version: 6.2.0.0.alpha0+ (x64)
Build ID: 48cfa0b00b22f11ade53aec79b2fdddad253e1bd
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2018-10-03_02:01:42
Locale: en-US (de_DE); Calc: CL
Comment 4 QA Administrators 2019-10-21 02:29:30 UTC Comment hidden (obsolete)
Comment 5 Buovjaga 2020-11-11 18:58:33 UTC
Still confirmed

Arch Linux 64-bit
Version: 7.1.0.0.alpha1+
Build ID: 3d3180115be3b87e76189aea2031f0caa735dbb3
CPU threads: 8; OS: Linux 5.9; UI render: default; VCL: kf5
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 11 November 2020
Comment 6 Michael Warner 2021-06-23 03:30:35 UTC
Repro:
Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 736e100c516ed5326f4cccd6d22205264df51914
CPU threads: 12; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: CL
Comment 7 Kevin Suo 2021-06-29 14:18:48 UTC
Where can I find the 'pdf (writer)' filter in file open dialog?
Comment 8 Buovjaga 2021-06-29 14:35:40 UTC
(In reply to Kevin Suo from comment #7)
> Where can I find the 'pdf (writer)' filter in file open dialog?

I checked and you have to enable experimental features to see it
Comment 9 V Stuart Foote 2021-06-29 14:37:29 UTC
(In reply to Kevin Suo from comment #7)
> Where can I find the 'pdf (writer)' filter in file open dialog?

The list is organized by module. Writer is the second block and the import filter is listed as 'PDF - Portable Document Format (Writer) (*.pdf)'. That will force the import into Writer rather than the default Draw.


Not experimental on Windows builds...
Comment 10 Kevin Suo 2021-06-29 15:33:35 UTC
OK, I see, this is exactly the same issue as reported in bug 143095. 

The test document in this bug uses the "Traditional Arabic", "Frank Ruehl CLM" and "Simplified Arabic" fonts. In the PDF file, all of these font names are stored without spaces (i.e., TraditionalArabic, FrankRuehlCLM-Medium and SimplifiedArabic), which can be observed in a PDF viewer.

In both Draw PDF import and Writer PDF import, these font names are read from the PDF file. The suffixes are then removed from the names, but the spaces are not restored, and the names are used directly to build an ODF XML stream, which is then used for rendering.
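To illustrate the mismatch described above, here is a minimal sketch, not the actual sdext code. The helpers stripStyleSuffix, squash and looselyMatches are hypothetical names invented for this example, and treating everything after the first '-' as the suffix is an assumption. It shows that a name such as FrankRuehlCLM-Medium still differs from the installed family name "Frank Ruehl CLM" after suffix removal, and only matches under a comparison that ignores spaces.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (assumption): treat everything after the first '-'
// as a style suffix, e.g. "FrankRuehlCLM-Medium" -> "FrankRuehlCLM".
std::string stripStyleSuffix(const std::string& pdfName)
{
    const std::string::size_type dash = pdfName.find('-');
    return dash == std::string::npos ? pdfName : pdfName.substr(0, dash);
}

// Hypothetical helper: drop spaces and lowercase ASCII letters so that
// "Frank Ruehl CLM" and "FrankRuehlCLM" compare equal.
std::string squash(const std::string& name)
{
    std::string out;
    for (unsigned char c : name)
    {
        if (c != ' ')
            out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Compare a font name read from the PDF with an installed family name,
// ignoring the style suffix, spaces and ASCII case.
bool looselyMatches(const std::string& pdfName, const std::string& installedFamily)
{
    return squash(stripStyleSuffix(pdfName)) == squash(installedFamily);
}
```

With these helpers, stripStyleSuffix("FrankRuehlCLM-Medium") yields "FrankRuehlCLM", which is not equal to "Frank Ruehl CLM", while looselyMatches("FrankRuehlCLM-Medium", "Frank Ruehl CLM") is true. Whether a space-insensitive lookup is the right fix is a separate question; see bug 143095.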

Draw and Writer PDF import seem to share the same pre-processing code located in sdext/source/pdfimport, but they have separate "treevisiting" code: sdext/source/tree/drawtreevisiting.cxx and sdext/source/tree/writertreevisiting.cxx. However, familyName is set the same way in both.

See the following code at line 830 of the drawtreevisiting.cxx file:
    // family name
    aFontProps[ "fo:font-family" ] = rFont.familyName;
    aFontProps[ "style:font-family-complex" ] = rFont.familyName;

and the following in line 902 of the writertreevisiting.cxx file:
    // family name
    aFontProps[ "fo:font-family" ] = rFont.familyName;
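The two excerpts above can be put side by side in a simplified, self-contained sketch. The PropertyMap alias, the FontAttributes struct and the function names drawFontProps and writerFontProps are stand-ins invented for this illustration and do not reproduce the real sdext types. The sketch only restates what the excerpts show: both paths pass the unspaced familyName through unchanged, and the Draw excerpt additionally sets style:font-family-complex. Whether that extra property plays a role in the behaviour reported here is not established in this report.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-ins (assumptions), not the real sdext types.
using PropertyMap = std::map<std::string, std::string>;

struct FontAttributes
{
    std::string familyName; // as read from the PDF, e.g. "TraditionalArabic"
};

// Mirrors the drawtreevisiting.cxx excerpt: two properties are set.
PropertyMap drawFontProps(const FontAttributes& rFont)
{
    PropertyMap aFontProps;
    aFontProps["fo:font-family"] = rFont.familyName;
    aFontProps["style:font-family-complex"] = rFont.familyName;
    return aFontProps;
}

// Mirrors the writertreevisiting.cxx excerpt: only fo:font-family is set.
PropertyMap writerFontProps(const FontAttributes& rFont)
{
    PropertyMap aFontProps;
    aFontProps["fo:font-family"] = rFont.familyName;
    return aFontProps;
}
```

In both functions the value written to fo:font-family is the name without spaces, so any later lookup against an installed family such as "Traditional Arabic" starts from the same unspaced string.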
Comment 11 Kevin Suo 2021-06-29 15:34:35 UTC

*** This bug has been marked as a duplicate of bug 143095 ***