Bug 140005 - Draw import of PDF squishes text together
Summary: Draw import of PDF squishes text together
Status: RESOLVED DUPLICATE of bug 101220
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version: 7.1.0.1 rc (earliest affected)
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-29 16:42 UTC by Randy
Modified: 2022-06-17 14:36 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Original PDF (58.35 KB, application/pdf)
2021-01-29 16:43 UTC, Randy
Draw document of imported pdf (57.35 KB, application/vnd.oasis.opendocument.graphics)
2021-01-29 16:46 UTC, Randy
empty "Font" box for Replacement table (68.22 KB, image/png)
2021-02-01 15:28 UTC, Randy

Description Randy 2021-01-29 16:42:17 UTC
Description:
When I import a certain style of PDF document (music chord sheet) to edit the text, the text is squished together.  Most of the time it manifests as just the loss of space between words, but sometimes words overlap.  

Steps to Reproduce:
1. Open a PDF in Draw

Actual Results:
Text squished and font not properly recognized.

Expected Results:
Text should appear as in the original PDF.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
[Information automatically included from LibreOffice]
Locale: en-US
Module: DrawingDocument
[Information guessed from browser]
OS: Windows 10 Pro
OS is 64bit: yes
Comment 1 Randy 2021-01-29 16:43:51 UTC
Created attachment 169278
Original PDF

Notice how the text is spaced normally.
Comment 2 Randy 2021-01-29 16:46:51 UTC
Created attachment 169279
Draw document of imported pdf

Notice how the text is squished and in some places overlaps, and that the font is incorrect.
Comment 3 V Stuart Foote 2021-01-29 21:26:02 UTC
LibreOffice is not a PDF "editor"; we filter-import a PDF document either as a single image (pdfium libs) or as multiple draw objects (pdfio).

The single image insert provides high fidelity to the original PDF, but renders only the first page of the PDF, as a fixed-resolution raster (bug 115811).

The other import filter will render the entire PDF as individual pages of a Draw ODF document. However, the rendering must extract the text runs of the PDF and assign a font to them. The embedded subset fonts are not very useful because they are subsets: how would you edit the result with a partial font? Bug 101220 is open for that.

If you need to clean up your PDF after import, it needs a functional font. The Ghostscript-provided 'Nimbus Sans L Bold' and 'Nimbus Sans L Regular' fonts that the wkhtmltopdf generator used are difficult to install. Ghostscript does not install them as TTF system fonts. When the PDF is imported the fonts appear as uninstalled (the font name for a selected glyph is shown in italics); for this document they are NimbusSansL and NimbusSansLu.

This means that to clean up your imported PDF pages, you'll need to assign a replacement font. It is best if you can obtain Nimbus Sans, but if not, Helvetica or Arial/Arial Black are not too far off.

You can use the 'Apply replacement table' option under Tools -> Options -> Fonts. The font replacement table can be reversed, while setting the 'Always' checkbox will use the replacements from the table on print or export.

*** This bug has been marked as a duplicate of bug 101220 ***
Comment 4 Randy 2021-01-29 22:13:46 UTC
(In reply to V Stuart Foote from comment #3)

Thanks for the help.  One clarification.  I was able to get the Nimbus SansL font.  I followed your directions, but I wasn't sure what font to select as the font to be replaced (there is a "Font" box and a "Replacement Font" box).  I selected Nimbus SansL as the replacement font.
Comment 5 V Stuart Foote 2021-01-30 00:03:06 UTC
(In reply to Randy from comment #4)
> 
> Thanks for the help.  One clarification.  I was able to get the Nimbus SansL
> font.  I followed your directions, but I wasn't sure what font to select as
> the font to be replaced (there is a "Font" box and a "Replacement Font"
> box).  I selected Nimbus SansL as the replacement font.

You are replacing the bogus font name that the import filter extracted from the PDF:

Font name in PDF runs   LODraw imports as   Replace with font name (on your system)
NimbusSanL-Bold         "NimbusSanL"        "Nimbus Sans L Bold"
NimbusSanL-Regu         "NimbusSanLu"       "Nimbus Sans L Regular"
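
The mapping above behaves like a simple lookup keyed on the name the import filter reports. A minimal Python sketch of that idea, assuming the imported names and replacement names given in this comment. `REPLACEMENTS` and `resolve_font` are hypothetical names used only for illustration; this is not LibreOffice code or API.

```python
# Hypothetical model of a font replacement table; not LibreOffice code.
# Keys: names the import filter reports. Values: fonts installed on the system.
REPLACEMENTS = {
    "NimbusSanL": "Nimbus Sans L Bold",
    "NimbusSanLu": "Nimbus Sans L Regular",
}


def resolve_font(name, table=REPLACEMENTS):
    """Return the replacement for an imported font name, or the name unchanged."""
    return table.get(name, name)
```

A name that has no entry in the table is returned as-is, which mirrors the idea that only the listed bogus names get substituted.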
Comment 6 Randy 2021-02-01 15:25:38 UTC
The issue I am having is that LODraw is not showing a bogus font.  The box for the current font, the one that I should be replacing, is empty.  I tried reimporting, no font.  I tried selecting all text, no font.  I tried selecting individual text, still no font displayed.  I will attach a screenshot.
Comment 7 Randy 2021-02-01 15:28:35 UTC
Created attachment 169355
empty "Font" box for Replacement table

Screenshot of view when trying to use the Replacement text tool
Comment 8 V Stuart Foote 2021-02-01 15:53:19 UTC
(In reply to Randy from comment #6)
> The issue I am having is the LODraw is not showing a bogus font.  The box
> for the current font, the one that I should be replacing is empty.  I tried
> reimporting, no font.  I tried selecting all text, no font.  I tried
> selecting individual text, still no font displayed.  I will attach a screen
> shot.

The font replacement is global to the LO GUI, not to the specific module. Applying the replacement is "preventative"; for a PDF import you cheat a little, and first need to identify the bogus font name.  That is, import the PDF and allow the filter to determine that the font is not available, which renders the bogus font name in italics. Position the text cursor in a text run; the different fonts will show in the Properties deck of the Sidebar. That way you can identify the fonts you will need to replace.

Then for each, just type the bogus name into the list box for the target Font, and use the list box drop-down to select an installed font.  Use the Check button to apply the replacement.
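
If the bogus names are hard to read off the Sidebar, another place to look is the saved Draw file itself. An ODF document such as an .odg file is a zip package, and fonts are normally declared in `style:font-face` elements inside content.xml and styles.xml. A minimal Python sketch, assuming the imported document was saved as .odg and that Draw wrote a declaration for each imported font name (this is an assumption, not verified in this thread). `list_declared_fonts` is a hypothetical helper name.

```python
import re
import zipfile

# Matches the style:name attribute of each <style:font-face ...> element.
FONT_FACE = re.compile(r'<style:font-face\s[^>]*?\bstyle:name="([^"]*)"')


def list_declared_fonts(odf_path):
    """Return the sorted font names declared in an ODF package (.odg, .odt, ...)."""
    names = set()
    with zipfile.ZipFile(odf_path) as package:
        members = package.namelist()
        for member in ("content.xml", "styles.xml"):
            if member in members:
                xml = package.read(member).decode("utf-8")
                names.update(FONT_FACE.findall(xml))
    return sorted(names)
```

For example, `list_declared_fonts("imported.odg")` (a hypothetical file name) would return the declared names, which can then be typed into the Font box of the replacement table.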
Comment 9 Randy 2021-02-01 21:22:37 UTC
Thanks.  I successfully replaced fonts.  Thank you very much!
Comment 10 V Stuart Foote 2021-02-01 22:22:40 UTC
(In reply to Randy from comment #9)
> Thanks.  I successfully replaced fonts.  Thank you very much!

OK, remember if you decide to export back out to PDF you will need to use the 'Always' checkbox for the font replacement. That will force the style to be rewritten and exported using the substituted font rather than the original bogus font name(s).
Comment 11 Randy 2021-02-02 19:46:23 UTC
Ok.  Thanks for the tip.
Comment 13 Randy 2022-06-17 14:36:54 UTC
Tested with latest version and the pdf opened correctly.