Description:
Horizontal text layout is incorrect in some imported PDFs: text overlaps when it shouldn't. Not all PDFs are affected, and not all text in the example PDFs is affected. The text layout of the example documents is correct in macOS Preview and in Firefox.

Steps to Reproduce:
1. Open one of the sample PDFs.
2. Examine the layout of the text.
3. Observe that the horizontal layout of the text is incorrect: some text overlaps.

Actual Results:
Text is rendered with overlapping letters.

Expected Results:
Text is rendered without overlapping letters.

Reproducible: Always

User Profile Reset: No

Additional Info:
This bug affects LibreOffice_25.2.7, LibreOffice_25.8.3.2, and LibreOfficeDev_26.2.0.0.beta1. I haven't tried earlier versions.
Created attachment 204590 [details] Sample pdf document
Created attachment 204591 [details] Sample pdf document 2
These layout problems happen when the font used in the PDF is not available on the system and font fallback occurs. Fonts embedded as subsets in a PDF cannot reliably be used by LibreOffice (bug 101220). If you need the text spans from a PDF for some reason, the Draw (or Impress, or Writer) import filters (based on the poppler and cairo projects) will convert them from the PDF into drawing text box shapes. So, for both attached test documents, use the LibreOffice Tools -> Options -> Fonts dialog and set both the GaramondThree and AGaramondPro fonts to be replaced with plain Garamond. That reduces the overlaps to a reasonable amount. Unfortunately, identifying the embedded fonts that need replacement is an extra step (open the PDF through the filter, then review the font reported in the properties panel for a selection of text). But the replacement persists in the user profile, so it also applies to subsequently imported PDFs. If you need pixel-perfect fidelity, break the PDF apart and insert each page as an image (that uses a different, pdfium-based filter path).

*** This bug has been marked as a duplicate of bug 165396 ***
Hi Stuart! Thanks for the info. I can confirm that manually adding the font replacement entries you suggested significantly ameliorates the problem. For anyone reading, on a Mac, the steps are: LibreOffice -> Preferences -> Fonts -> (check “Apply replacement table”, enter values for “Font” and “Replace with”, then check “Always”). I've attached screenshots from before (Screenshot.jpg) and after (Screenshot1-post-font-sub.jpg) the font substitution. However, I wonder if font substitution is the whole story. I'll also attach screenshots (Selection-1.jpg and Selection-2.jpg) showing that the affected line is imported as two different blocks of text. The horizontal placement of the two text blocks is causing the overlap in Screenshot1. If all the text in that line of the paragraph had been placed in the same block, presumably it would have been more legible even without manually substituting the font.
Created attachment 204600 [details] Screenshot before font substitution
Created attachment 204601 [details] Screenshot after font substitution
Created attachment 204602 [details] Selection of one block of text in the affected line
Created attachment 204603 [details] Selection of the other block of text in the affected line
(In reply to wlmcderm from comment #4)
> However, I wonder if font substitution is the whole story. I'll also attach
> screenshots (Selection-1.jpg and Selection-2.jpg) showing that the affected
> line is imported as two different blocks of text. The horizontal placement
> of the two text blocks is causing the overlap in Screenshot1. If all the
> text in that line of the paragraph had been placed in the same block,
> presumably it would have been more legible even without manually
> substituting the font.

This is a manifestation of the internal structure of a published PDF. The text elements are laid down with no syntactic detail and no "sense" of their relation to other text elements, just their finished, published presentation on the document page.

The text elements are laid down between /BT and /ET flags. The text strings are positioned accurately between those tags with horizontal positioning measures. The glyphs of the font(s) used, although subset, are recorded in the PDF, but because the poppler-based filter cannot read those glyphs, they must be substituted. We can explicitly substitute the font with the 'Replacement Table' as noted, or simply trust the poppler <--> cairo fallback and object creation, but either way the embedded glyphs are not used.

So the remaining overlap arises because the metrics of the /BT /ET text elements differ from those of the glyphs in the replacement font. The end of the first element's text extends over the beginning of the next. It can go the other direction too, and you end up with gaps rather than overlaps between adjacent text elements.

The alternative to "Opening" the PDF is the pdfium-based Insert filter, which always reads the internal layout of the PDF and the embedded subset font directly. So if you need fidelity, break the PDF into its pages externally, and then insert each page as an image. Image resolution can be controlled by setting the environment variable PDFIMPORT_RESOLUTION_DPI; the default is 96. 300 or 450 works well for full-page rendering when placed onto an ODF document page.
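For anyone who wants to script the resolution setting, here is a minimal Python sketch. It assumes that PDFIMPORT_RESOLUTION_DPI is read from the environment of the LibreOffice process and that the launcher is reachable on PATH as `soffice`. The helper names `pdf_import_env` and `launch_libreoffice` are made up for this illustration.

```python
import os
import shutil
import subprocess


def pdf_import_env(dpi: int) -> dict[str, str]:
    """Return a copy of the current environment with PDFIMPORT_RESOLUTION_DPI set."""
    if dpi <= 0:
        raise ValueError("dpi must be a positive integer")
    env = dict(os.environ)  # copy, so the calling process is left untouched
    env["PDFIMPORT_RESOLUTION_DPI"] = str(dpi)
    return env


def launch_libreoffice(dpi: int = 300, binary: str = "soffice") -> subprocess.Popen:
    """Start LibreOffice with the import resolution set.

    The binary name is an assumption; pass a full path if it is not on PATH.
    """
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    return subprocess.Popen([path], env=pdf_import_env(dpi))
```

Calling `launch_libreoffice(300)` would start an instance in which PDF pages inserted as images should be rendered at 300 DPI. On macOS the launcher usually sits inside the application bundle, so `binary` would need to be given as a full path there.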
YMMV depending on need. And there are enhancement requests to improve handling of the insert process (e.g. page range selection, resolution, rotation, etc.).
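To make the overlap mechanism described in this comment concrete, here is a small worked example in Python. All numbers are hypothetical (they are not measured from the attached PDFs), and the names `TextRun`, `run_width` and `overlap` exist only in this sketch. It assumes glyph advance widths are given in thousandths of the font size, as is usual for simple PDF fonts, and it ignores kerning and character spacing.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    x: float          # start position recorded in the PDF, in points
    font_size: float  # in points


def run_width(run: TextRun, advances: dict[str, int], default: int = 500) -> float:
    """Rendered width in points; advances are per-glyph widths in 1/1000 of the font size."""
    return sum(advances.get(ch, default) for ch in run.text) * run.font_size / 1000.0


def overlap(first: TextRun, second: TextRun, advances: dict[str, int]) -> float:
    """Positive: the first run extends over the start of the second. Negative: a gap."""
    return first.x + run_width(first, advances) - second.x


# Two adjacent runs, positioned by the PDF producer using the embedded font.
first = TextRun("abcd", x=100.0, font_size=10.0)
second = TextRun("efgh", x=118.0, font_size=10.0)

embedded = {ch: 450 for ch in "abcd"}  # hypothetical metrics of the embedded font
wider = {ch: 500 for ch in "abcd"}     # hypothetical wider replacement font
narrower = {ch: 400 for ch in "abcd"}  # hypothetical narrower replacement font

print(overlap(first, second, embedded))  # 0.0: the runs meet exactly
print(overlap(first, second, wider))     # 2.0: two points of overlap
print(overlap(first, second, narrower))  # -2.0: a two-point gap
```

The start positions stay fixed at the values stored in the PDF, so any difference in glyph widths between the embedded font and its replacement shows up at the boundary between adjacent runs.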
Oh, I should also mention that any single PDF being imported through the filter may have multiple fonts defined for its text elements. Even a single glyph can be assigned a different font in its own /BT /ET element. Where you have additional overlaps, use the Sidebar Properties deck to select the residual overlapping text and identify any additional fonts that may need to be substituted. Also, remember that using the font replacement table does not remove the original PDF's fonts assigned to the text elements; it just substitutes them when rendering to the LibreOffice document canvas. It is kind of convoluted, and you would need to copy and paste to a new, clean document page to recreate the PDF in an "editable" form. LibreOffice is not a PDF editor, and PDFs are non-editable, final published documents.
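As a rough complement to checking fonts through the Sidebar, the font names can sometimes be read straight out of the PDF file. The Python sketch below is a heuristic, not a PDF parser: it scans the raw bytes for /BaseFont and /FontName entries and notes whether a name carries a subset tag (six uppercase letters followed by "+"). It will miss fonts whose dictionaries are stored in compressed object streams, and the names it reports may not match exactly what LibreOffice shows for the imported text. The function name `list_fonts` is made up for this illustration.

```python
import re

# A subset font name is prefixed with a six-letter tag, e.g. /ABCDEF+GaramondThree.
FONT_NAME = re.compile(
    rb"/(?:BaseFont|FontName)\s*/(?:([A-Z]{6})\+)?([^\s/<>\[\]()%{}]+)"
)


def list_fonts(pdf_bytes: bytes) -> dict[str, bool]:
    """Map each font name found to True if it appeared with a subset tag."""
    fonts: dict[str, bool] = {}
    for tag, name in FONT_NAME.findall(pdf_bytes):
        key = name.decode("latin-1")
        fonts[key] = fonts.get(key, False) or bool(tag)
    return fonts


# Hand-written fragment standing in for the uncompressed part of a PDF file.
sample = (
    b"<< /Type /Font /Subtype /Type1 /BaseFont /ABCDEF+GaramondThree >>\n"
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\n"
)
for name, is_subset in list_fonts(sample).items():
    print(name, "(subset)" if is_subset else "(not marked as subset)")
```

For a real file, something like `list_fonts(open("sample.pdf", "rb").read())` would give a starting list of candidate names for the replacement table; `sample.pdf` is a placeholder path.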