Created attachment 203897 [details]
PDF that does not appear properly in LibreOffice

Some characters in the PDF overlap when it is opened in LibreOffice. I don't see a clear pattern; it might be related to the Latvian special characters "ā,ē,ū,š,ģ,ķ,ļ,ž,č,ņ".

The problematic PDF is attached. It was exported by MS Word on Windows and appears fine in every other PDF viewer I have tried.

LibreOffice was installed via Discover on Kubuntu 24.04 from Flathub. No settings were modified.
Created attachment 203898 [details] Comparison between LibreOffice (top) and Brave browser (bottom)
The five "fonts" recorded in the PDF are unusable:

CIDFont+F1
CIDFont+F2
CIDFont+F3
CIDFont+F4
CIDFont+F5

The majority of the text runs are in CIDFont+F4.

IIANM, the Character ID (CID) subset fonts have no meaning to the Poppler --> Cairo import filter that LibreOffice uses to convert objects to sd text strings. So an *unmanaged font fallback* occurs for each unknown font.

Currently you can use the Tools -> Options -> Font dialog to substitute a system font of your choosing. I used the serif XITS font (for its excellent font metrics), but you probably want a sans-serif. If you have the source OOXML on your system, you could check which font the document uses and set the LibreOffice font substitution to that exact font.

Also, it seems one can force the MS Word to PDF export to use its "Best for printing" mode, which should embed the actual font rather than CID subsets.

@Dave, any insight on the poppler --> Cairo handling of CID subsets in our pdfio import filters? Is this another aspect of our poppler builds not including font support?
@fpy, the font subsets in the See Also bug 169174 are not CID.

Rather, they are dvips output rendered as PS Type 1. I would think those would be near hopeless to try to sort out, and they are not really the issue here.
(In reply to V Stuart Foote from comment #3)
> @fpy, font subsets in the see also bug 169174 are not CID
>
> Rather they are dvips rendered as PS type 1, would think those would be near
> hopeless to try to sort out,

so, WONTFIX ?

> and not really the issue here.

ok. not DUPLICATE
(In reply to V Stuart Foote from comment #3)
> @fpy, font subsets in the see also bug 169174 are not CID
>
> Rather they are dvips rendered as PS type 1, would think those would be near
> hopeless to try to sort out, and not really the issue here.

If I understand correctly, the PDF does have character widths encoded in it:

6 0 obj
<<
/Type /Font
/BaseFont /BEAPJB+CMSS17
....
/Widths [ 625 313 313 313 313 313 313 313 313 313 ...

so I guess we could do something to find the best match based on that. (Maybe even just choosing something no larger would work to avoid overlaps.)

PDF does weird font stuff all the time, partly because it tends to include fonts, but also because the way it encodes text is set up for doing subsets of fonts.
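
A rough sketch of the width-matching idea, in Python for brevity rather than anything resembling the actual pdfio filter code. The function name `pick_substitute`, the candidate font names, and their width tables are all made up for illustration; only the first three /Widths values are taken from the object quoted above. It assumes the candidate widths have already been read from the system fonts in the same units and for the same character codes as the PDF's /Widths array.

```python
from typing import Dict, List, Optional


def pick_substitute(
    pdf_widths: List[int],
    candidates: Dict[str, List[int]],
    no_wider: bool = True,
) -> Optional[str]:
    """Return the candidate font whose advance widths are closest to pdf_widths.

    pdf_widths: values from the PDF /Widths array.
    candidates: hypothetical map of font name -> widths for the same codes.
    no_wider:   if True, skip any font with a glyph wider than the PDF says,
                since a wider glyph is what produces overlapping text.
    """
    best_name: Optional[str] = None
    best_cost: Optional[int] = None
    for name, widths in candidates.items():
        if len(widths) != len(pdf_widths):
            continue  # tables do not cover the same codes, cannot compare
        if no_wider and any(c > p for c, p in zip(widths, pdf_widths)):
            continue
        # Sum of absolute width differences as a simple mismatch score.
        cost = sum(abs(c - p) for c, p in zip(widths, pdf_widths))
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# First three widths from the quoted object; the candidate tables are invented.
pdf_widths = [625, 313, 313]
candidates = {
    "HypotheticalWide": [650, 320, 320],
    "HypotheticalNarrow": [600, 300, 300],
    "HypotheticalClose": [620, 313, 310],
}
print(pick_substitute(pdf_widths, candidates))
```

With `no_wider=True` the function returns None when every candidate has at least one wider glyph, so a caller would still need some fallback (for example, scaling the closest font down).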