34212 – Accented Characters and Umlauts are missing with Type1 fonts

Summary: Accented Characters and Umlauts are missing with Type1 fonts

Status:	RESOLVED WONTFIX

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Printing and PDF export
Version: (earliest affected)	unspecified
Hardware:	x86 (IA32) Windows (All)

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Duplicates (2):	87932 94061
Depends on:
Blocks:	Fonts PDF-Export

Reported:	2011-02-12 09:19 UTC by astumpf
Modified:	2017-11-04 11:57 UTC
CC List:	11 users

See Also:	https://issues.apache.org/ooo/show_bug.cgi?id=63015 89246
Crash report or crash signature:

Attachments
exported pdf (41.67 KB, application/pdf) 2016-12-01 13:05 UTC, przekop
odt sample (11.05 KB, application/vnd.oasis.opendocument.text) 2016-12-01 13:30 UTC, przekop
export with master sources (24.88 KB, application/.pdf) 2016-12-01 19:02 UTC, Julien Nabet
new export with filled fields (25.57 KB, application/.pdf) 2016-12-07 17:47 UTC, Julien Nabet

Description astumpf 2011-02-12 09:19:42 UTC

PDF export shows accented characters and umlauts as blanks when Type1 fonts are used. The problem seems to be that the PDF does not include an encoding vector definition. Adding "/Encoding /WinAnsiEncoding" to the font object could be a quick fix, at least for Western European characters.

This error is known in OpenOffice and documented here: http://www.openoffice.org/issues/show_bug.cgi?id=63015

Comment 1 Björn Michaelsen 2011-12-23 11:50:41 UTC

[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1

more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html

Comment 2 astumpf 2011-12-23 13:50:59 UTC

I tested again with release 3.5.0 beta and can confirm that the bug has not been resolved. A document with umlauts in an Adobe Type1 font loses the umlauts when exported to PDF. Cross-checking by printing with Adobe's PDFwriter device works fine. The significant difference between the generated PDF files is that the sequence "/Encoding /WinAnsiEncoding" is missing from the font object in the exported PDF file.

Comment 3 Harald Kliems 2013-10-31 19:49:51 UTC

It looks like the bug still exists in 4.1.2.3. Manually adding "/Encoding /WinAnsiEncoding" to the PDF file is indeed a workaround, but certainly not a beginner-friendly one.

Comment 4 edv 2014-09-18 15:37:39 UTC

I found the solution to this. 
In vcl\source\gdi\pdfwriter_impl.cxx, line 3494:
if( !pFont->IsSymbolFont() && pEncoding == 0)
must be changed to:
if( !pFont->IsSymbolFont() )

Reason: without the pEncoding check, "/Encoding/WinAnsiEncoding\n" is added to the font object in the PDF file, which is correct. pEncoding specifies that a ToUnicode stream has to be generated (and it is), and nothing speaks against that, because the ToUnicode stream is only a translation table and doesn't affect the encoding itself. For symbolic fonts, WinAnsiEncoding would be wrong because they ship with their own encoding.

I don't want to upload this myself because I don't intend to do more work on LibreOffice, and the change is too tiny to justify going through the git/gerrit upload process and making a patch. So could someone else please do this; I don't want any rights to that code submission.
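
To see what the one-line change proposed above alters, the Type1 branch can be modelled in isolation. This is a minimal sketch under stated assumptions, not LibreOffice's code: `type1FontDictHead`, `hasEncodingMap` (standing in for `pEncoding != 0`) and `useProposedCheck` are hypothetical names, and only the start of the font dictionary is produced.

```cpp
#include <string>

// Hypothetical model of the Type1 branch discussed in comment 4. Only the part
// of the font dictionary this bug is about is reproduced; the real emitter in
// vcl/source/gdi/pdfwriter_impl.cxx also writes the ToUnicode reference,
// widths and font descriptor, and escapes the name via appendName.
std::string type1FontDictHead(const std::string& psName,
                              bool isSymbolFont,
                              bool hasEncodingMap,   // stands in for pEncoding != 0
                              bool useProposedCheck) // true: condition from comment 4
{
    std::string line = "<</Type/Font/Subtype/Type1/BaseFont/" + psName + "\n";

    // Original check: the entry is skipped as soon as an encoding map exists.
    // Proposed check: only symbol fonts skip it, since they carry their own encoding.
    const bool emitEncoding = useProposedCheck
        ? !isSymbolFont
        : (!isSymbolFont && !hasEncodingMap);

    if (emitEncoding)
        line += "/Encoding/WinAnsiEncoding\n";
    return line;
}
```

With `isSymbolFont = false` and `hasEncodingMap = true`, the original check omits `/Encoding/WinAnsiEncoding` while the proposed check emits it; symbol fonts never get the entry in either variant.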

Comment 5 Frank Berke 2015-01-05 16:41:33 UTC

It looks like this has been remedied in LO 4.4.0.1 (which is still an RC), while the bug is still present in 4.3.5.

Comment 6 astumpf 2015-01-29 19:12:52 UTC

The bug is still present in 4.4.0.3.

Comment 7 luiscastro193 2015-05-04 11:41:46 UTC

I can confirm the bug is still present in 4.4.2, and the patch posted by edv is not applied in the 4.4.3 source either.

Comment 8 Julien Nabet 2015-09-10 20:55:06 UTC

(In reply to edv from comment #4)
> I found the solution to this. 
> In vcl\source\gdi\pdfwriter_impl.cxx the Line 3494:
> if( !pFont->IsSymbolFont() && pEncoding == 0)
> must be changed to:
> if( !pFont->IsSymbolFont() )
> 
> Reason: Without the pEncoding check - "/Encoding/WinAnsiEncoding\n" is added
> to the pdf file font object which is correct. pEncoding specifies that a
> ToUnicode stream has to be generated (and it is) and nothing speaks against
> it because it is only a translation table and doesn't affect the encoding
> itself. For symbolic fonts WinAnsiEncoding would be wrong because they have
> there own encoding shipped with.
> 
> I don't want to upload this myself because I don't intend to do more on
> libreoffice and it is to tiny to go through the git/gerrit upload process
> and making a patch for this. So please someone else do this, I don't want
> any rights on that code submission.

Just for information, the patch was pushed with this commit:
http://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=eea16cb3e65a4308caddb7618d31a76ca259dbb1

but reverted with this one:
http://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=297b22bd49ea11a90063ab8503fb83090f351668
(see the commit message for the reasons, if interested)

Comment 9 Julien Nabet 2015-09-10 20:55:36 UTC

*** Bug 87932 has been marked as a duplicate of this bug. ***

Comment 10 Julien Nabet 2015-09-10 20:56:53 UTC

*** Bug 94061 has been marked as a duplicate of this bug. ***

Comment 11 drunken monkey 2015-10-25 17:16:23 UTC

I'm also experiencing this in LO 5.0.2.2 under Arch Linux. The bug suddenly appeared a few months ago; before that, everything worked fine. However, it also depends on the PDF viewer used: the one under Windows and Firefox's built-in viewer display the umlauts (though they look a bit off), but Evince just shows blank spaces.

Comment 12 Tom Yan 2015-10-25 21:22:52 UTC

(In reply to drunken monkey from comment #11)
> Also experiencing this in LO 5.0.2.2 under Arch Linux. The bug suddenly
> appeared a few months ago, before that everything was working fine. However,
> it's also dependent on the PDF Viewer used. The one under Windows and
> Firefox's built-in one display the umlauts (though they look a bit off), but
> Evince just shows blank spaces.

Somehow the new "gsfonts" triggered this (again): https://bugs.documentfoundation.org/show_bug.cgi?id=95221

Comment 13 Tom Yan 2015-10-25 21:37:54 UTC

However, in my case it has nothing to do with "WinAnsiEncoding" at all, so I will not mark my bug report as a duplicate. See the PDFs attached to my bug report (open them with a text editor or similar) for details.

Comment 14 Gilbert Röhrbein 2016-10-13 17:22:38 UTC

This might be a possible fix, based on the comment in the revert commit.

diff --git a/vcl/source/gdi/pdfwriter_impl.cxx b/vcl/source/gdi/pdfwriter_impl.cxx
index 0d886e0..8755448 100644
--- a/vcl/source/gdi/pdfwriter_impl.cxx
+++ b/vcl/source/gdi/pdfwriter_impl.cxx
@@ -3529,7 +3529,7 @@ std::map< sal_Int32, sal_Int32 > PDFWriterImpl::emitEmbeddedFont( const Physical
                 "<</Type/Font/Subtype/Type1/BaseFont/" );
             appendName( aInfo.m_aPSName, aLine );
             aLine.append( "\n" );
-            if( !pFont->IsSymbolFont() &&  pEncoding == nullptr )
+            if( !pFont->IsSymbolFont() && ( pEncoding == nullptr || pFont->GetCharSet() == RTL_TEXTENCODING_MS_1252 ))
                 aLine.append( "/Encoding/WinAnsiEncoding\n" );
             if( nToUnicodeStream )
             {

The mentioned revert commit:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=297b22bd49ea11a90063ab8503fb83090f351668

I am new here; I stumbled upon this bug report because I need to create a PDF and it has been coming out garbled all day :( Does this fix work, and could you get it into the next build of LibreOffice?
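
The condition in the diff above can be read as a small predicate. The sketch below is illustrative only: `FontCharSet` is a hypothetical stand-in for LibreOffice's `rtl_TextEncoding` and `RTL_TEXTENCODING_MS_1252`, and `hasEncodingMap` stands in for `pEncoding != nullptr`.

```cpp
// Hypothetical stand-in for rtl_TextEncoding; only the distinction the
// patch cares about (Windows-1252 or not) is modelled.
enum class FontCharSet { MS1252, Other };

// Condition from the diff in comment 14: never for symbol fonts; otherwise
// when there is no encoding map, or when the font's charset is Windows-1252.
bool emitWinAnsiEncoding(bool isSymbolFont, bool hasEncodingMap, FontCharSet charSet)
{
    return !isSymbolFont &&
           (!hasEncodingMap || charSet == FontCharSet::MS1252);
}
```

Compared with the original check, the only newly covered case is a non-symbol font that has an encoding map and whose character set is Windows-1252; every other case behaves as before. Whether that is enough to address the concerns behind the revert is not something this sketch can show.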

Comment 15 Gilbert Röhrbein 2016-10-13 17:28:49 UTC

It would be nice if one of you could post a how-to or a script to add /Encoding /WinAnsiEncoding to a PDF. It would be a workaround, and definitely less painful than having no solution available at all.
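
As a rough answer to the request above, the manual workaround from comment 3 can be automated for simple cases. This is a minimal C++ sketch under several assumptions: the Type1 font dictionaries are stored uncompressed and begin with the literal `<</Type/Font/Subtype/Type1` (as the emitter code quoted in comment 14 writes them), and `addWinAnsiEncoding` is a hypothetical helper, not part of any library. It does not distinguish symbol fonts, which according to comment 4 should not receive WinAnsiEncoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: inserts "/Encoding/WinAnsiEncoding" right after the
// "/Subtype/Type1" key of every Type1 font dictionary that does not already
// declare an /Encoding before its closing ">>". Key order inside a PDF
// dictionary does not matter, so the entry can go directly after the marker.
std::string addWinAnsiEncoding(std::string pdf)
{
    const std::string marker = "<</Type/Font/Subtype/Type1";
    const std::string entry = "/Encoding/WinAnsiEncoding";

    std::size_t pos = 0;
    while ((pos = pdf.find(marker, pos)) != std::string::npos) {
        const std::size_t insertAt = pos + marker.size();
        const std::size_t dictEnd = pdf.find(">>", insertAt);
        const std::size_t existing = pdf.find("/Encoding", insertAt);

        // Only an /Encoding inside this dictionary counts as already present.
        const bool alreadyHasEncoding =
            existing != std::string::npos &&
            (dictEnd == std::string::npos || existing < dictEnd);

        if (!alreadyHasEncoding) {
            pdf.insert(insertAt, entry);
            pos = insertAt + entry.size();
        } else {
            pos = insertAt;
        }
    }
    return pdf;
}
```

To use it, read the PDF into a `std::string` in binary mode (for example with `std::ifstream` and `std::ios::binary`), pass it through `addWinAnsiEncoding`, and write the result to a new file. Because the inserted bytes shift every later object, the cross-reference table no longer matches the file; many viewers rebuild it silently, and a repair-capable tool such as qpdf may be needed for stricter ones.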

Comment 16 Julien Nabet 2016-10-13 19:29:12 UTC

Gilbert: FYI, I proposed the patch here:
https://gerrit.libreoffice.org/#/c/29792/1

Comment 17 Commit Notification 2016-11-14 11:36:47 UTC

Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=52040395e3046ac42b8c3dd385c7b1cb26b929f3

tdf#34212: Accented Characters and Umlauts are missing with Type1 fonts

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 18 przekop 2016-12-01 13:05:14 UTC

Created attachment 129183 [details]
exported pdf

I tested PDF export with Polish characters in form text fields. I just installed the newest LibreOfficeDev 5.3 on Ubuntu 14.04.
The problem still occurs.

Comment 19 Julien Nabet 2016-12-01 13:09:50 UTC

(In reply to przekop from comment #18)
> Created attachment 129183 [details]
> exported pdf
> 
> I tested PDF export with Polish characters in text forms. I just installed
> newest LibreOfficeDev 5.3 on Ubuntu 14.04.
> Problem sill occurs.

Could you attach the original document so we can try to reproduce this?

Comment 20 przekop 2016-12-01 13:30:14 UTC

Created attachment 129184 [details]
odt sample

I fill in the forms after export. Polish characters are missing in the fillable text fields.
Now I'm not sure whether this is the right bug thread, but it is the closest to the subject I could find.

Comment 21 Julien Nabet 2016-12-01 19:02:44 UTC

Created attachment 129193 [details]
export with master sources

Here is the result on a Debian x86-64 PC with master sources updated today.
It seems OK.

Are you sure you retrieved a version that includes the patch http://cgit.freedesktop.org/libreoffice/core/commit/?id=52040395e3046ac42b8c3dd385c7b1cb26b929f3 from 14/11/2016?
To be sure, could you provide the Build ID (Help menu > About)?

Comment 22 przekop 2016-12-02 07:59:08 UTC

Try copying any text with Polish characters (1ą 2ż 3ź 4ć 5ó 6ł) and pasting it into a form field. Most of them disappear after you leave the field.

5.3.0.0.beta1
Build ID: 690f553ecb3efd19143acbf01f3af4e289e94536

Comment 23 Commit Notification 2016-12-07 09:01:25 UTC

Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b35798df2c1f6a05d8a3a28843c64c6da548f741&h=libreoffice-5-2

tdf#34212: Accented Characters and Umlauts are missing with Type1 fonts

It will be available in 5.2.5.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 24 Julien Nabet 2016-12-07 17:47:42 UTC

Created attachment 129378 [details]
new export with filled fields

On a Debian x86-64 PC with master sources updated today, I could reproduce the problem with form fields.

Comment 25 Julien Nabet 2016-12-07 17:49:32 UTC

Cleaning up the whiteboard since the bug is still there.

Comment 26 Adolfo Jayme Barrientos 2017-04-07 06:00:11 UTC

IMO this is more of a WONTFIX, given that the Type1 format is obsolete and is no longer accepted in 5.3.x.

Comment 27 Khaled Hosny 2017-09-25 23:09:48 UTC

We dropped support for Type 1 fonts already.