Bug 34212 - Accented Characters and Umlauts are missing with Type1 fonts
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): unspecified
Hardware: x86 (IA32) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 87932 94061
Depends on:
Blocks: Fonts PDF-Export
 
Reported: 2011-02-12 09:19 UTC by astumpf
Modified: 2017-11-04 11:57 UTC (History)

See Also:
Crash report or crash signature:


Attachments
exported pdf (41.67 KB, application/pdf)
2016-12-01 13:05 UTC, przekop
odt sample (11.05 KB, application/vnd.oasis.opendocument.text)
2016-12-01 13:30 UTC, przekop
export with master sources (24.88 KB, application/.pdf)
2016-12-01 19:02 UTC, Julien Nabet
new export with filled fields (25.57 KB, application/.pdf)
2016-12-07 17:47 UTC, Julien Nabet

Description astumpf 2011-02-12 09:19:42 UTC
PDF export renders accented characters and umlauts as blanks when Type1 fonts are used. The problem seems to be that the PDF does not include the definition of an encoding vector. Adding "/Encoding /WinAnsiEncoding" to the font object could be a quick fix, at least for Western European characters.

The error is known in Open Office and documented here: http://www.openoffice.org/issues/show_bug.cgi?id=63015
Comment 1 Björn Michaelsen 2011-12-23 11:50:41 UTC Comment hidden (obsolete)
Comment 2 astumpf 2011-12-23 13:50:59 UTC
I tested again with release 3.5.0 beta and can confirm that the bug has not been resolved. A document with umlauts in an Adobe Type1 font loses the umlauts when exported to PDF. Cross-checking by printing with Adobe's PDFWriter device works fine. The significant difference between the generated PDF files is that the sequence "/Encoding /WinAnsiEncoding" is missing from the font object in the exported PDF file.
Comment 3 Harald Kliems 2013-10-31 19:49:51 UTC
It looks like the bug still exists in 4.1.2.3. Manually adding "/Encoding /WinAnsiEncoding" to the PDF file is indeed a workaround, but certainly not a beginner-friendly one.
Comment 4 edv 2014-09-18 15:37:39 UTC
I found the solution to this. 
In vcl\source\gdi\pdfwriter_impl.cxx, line 3494:
if( !pFont->IsSymbolFont() && pEncoding == 0)
must be changed to:
if( !pFont->IsSymbolFont() )

Reason: Without the pEncoding check, "/Encoding/WinAnsiEncoding\n" is added to the font object in the PDF file, which is correct. pEncoding specifies that a ToUnicode stream has to be generated (and it is), and nothing speaks against that, because it is only a translation table and does not affect the encoding itself. For symbolic fonts, WinAnsiEncoding would be wrong because they ship with their own encoding.

I don't want to upload this myself because I don't intend to do more work on LibreOffice, and the change is too tiny to justify making a patch and going through the git/gerrit upload process. So could someone else please do this? I don't claim any rights to that code submission.
Comment 5 Frank Berke 2015-01-05 16:41:33 UTC
Looks like this has been remedied in LO 4.4.0.1 (which is still an RC), while the bug is still present in 4.3.5.
Comment 6 astumpf 2015-01-29 19:12:52 UTC
The bug is still present in 4.4.0.3.
Comment 7 luiscastro193 2015-05-04 11:41:46 UTC
I confirm the bug is still present in 4.4.2, and the patch posted by edv has not been applied in the 4.4.3 source either.
Comment 8 Julien Nabet 2015-09-10 20:55:06 UTC
(In reply to edv from comment #4)
> I found the solution to this.
> In vcl\source\gdi\pdfwriter_impl.cxx, line 3494:
> if( !pFont->IsSymbolFont() && pEncoding == 0)
> must be changed to:
> if( !pFont->IsSymbolFont() )
> 
> Reason: Without the pEncoding check, "/Encoding/WinAnsiEncoding\n" is added
> to the font object in the PDF file, which is correct. pEncoding specifies
> that a ToUnicode stream has to be generated (and it is), and nothing speaks
> against that, because it is only a translation table and does not affect the
> encoding itself. For symbolic fonts, WinAnsiEncoding would be wrong because
> they ship with their own encoding.
> 
> I don't want to upload this myself because I don't intend to do more work on
> LibreOffice, and the change is too tiny to justify making a patch and going
> through the git/gerrit upload process. So could someone else please do this?
> I don't claim any rights to that code submission.

Just for information, the patch was pushed with this:
http://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=eea16cb3e65a4308caddb7618d31a76ca259dbb1

but reverted with this:
http://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=297b22bd49ea11a90063ab8503fb83090f351668
(see reasons in commit if interested)
Comment 9 Julien Nabet 2015-09-10 20:55:36 UTC
*** Bug 87932 has been marked as a duplicate of this bug. ***
Comment 10 Julien Nabet 2015-09-10 20:56:53 UTC
*** Bug 94061 has been marked as a duplicate of this bug. ***
Comment 11 drunken monkey 2015-10-25 17:16:23 UTC
Also experiencing this in LO 5.0.2.2 under Arch Linux. The bug suddenly appeared a few months ago; before that, everything was working fine. However, it also depends on the PDF viewer used. The one under Windows and Firefox's built-in one display the umlauts (though they look a bit off), but Evince just shows blank spaces.
Comment 12 Tom Yan 2015-10-25 21:22:52 UTC
(In reply to drunken monkey from comment #11)
> Also experiencing this in LO 5.0.2.2 under Arch Linux. The bug suddenly
> appeared a few months ago, before that everything was working fine. However,
> it's also dependent on the PDF Viewer used. The one under Windows and
> Firefox's built-in one display the umlauts (though they look a bit off), but
> Evince just shows blank spaces.

Somehow the new "gsfonts" triggered this (again): https://bugs.documentfoundation.org/show_bug.cgi?id=95221
Comment 13 Tom Yan 2015-10-25 21:37:54 UTC
However, in the case I am experiencing, it has nothing to do with "WinAnsiEncoding" at all, so I will not mark my bug report as a duplicate. See the PDFs attached to my bug report (open them with a text editor or similar) for details.
Comment 14 Gilbert Röhrbein 2016-10-13 17:22:38 UTC
This might be a possible fix, based on the comment in the revert commit.

diff --git a/vcl/source/gdi/pdfwriter_impl.cxx b/vcl/source/gdi/pdfwriter_impl.cxx
index 0d886e0..8755448 100644
--- a/vcl/source/gdi/pdfwriter_impl.cxx
+++ b/vcl/source/gdi/pdfwriter_impl.cxx
@@ -3529,7 +3529,7 @@ std::map< sal_Int32, sal_Int32 > PDFWriterImpl::emitEmbeddedFont( const Physical
                 "<</Type/Font/Subtype/Type1/BaseFont/" );
             appendName( aInfo.m_aPSName, aLine );
             aLine.append( "\n" );
-            if( !pFont->IsSymbolFont() &&  pEncoding == nullptr )
+            if( !pFont->IsSymbolFont() && ( pEncoding == nullptr || pFont->GetCharSet() == RTL_TEXTENCODING_MS_1252 ))
                 aLine.append( "/Encoding/WinAnsiEncoding\n" );
             if( nToUnicodeStream )
             {

The mentioned revert commit:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=297b22bd49ea11a90063ab8503fb83090f351668

I am new here and just stumbled upon this bug report because I need to create a PDF, and it has come out garbled all day :( Does this fix work, and could you get it into an upcoming build of LibreOffice?
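
To make the difference between the three conditions discussed in this thread easier to see, here is a small truth-table sketch in Python. It only models the boolean logic: is_symbol stands for pFont->IsSymbolFont(), has_encoding for pEncoding != nullptr, and is_ms1252 for pFont->GetCharSet() == RTL_TEXTENCODING_MS_1252. The function names are made up for illustration and do not exist in the LibreOffice sources.

```python
from itertools import product


def original(is_symbol: bool, has_encoding: bool, is_ms1252: bool) -> bool:
    # Models: !pFont->IsSymbolFont() && pEncoding == nullptr
    return not is_symbol and not has_encoding


def comment4(is_symbol: bool, has_encoding: bool, is_ms1252: bool) -> bool:
    # Models the change from comment 4: !pFont->IsSymbolFont()
    return not is_symbol


def comment14(is_symbol: bool, has_encoding: bool, is_ms1252: bool) -> bool:
    # Models the diff above:
    # !IsSymbolFont() && (pEncoding == nullptr || charset is MS-1252)
    return not is_symbol and (not has_encoding or is_ms1252)


# Each row: the three input flags, then whether each variant would
# write "/Encoding/WinAnsiEncoding" into the font object.
print("is_symbol has_encoding is_ms1252 | original comment4 comment14")
for flags in product([False, True], repeat=3):
    print(*flags, "|", original(*flags), comment4(*flags), comment14(*flags))
```

In this model, the variants differ only for non-symbol fonts that have an encoding table: the original condition never writes the entry for them, the comment 4 variant always does, and the diff above does so only when the font's charset is MS-1252.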
Comment 15 Gilbert Röhrbein 2016-10-13 17:28:49 UTC
It would be nice if one of you could post a how-to or a script for adding /Encoding /WinAnsiEncoding to a PDF. It would be a workaround, and definitely less painful than having no solution available at all.
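
A minimal sketch of such a script, in Python, under several assumptions: the exported PDF stores its font dictionaries as plain, uncompressed objects, and every Type1 font in the file is a non-symbolic Western European font (the sketch cannot detect symbolic fonts, for which WinAnsiEncoding would be wrong). The function name add_winansi_encoding and the regular expressions are illustrative only. Because inserting bytes shifts the object offsets, the cross-reference table of the result is stale; some viewers rebuild it on their own, otherwise a repair pass with a tool such as qpdf may be needed.

```python
import re
import sys

# One indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b.*?\bendobj", re.DOTALL)
# "/Type/Font" but not "/Type/FontDescriptor".
FONT_RE = re.compile(rb"/Type\s*/Font\b")
# "/Subtype/Type1" but not "/Subtype/Type1C".
TYPE1_RE = re.compile(rb"/Subtype\s*/Type1\b")
ENCODING = b"/Encoding/WinAnsiEncoding\n"


def add_winansi_encoding(pdf: bytes) -> bytes:
    """Insert /Encoding/WinAnsiEncoding into Type1 font objects lacking one."""

    def patch(match):
        obj = match.group(0)
        subtype = TYPE1_RE.search(obj)
        # Leave everything alone except Type1 font dictionaries that
        # have no /Encoding entry yet.
        if not FONT_RE.search(obj) or subtype is None or b"/Encoding" in obj:
            return obj
        pos = subtype.end()
        return obj[:pos] + b"\n" + ENCODING + obj[pos:]

    return OBJ_RE.sub(patch, pdf)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_encoding.py input.pdf output.pdf
    with open(sys.argv[1], "rb") as src:
        data = src.read()
    with open(sys.argv[2], "wb") as dst:
        dst.write(add_winansi_encoding(data))
```

Running the function a second time on its own output changes nothing, since patched font objects already contain an /Encoding entry. Keep the original file, as the patched copy should be checked in the viewer that showed the blanks.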
Comment 16 Julien Nabet 2016-10-13 19:29:12 UTC
Gilbert: FYI, I proposed the patch here:
https://gerrit.libreoffice.org/#/c/29792/1
Comment 17 Commit Notification 2016-11-14 11:36:47 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=52040395e3046ac42b8c3dd385c7b1cb26b929f3

tdf#34212: Accented Characters and Umlauts are missing with Type1 fonts

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 przekop 2016-12-01 13:05:14 UTC
Created attachment 129183 [details]
exported pdf

I tested PDF export with Polish characters in text forms. I just installed the newest LibreOfficeDev 5.3 on Ubuntu 14.04.
The problem still occurs.
Comment 19 Julien Nabet 2016-12-01 13:09:50 UTC
(In reply to przekop from comment #18)
> Created attachment 129183 [details]
> exported pdf
> 
> I tested PDF export with Polish characters in text forms. I just installed
> the newest LibreOfficeDev 5.3 on Ubuntu 14.04.
> The problem still occurs.

Could you attach the original document so we can try to reproduce this?
Comment 20 przekop 2016-12-01 13:30:14 UTC
Created attachment 129184 [details]
odt sample

I tried to fill in the forms after export. Polish characters are missing in the fillable text fields.
I'm not sure whether this is the right bug thread, but it is the closest to the subject I could find.
Comment 21 Julien Nabet 2016-12-01 19:02:44 UTC
Created attachment 129193 [details]
export with master sources

Here is the result on a Debian x86-64 PC with master sources updated today.
It seems OK.

Are you sure you retrieved a version including the patch http://cgit.freedesktop.org/libreoffice/core/commit/?id=52040395e3046ac42b8c3dd385c7b1cb26b929f3 from 14/11/2016?
To be sure, could you provide the Build ID (Help menu/About)?
Comment 22 przekop 2016-12-02 07:59:08 UTC
Try copying any text with Polish characters (1ą 2ż 3ź 4ć 5ó 6ł) and pasting it into a form field. Most of them disappear after leaving the field.

5.3.0.0.beta1
Build ID: 690f553ecb3efd19143acbf01f3af4e289e94536
Comment 23 Commit Notification 2016-12-07 09:01:25 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b35798df2c1f6a05d8a3a28843c64c6da548f741&h=libreoffice-5-2

tdf#34212: Accented Characters and Umlauts are missing with Type1 fonts

It will be available in 5.2.5.

Comment 24 Julien Nabet 2016-12-07 17:47:42 UTC
Created attachment 129378 [details]
new export with filled fields

On a Debian x86-64 PC with master sources updated today, I could reproduce the problem with fields.
Comment 25 Julien Nabet 2016-12-07 17:49:32 UTC
Cleaned up the whiteboard, since the bug is still there.
Comment 26 Adolfo Jayme Barrientos 2017-04-07 06:00:11 UTC
IMO this is more of a WONTFIX, given that the Type1 format is obsolete and is no longer accepted in 5.3.x.
Comment 27 ⁨خالد حسني⁩ 2017-09-25 23:09:48 UTC
We dropped support for Type 1 fonts already.