Created attachment 128004
PDF export with Writer 22.214.171.124
On the positive side, entering characters by hexadecimal code with more than four digits has worked in LibreOffice since version 5.1. On the negative side, PDF export does not handle characters whose hexadecimal code has more than four digits.
For example, I type the Unicode character ‘Mathematical italic small e’ (U+1D452) into a LibreOffice Writer document (.odt) because I need the correct ‘e’ for Euler’s number, and the character displays correctly in the document (see “Characters in Unicode.odt”). When I then export the document to PDF, every character whose hexadecimal code has more than four digits, e.g. ‘Mathematical italic small e’, which has five, is shown as a square or some other odd glyph (see “Characters in Unicode - PDF export Writer 126.96.36.199.pdf”).
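For context, a code point with more than four hexadecimal digits (above U+FFFF) lies outside the Basic Multilingual Plane; U+1D452 is in the Supplementary Multilingual Plane (SMP). In UTF-16 such a character is stored as a surrogate pair of two 16-bit units instead of a single unit. The minimal Python sketch below (standard library only; the `describe` helper is hypothetical, written just for illustration) shows that difference for a plain ‘e’ and for U+1D452. It only illustrates the encoding distinction and makes no claim about where in LibreOffice's PDF export the failure actually occurred.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize one character: code point, Unicode name, plane, UTF-16 units."""
    cp = ord(ch)
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return {
        "codepoint": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # More than four hex digits means the code point is above U+FFFF.
        "outside_bmp": cp > 0xFFFF,
        # Split the encoded bytes into 16-bit code units.
        "utf16_units": [
            f"{int.from_bytes(utf16[i:i + 2], 'big'):04X}"
            for i in range(0, len(utf16), 2)
        ],
    }


for ch in ("e", "\U0001D452"):
    print(describe(ch))
```

The plain ‘e’ occupies one UTF-16 unit (0065), while U+1D452 needs the surrogate pair D835 DC52, which is why text handling that assumes one unit per character can break on exactly these characters.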
If I save the .odt as .docx and open it in Word 2016, PDF export from Word works fine (see “Characters in Unicode.docx” and “Characters in Unicode - PDF export Word 2016.pdf”).
I have attached the four files mentioned above. The affected characters are marked in them, so they show the failure clearly.
Created attachment 128006
PDF export with Word 2016
Created attachment 128007
“Characters in Unicode” in Writer
Created attachment 128008
“Characters in Unicode” in Word
I get the same result, except that Mathematical Italic Small E is already shown as a square in LibreOffice itself!
Note for testers: the Segoe UI font must be installed.
Win 7 Pro 64-bit Version: 188.8.131.52.alpha1+
Build ID: 4b4abb73fcd7f2802e73102b3e7c30face8d309c
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; Layout Engine: old;
TinderBox: Win-x86@39, Branch:master, Time: 2016-10-31_02:54:50
Locale: fi-FI (fi_FI); Calc: group
A little correction to Buovjaga:
Most of the text uses the font “Segoe UI”, but the characters with more than four hexadecimal digits in this document need the font “Segoe UI Symbol”, because that font covers the Unicode block “Mathematical Alphanumeric Symbols”.
Printing to a Ghostscript-based PDF generator preserves the code points and glyphs in the PDF.
With the daily build below, the PDF exported with the new layout engine looks fine, but with the old engine it shows wrong glyphs in place of those three characters.
Build ID: a6ce5d391476e4b6a2cb2d92ff45548c1d75684b
CPU Threads: 4; OS Version: Windows 6.1; UI Render: GL; Layout Engine: new;
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2016-11-04_00:03:22
Locale: hu-HU (hu_HU); Calc: CL
Confirming that the HarfBuzz-based common layout of the new layout engine fixes PDF export with both OpenGL and GDI+ rendering, and that the old DirectWrite layout engine, used with the PDF export filter, does not pass the SMP (Supplementary Multilingual Plane) glyphs through to the PDF.
So, fixed with the new layout engine.
On Windows 10 Pro 64-bit (1607) en-US with
Build ID: 32bdc5097013e7efd9c85e1b8df697880e66e925
CPU Threads: 8; OS Version: Windows 6.2; UI Render: GL; Layout Engine: new;
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2016-11-04_23:30:30
Locale: en-US (en_US); Calc: CL
Closing as RESOLVED FIXED by the commits for bug 89870, given that any remaining work would have to be done in the "old" DirectWrite WinLayout code.
Please reopen if that work is considered necessary, or if this must be resolved for 5.2.
*** Bug 103760 has been marked as a duplicate of this bug. ***
I have installed LO 184.108.40.206.alpha1+ and tried exporting my “Characters in Unicode.odt” to PDF.
Result: the bug persists. All characters with more than four digits in their hexadecimal code are shown as a square or some other odd glyph.
(In reply to Dirk W. from comment #10)
> I have installed LO 220.127.116.11.alpha1+ and tried the PDF export of my
> “Characters in Unicode.odt”.
> Result: The bug persists. All characters which have more than four digits in
> their hexadecimal code are shown with an square or something curios else.
Please copy and paste here the contents of the Help > About dialog from your 5.3 build.
On Windows 10 Enterprise 64-bit (1607) en-US (VirtualBox) with
Version: 18.104.22.168.alpha1
Build ID: f4ca1573fcf445164c068c1046ab5d084e1b005f
CPU Threads: 2; OS Version: Windows 6.2; UI Render: default;
Locale: en-US (en_US); Calc: group
(In reply to Dirk W. from comment #12)
> On Windows 10 Enterprise 64-bit (1607) en-US (VirtualBox) with
> Version: 18.104.22.168.alpha1
> Build ID: f4ca1573fcf445164c068c1046ab5d084e1b005f
> CPU Threads: 2; OS Version: Windows 6.2; UI Render: default;
> Locale: en-US (en_US); Calc: group
That build does not have the new HarfBuzz-based layout enabled by default. To activate it, you would need to set the environment variable "SAL_USE_COMMON_LAYOUT".
But rather than the Alpha1 build, I suggest you install the current daily build of master from here: http://dev-builds.libreoffice.org/daily/master/
A number of patches to the new common layout have landed since Alpha1 was built, including enabling the new layout by default.
Please test with the new layout enabled, using either Alpha1 or the current master.
Many thanks for this additional information! I can confirm that this bug is fixed in the current master (2016-11-07).
Hello “V Stuart Foote”,
I did not understand what you meant by “You would need to set the variable "SAL_USE_COMMON_LAYOUT" to activate it.”, but I downloaded both current daily builds of master, “master~2016-11-08_06.11.45_LibreOfficeDev_22.214.171.124.alpha1_Win_x86.msi” and “master~2016-11-07_13.03.37_LibreOfficeDev_126.96.36.199.alpha1_Win_x64_en-US_de_ar_ja_ru_qtz.msi”, and installed and uninstalled each of them in turn.
Result: In both versions, the PDF export works.
I cannot say whether the similar bug 103468 (“Hexadecimal code input with more than four digits sometimes works, sometimes not”) is also fixed. What I can say is that, at the moment, all characters with more than four digits in their hexadecimal code are shown correctly.
OK, then this is RESOLVED FIXED by the new HarfBuzz-based text layout from bug 89870, now active by default.