Created attachment 47967
Five documents illustrating Chinese character conversion issues
$ /opt/libreoffice3.4/program/soffice --version
LibreOffice 3.4 340m1(Build:12)
Converting from the command line in headless mode produces a PDF with missing Chinese characters (see below).
Converting via the GUI with default options produces a correct PDF. Copying and pasting the characters from the original into a new document and converting that document from the CLI also produces a correct PDF.
Here is a list of the attached files, with descriptions and the command line used to generate them:
Original file exhibiting missing characters in headless
New document created by copying and pasting text from the original into a new LO document and saving it as .doc
Good pdf, exported from GUI from original, using default
Good pdf, created from copy/pasted doc with:
$> soffice --headless --convert-to pdf \
Bad pdf, with missing characters, created as above from original
Thanks for any guidance if a workaround is possible.
P.S. Thanks to all the developers working on LibreOffice. It was exciting to see the improvements to the soffice binary :)
I've just come across this bug:
This may be related or a duplicate.
I've added another attachment containing two files that are a simplified version of the original problem:
chars.fails.doc - created by taking the original problem file and replacing its text with a sample of the problem characters
chars.converts.doc - created as a new file in LO, with the same text pasted in
I assume this is some encoding issue in the template of the original problem file, but I don't know for certain.
Created attachment 48300
Two simpler files illustrating the issue
Created attachment 48349
Google Docs export that converts incorrectly
This contains the same characters as the previous attachment set, but in a document created with Google Docs and exported as a Word file (they use Aspose.Words under the hood).
It also converts to PDF with missing characters.
Another clue: when doing a conversion from the chars.fails.doc (attached previously) to a text file using the UNO API (via pyODConverter) the characters convert correctly.
(In reply to comment #5)
> Another clue: when doing a conversion from the chars.fails.doc (attached
> previously) to a text file using the UNO API (via pyODConverter) the characters
> convert correctly.
I wanted to add that the resulting converted file is identified by 'file' as:
/tmp/chars.txt: UTF-8 Unicode (with BOM) text
I'm not sure if it's relevant.
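In case it helps anyone repeat the check without 'file', here is a minimal Python sketch that tests for the UTF-8 byte order mark and decodes the text with the BOM stripped. The function names are hypothetical and chosen just for illustration; only the standard library is used.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    The 'utf-8-sig' codec accepts input both with and without a BOM.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    # Path taken from the comment above; adjust as needed.
    with open("/tmp/chars.txt", "rb") as fh:
        raw = fh.read()
    print("BOM present:", has_utf8_bom(raw))
    print(decode_utf8_text(raw))
```

If the decoded text shows the expected Chinese characters, the text content of the .doc is intact and the problem lies elsewhere in the PDF export path.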
This appears to be a font issue. You can compare the fonts in the different PDF documents, and you can see that the problematic PDF does not have the Chinese font that the good PDFs use.
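One way to do that comparison is sketched below. It assumes the `pdffonts` tool from poppler-utils is installed and that its output is a table with two header lines followed by one row per font, with the font name in the first column. The function names and the two PDF file names are placeholders, not the names of the attachments.

```python
import re
import subprocess


def parse_pdffonts(output: str) -> set[str]:
    """Extract font names from pdffonts output.

    Assumes the first two lines are the column header and a dashed rule,
    and that the font name is the first whitespace-delimited token of
    each following row. A six-letter subset prefix such as 'BAAAAA+' is
    stripped so that subsetted and full fonts compare equal.
    """
    names = set()
    for line in output.splitlines()[2:]:
        if not line.strip():
            continue
        name = line.split()[0]
        names.add(re.sub(r"^[A-Z]{6}\+", "", name))
    return names


def pdf_fonts(path: str) -> set[str]:
    """Run pdffonts on a PDF and return the set of font names it reports."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts(result.stdout)


if __name__ == "__main__":
    good = pdf_fonts("good.pdf")  # placeholder file name
    bad = pdf_fonts("bad.pdf")    # placeholder file name
    print("Only in good PDF:", sorted(good - bad))
    print("Only in bad PDF:", sorted(bad - good))
```

A font that appears only in the good PDF is a candidate for the one that is missing or substituted in the headless conversion.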
(In reply to comment #7)
> This appears to be a font issue. You can compare the fonts in the different PDF
> documents, and you can see that the problematic PDF does not have the Chinese
> font that the good PDFs use.
Thanks for looking into this, Simos. I hadn't noticed that the fonts in my small test documents were different. It looks like the original doc uses "SimSun", which I guess I don't have installed (it's an MS font). Is the font embedded in the .doc or something?
Is there a workaround you could suggest? I'm quite lost.
After installing the SimSun font, the document converted correctly in headless mode. I suppose that if there is still a bug here, it is that headless mode behaves inconsistently with GUI mode.
Thanks for the help.
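For anyone hitting the same symptom, a quick check for whether a given font family is installed can be sketched as follows. This assumes a system with fontconfig, where `fc-list : family` prints one line per font with comma-separated family names. The function names are hypothetical.

```python
import subprocess


def installed_families(fc_list_output: str) -> set[str]:
    """Collect case-folded family names from 'fc-list : family' output.

    Each line may hold several comma-separated aliases for one font,
    for example a Latin name and a localized name.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name.casefold())
    return families


def font_installed(family: str, fc_list_output: str) -> bool:
    """Return True if the family name appears in the fc-list output."""
    return family.casefold() in installed_families(fc_list_output)


def query_fontconfig() -> str:
    """Run 'fc-list : family' and return its standard output."""
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print("SimSun installed:", font_installed("SimSun", query_fontconfig()))
```

Running this before the headless conversion would show whether the font named in the document is available to the converting user.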
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started out as NEW without ever being explicitly confirmed. For this reason the
bug has been changed to state NEEDINFO. To move this bug from NEEDINFO back
to NEW, please check whether the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
More detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
Dear bug submitter!
Because many NEEDINFO bugs have received no answer within the last six months, we are closing all of them.
To keep this message short, more information is available at https://wiki.documentfoundation.org/QA/NeedinfoClosure#Statement
Thanks for understanding, and hopefully for updating your bug so that everything is prepared for developers to fix your problem.