Created attachment 47967 [details]
five documents illustrating Chinese character conversion issues

$ /opt/libreoffice3.4/program/soffice --version
LibreOffice 3.4 340m1(Build:12)

Conversion from the command line has missing Chinese characters in the resulting PDF (see below). Conversion via the GUI with default options makes a good PDF. Copying and pasting the characters from the original into a new document and converting from the CLI also makes a good PDF.

Here is a list of the attached files, descriptions and command line used to generate them:

--------
chinese_problem_public.doc
    Original file exhibiting missing characters in headless converted PDF

chinese_problem_public.copy-pasted.doc
    New document created by copying and pasting text from original to new LO document, saving as .doc

chinese_problem_public.gui.pdf
    Good PDF, exported from GUI from original, using default options.

chinese_problem_public.copy-pasted.headless.pdf
    Good PDF, created from copy/pasted doc with:
    $> soffice --headless --convert-to pdf \
           chinese_problem_public.copy-pasted.doc

chinese_problem_public.headless.pdf
    Bad PDF, with missing characters, created as above from original
------

Thanks for any guidance if a workaround is possible.

Sincerely,
Brandon Simmons
http://coder.bsimmons.name

p.s. thanks to all the developers working on LibreOffice. It was exciting to see the improvements to the soffice binary :)
I've just come across this bug: https://bugs.freedesktop.org/show_bug.cgi?id=36313 This may be related or a duplicate.
I've added another attachment containing two files that are a simplified version of the original problem:

chars.fails.doc - created by taking the original problem file and replacing the text with an example of the problem characters
chars.converts.doc - created as a new file in LO, with the same text pasted in

I assume this is some encoding issue in the template of the original problem file, but I have no idea.
Created attachment 48300 [details] two simpler files illustrating the issue
Created attachment 48349 [details]
Google Docs export that converts incorrectly

The same characters as in the previous attachment set, but from a document created with Google Docs and exported as a Word file (they use Aspose.Words under the hood). This converts to PDF with missing characters as well.
Another clue: when doing a conversion from the chars.fails.doc (attached previously) to a text file using the UNO API (via pyODConverter) the characters convert correctly.
(In reply to comment #5)
> Another clue: when doing a conversion from the chars.fails.doc (attached
> previously) to a text file using the UNO API (via pyODConverter) the characters
> convert correctly.

I wanted to add that the resulting converted file is identified by 'file' as:

/tmp/chars.txt: UTF-8 Unicode (with BOM) text

I'm not sure if it's relevant.
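The `file` output above says the text export starts with a UTF-8 byte order mark. A minimal sketch for checking that from Python, assuming the export is read as raw bytes. The helper names `has_utf8_bom` and `decode_text` are made up for illustration and are not part of pyODConverter or the UNO API:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data begins with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # The "utf-8-sig" codec skips the BOM on decode and accepts input without one.
    return data.decode("utf-8-sig")


# Illustrative bytes only; to inspect a real export, read it in binary mode, e.g.
#   with open("/tmp/chars.txt", "rb") as f: data = f.read()
sample = codecs.BOM_UTF8 + "中文".encode("utf-8")
print(has_utf8_bom(sample))
print(decode_text(sample))
```

A BOM by itself only marks the encoding, so its presence would not explain characters going missing in the PDF.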
This appears to be a font issue. You can compare the fonts in the different PDF documents, and you can see that the problematic PDF does not have the Chinese font that the good PDFs use.
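One way to do the font comparison described above is to list the fonts in each PDF with `pdffonts` (from poppler-utils) and diff the names. The sketch below assumes `pdffonts` is on the PATH and that font names contain no spaces. The function names are hypothetical, and the parser relies on the usual two-line header that `pdffonts` prints:

```python
import subprocess


def parse_pdffonts(output: str) -> set:
    """Extract base font names from the text printed by `pdffonts`."""
    fonts = set()
    # The first two lines are the column header and a dashed separator.
    for line in output.splitlines()[2:]:
        parts = line.split()
        if not parts:
            continue
        name = parts[0]
        # Subset fonts are listed as "ABCDEF+Name"; drop the six-letter tag.
        if len(name) > 7 and name[6] == "+":
            name = name[7:]
        fonts.add(name)
    return fonts


def fonts_in_pdf(path: str) -> set:
    """Run pdffonts (assumed to be installed) on one file and return its font names."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts(result.stdout)


def fonts_only_in_good(good_pdf: str, bad_pdf: str) -> set:
    """Fonts used by the good PDF that do not appear in the bad one."""
    return fonts_in_pdf(good_pdf) - fonts_in_pdf(bad_pdf)


# Example call with the attachment names from this report (not run here):
# fonts_only_in_good("chinese_problem_public.gui.pdf",
#                    "chinese_problem_public.headless.pdf")
```

Any font name returned this way is a candidate for the font that the failing conversion could not find.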
(In reply to comment #7)
> This appears to be a font issue. You can compare the fonts in the different PDF
> documents, and you can see that the problematic PDF does not have the Chinese
> font that the good PDFs use.

Thanks for looking into this, Simos. I didn't notice that the fonts in my small test document were different. It looks like the original doc is using "SimSun", which I guess I don't have installed (it's an MS font).

Is the font embedded in the .doc or something? Is there a workaround you could suggest? I'm quite lost.
After installing the SimSun font, the document converted correctly in headless mode. I suppose if there's still a bug here it is that the behavior in headless mode is inconsistent with the behavior in GUI mode. Thanks for the help.
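Before running a headless conversion, it may help to confirm that the font a document asks for is actually installed. A rough sketch, assuming a system with fontconfig where `fc-list : family` prints one font per line with alternate family names separated by commas. The function names are hypothetical:

```python
import subprocess


def parse_families(fc_list_output: str) -> set:
    """Collect family names from `fc-list : family` output, lowercased."""
    families = set()
    for line in fc_list_output.splitlines():
        # Alternate names for the same family are comma-separated on one line.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name.lower())
    return families


def font_installed(family: str) -> bool:
    """Ask fontconfig (assumed to be installed) whether a family is available."""
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    )
    return family.lower() in parse_families(result.stdout)


# Example call (not run here): font_installed("SimSun")
```

This only reports what fontconfig sees. It does not show why the GUI export and the headless conversion behaved differently when the font was missing.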
[This is an automated message.]

This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it started right out as NEW without ever being explicitly confirmed. The bug is changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back to NEW, please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.

Details on how to test the 3.5.0 beta1 can be found at:
http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1

More detail on this bulk operation:
http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
Dear bug submitter!

Because many NEEDINFO bugs have gone without an answer for the last six months, we are closing all of them.

To keep this message short, more information is available at https://wiki.documentfoundation.org/QA/NeedinfoClosure#Statement

Thanks for understanding, and hopefully for updating your bug so that everything is prepared for developers to fix your problem.

Yours!
Florian