Created attachment 171234
PDF file generated by XeLaTeX

When trying to open a PDF file generated with XeLaTeX (using the CTeX macro package, with xeCJK underneath) that contains Chinese characters, the Chinese characters are missing, while the English characters are kept. Both Writer and Draw were tested.

Opening a PDF file generated by LuaLaTeX (also using CTeX, but with LuaTeX-ja underneath) works fine, however.
Created attachment 171235
The same LaTeX source generated by LuaLaTeX
Created attachment 171236
Original LaTeX source
Repro in latest master:

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: ab4a244d980061d8f68766c1b9662e07c268d62c
CPU threads: 12; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: CL

One difference I noticed between opening the two PDFs is that opening xe.pdf, but not lua.pdf, prints this message on the console:

warn:legacy.osl:10089:10089:unotools/source/config/moduleoptions.cxx:472: unknown factory
The Chinese characters are in the temporary PDF file written to /tmp, but don't make it as far as the call to drawGlyphs in wrapper.cxx:384. They probably get dropped in the PDF Import extension, maybe somewhere in pdfparse.cxx.
Icenowy, are you running this on Linux?

The PDF generated by LuaLaTeX sets the collection for the FandolSong-Regular font to Adobe-Identity. The PDF generated by XeLaTeX sets the collection for the FandolSong-Regular font to Adobe-GB1.

In order to display text using a font, Poppler needs a character code to Unicode mapping. For Adobe-Identity, it generates this on the fly. For Adobe-GB1 (and others), it tries to load it from the cidToUnicode file.

On Linux, Poppler sets the default path for its data files (POPPLER_DATADIR) to /usr/share/poppler in its config.h, and this is where my distribution places those files when I install Poppler from the package manager.

However, in external/poppler/poppler-config.patch.1, POPPLER_DATADIR is set to /usr/local/share/poppler. This is the directory it searches when I run xpdfimport from the working directory of my checkout. That directory is not created or populated when I run "sudo make install". The Linux Debian packages offered for download from libreoffice.org create an /opt/libreoffice7.1/share directory, but there is no poppler subdir there. Neither the debs nor the working directory contain a cidToUnicode file.

This makes me think that the Poppler library is an external dependency that LO expects to be installed by the system package manager. So I don't see why the patch would set POPPLER_DATADIR to /usr/local/share/poppler.

The patch file was added in https://gerrit.libreoffice.org/c/core/+/56228.

It's possible to set the Poppler data directory at runtime by providing an argument to the GlobalParams constructor, but xpdfwrapper does not do this in wrapper_gpl.cxx.

Michael Stahl, is there any downside to just changing POPPLER_DATADIR to /usr/share/poppler in poppler-config.patch.1?
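To make the lookup described above concrete, here is a minimal sketch of the decision as I understand it. It is not Poppler code: classifyMapping and MappingSource are hypothetical names, and the <dataDir>/cidToUnicode/<collection> layout is an assumption based on where the cidToUnicode files are expected to live.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Where a character-code-to-Unicode mapping would come from.
enum class MappingSource { GeneratedOnTheFly, LoadedFromFile, Missing };

// Hypothetical helper, not part of Poppler's API. It only mirrors the
// behaviour described in this comment: Adobe-Identity needs no data file,
// every other collection needs a readable file under the data directory.
MappingSource classifyMapping(const fs::path& dataDir, const std::string& collection)
{
    assert(!collection.empty());

    if (collection == "Adobe-Identity")
        return MappingSource::GeneratedOnTheFly;

    // Assumed layout: <dataDir>/cidToUnicode/<collection>
    const fs::path candidate = dataDir / "cidToUnicode" / collection;
    if (!fs::is_regular_file(candidate))
        return MappingSource::Missing;

    // Treat the mapping as available only if the file can actually be opened.
    std::ifstream in(candidate);
    return in.good() ? MappingSource::LoadedFromFile : MappingSource::Missing;
}
```

With dataDir set to /usr/local/share/poppler and no files there, "Adobe-GB1" would come back as Missing, which would match the dropped Chinese glyphs in xe.pdf, while "Adobe-Identity" (lua.pdf) never touches the file system.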
> Is there any downside to just changing the POPPLER_DATADIR to
> /usr/share/poppler in poppler-config.patch.1?

It won't be correct on MS Windows. I'm actually not sure how this is packaged and distributed for Windows or Mac.

The easiest solution would probably be to just bundle these files with LO: put them in the LO/share/xpdfimport directory, and pass a data dir into the GlobalParams constructor that is relative to the program installation directory.

But IDK if redistributing those files would cause some other concern. Poppler itself is GPL and LO is not (that is the reason xpdfimport exists as a separate executable, after all), but these files are just data; we aren't building or linking with them.

I would appreciate some input here.
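A rough sketch of the "relative to the program installation directory" idea follows. The function name bundledPopplerDataDir is hypothetical, and the layout (executable in <install>/program, data in <install>/share/xpdfimport/poppler_data) is an assumption for illustration, not something the build produces today.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: derive the bundled data directory from the path of
// the xpdfimport executable instead of relying on a compiled-in
// POPPLER_DATADIR.
// Assumed layout: <install>/program/xpdfimport
//                 <install>/share/xpdfimport/poppler_data
fs::path bundledPopplerDataDir(const fs::path& executablePath)
{
    assert(executablePath.has_filename());

    // Strip the file name, then the directory that holds the executable.
    const fs::path installRoot = executablePath.parent_path().parent_path();
    return installRoot / "share" / "xpdfimport" / "poppler_data";
}
```

The result would then be converted to whatever type the GlobalParams constructor expects in the bundled Poppler version. Because the path is computed from the executable's own location, the same code should work on Linux, Windows, and Mac without a platform-specific default.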
(In reply to Michael Warner from comment #5)
> Icenowy, are you running this on Linux?

Yes, and a version packaged by my distro (although this package uses the Poppler shipped inside LO, not the system Poppler).

> The PDF generated by LuaLaTeX sets the collection for the FandolSong-Regular
> font to Adobe-Identity. The PDF generated by XeLaTeX sets the collection for
> the FandolSong-Regular font to Adobe-GB1.
>
> In order to display text using a font, Poppler needs a character code to
> Unicode mapping. For Adobe-Identity, it generates this on the fly. For
> Adobe-GB1 (and others), it tries to load it from the cidToUnicode file.
>
> On Linux, Poppler sets the default path for its data files (POPPLER_DATADIR)
> to /usr/share/poppler in its config.h, and this is where my distribution
> places those files, when I install Poppler from the package manager.
>
> However, in external/poppler/poppler-config.patch.1, the POPPLER_DATADIR is
> set to /usr/local/share/poppler. This is the directory it searches when I
> run xpdfimport from the working directory of my checkout. That directory is
> not created or populated when I run "sudo make install". If I look at the
> Linux Debian packages offered for download from libreoffice.org, it creates
> an /opt/libreoffice7.1/share directory, but there is no poppler subdir
> there. Neither the debs nor the working directory contain a cidToUnicode
> file.

Thanks for this information. I copied /usr/share/poppler to /usr/local/share/, and it now works.

> This makes me think that poppler library is an external dependency that LO
> expects to be installed by the system package manager. So, I don't see why
> the patch would set the POPPLER_DATADIR to /usr/local/share/poppler.

This does seem mysterious, yes.

> The patch file was added in https://gerrit.libreoffice.org/c/core/+/56228.
>
> It's possible to set the poppler data directory at runtime, by providing an
> argument to the GlobalParams constructor, but xpdfwrapper does not do this
> in wrapper_gpl.cxx.
>
> Michael Stahl, is there any downside to just changing the POPPLER_DATADIR to
> /usr/share/poppler in poppler-config.patch.1?
(In reply to Michael Warner from comment #6)
> > Is there any downside to just changing the POPPLER_DATADIR to
> > /usr/share/poppler in poppler-config.patch.1?
>
> It won't be correct on MS Windows. I'm actually not sure how this is
> packaged and distributed for Windows or Mac. Easiest solution would probably
> be to just bundle these files in with LO; put them in the
> LO/share/xpdfimport directory; pass a data dir into the GlobalParams
> constructor that is relative to the program installation directory. But IDK
> if redistributing those files would cause some other concern. Poppler itself
> is GPL and LO is not (that is the reason xpdfimport exists as a separate
> executable, after all), but these files are just data, we aren't building or
> linking with them.

Setting it to /usr/local/share/poppler is just as incorrect on Windows as /usr/share/poppler, right? So setting it to /usr/share/poppler at least fixes Linux.

> I would appreciate some input here.
i guess if any data files are missing they need to be bundled with LO. there is no guarantee that any such files in /usr are compatible with the version of poppler shipped in LO. apparently there's a separate "poppler-data" source package, maybe that contains those files.
*** Bug 128735 has been marked as a duplicate of this bug. ***
Michael Warner committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/648e4106cc002ff5b8184a8c104f93cb06e4b540 tdf#141709: Use poppler_data It will be available in 7.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Michael Warner committed a patch related to this issue. It has been pushed to "libreoffice-7-2": https://git.libreoffice.org/core/commit/98be6ca36a6e509303b69514d85471032d0dffce tdf#141709: Use poppler_data It will be available in 7.2.0.0.beta2. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
I just installed:

Version: 7.2.2.2 (x64) / LibreOffice Community
Build ID: 02b2acce88a210515b4a5bb2e46cbfb63fe97d56
CPU threads: 6; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

The poppler_data directory isn't present in C:\Program Files\LibreOffice\share\xpdfimport for some reason. I checked HEAD in master and the build files still point to that directory, so nobody else moved it to some other location. When I re-test opening xe.pdf, the Chinese characters don't appear.

Perhaps Windows builds have the SYSTEM_POPPLER flag enabled for some reason? Either way, I have to reopen this bug now.
poppler_data is also not present in the Linux 7.2.2.2 build I installed from LibreOffice_7.2.2_Linux_x86-64_deb.tar.gz, downloaded from libreoffice.org.
i think what is missing is that the poppler_data package isn't added to the installation set.

try to add something like this in RepositoryExternal.mk, same place as commit 648e4106cc002ff5b8184a8c104f93cb06e4b540:

$(eval $(call gb_Helper_register_packages_for_install,pdfimport,\
	poppler_data \
))

then try it with autogen.input containing --with-package-format=archive (or msi/rpm/your platform format) for testing.
Created attachment 176208
Build Log

Finds poppler_data.filelist on line 649
Starts creating directories on line 2132
Starts copying files on line 3809
Gets an error on line 6198:

ERROR: Could not copy /media/data/libreoffice/libreoffice/instdir/share/extensions to /media/data/libreoffice/libreoffice/workdir/installation/LibreOfficeDev/archive/install/en-US_inprogress/LibreOfficeDev_7.3.0.0.alpha1_Linux_x86-64_archive/./share/extensions Is a directory
Comment 16 is what happens when using --with-package-format=archive.

If I instead use:

--enable-epm
--with-package-format=deb

it seems to work. There are no errors in packaging, and I see the poppler_data directory in lodevbasis7.3-extension-pdf-import_7.3.0.0.alpha1-1_amd64.deb
Michael Warner committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/6ea7ca45782a7e1b46e18e994534ec0a7c71951b tdf#141709 Register poppler_data for install It will be available in 7.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Michael Warner committed a patch related to this issue. It has been pushed to "libreoffice-7-2": https://git.libreoffice.org/core/commit/b635846280c8fb4fb4d68f95af383ef1337eb430 tdf#141709 Register poppler_data for install It will be available in 7.2.4. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
7.2.4 was a hotfix release, updating target in status-whiteboard
I was able to open xe.pdf and see the Chinese characters in Windows, Mac, and Linux versions of 7.3 so I will mark this as resolved.