Description:
This command does not respect the encoding given on the command line:

soffice --infilter="DosWord:CP850" --convert-to pdf "File" --outdir "Folder"

Why? When I do the same conversion manually in the GUI (choosing 'Microsoft Word for DOS' and 'Western Europe (DOS/OS2-850)'), it works.

Steps to Reproduce:
1. Convert a 'Microsoft Word for DOS' file with encoding CP850 that contains special characters such as 'ä, ö, ü' to PDF
2. Check the PDF for correctness

Actual Results:
ä looks like this: „
ü looks like this: š
ö looks like this: ”
...

Expected Results:
ä ü ö

Reproducible: Always
User Profile Reset: No
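The reported symptoms are consistent with CP850 bytes being decoded with a Windows code page. The following minimal Python sketch illustrates that. Windows-1252 as the wrongly applied code page is an assumption, and the helper name `misread` is made up for this illustration. Note that the š from the report corresponds to byte 0x9A, which is uppercase Ü in CP850; lowercase ü is byte 0x81, which has no assigned character in Windows-1252, so the sketch uses Ü for that case.

```python
# Illustration only: how CP850 bytes look when decoded with the wrong code page.
# Assumption: the wrong code page is Windows-1252 (not confirmed in this report).

def misread(text, actual="cp850", assumed="cp1252"):
    """Encode text in its actual code page, then decode it with the assumed one."""
    return text.encode(actual).decode(assumed)

for ch in "äöÜ":
    raw = ch.encode("cp850")
    # Print the original character, its CP850 byte, and the misdecoded result.
    print(f"{ch!r} -> byte 0x{raw[0]:02X} -> {misread(ch)!r}")
```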
Would it be possible for you to attach the Word file so we can try to reproduce the problem? Before doing so, take a look at https://wiki.documentfoundation.org/QA/Bugzilla/Sanitizing_Files_Before_Submission in case the file contains confidential or private parts.
Created attachment 155640 (example WordDOS-File)
This is an example WordDOS-File.
Steph-nb: since I'm on the CC list of this bug report, I receive a notification each time the report is modified (attachment/comment added, status change, etc.)
Created attachment 155653 (converted file)
On a Debian x86-64 PC with master sources updated today, I can't reproduce this. First, since it's a Word file, I renamed "2schoko.txt" to "2schoko.doc". Then I used your command line. I noticed that it asked me to confirm the encoding; I clicked "Ok", and then it generated the file.
Argh, I got an error with the LO Debian package 6.3.3:
Error: Please verify input parameters... (SfxBaseModel::impl_store <file:///tmp/2schoko.pdf> failed: 0x11b(Error Area:Io Class:Abort Code:27))
Works for me in
Version: 6.4.0.0.alpha1+
Build ID: 498c2d3944b666c2f016b65903001920db2cb2a4
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
I can see the letters ü and ä in the resulting document.
./instdir/program/soffice --infilter="DosWord:CP850" --convert-to pdf "/tmp/2schoko.doc" --outdir "/tmp"
'I noticed that it asks me to confirm encoding, I clicked "Ok", then it generated the file' -> I can confirm this behaviour.
The current code is in core/writerperfect/source/writer/MSWorksImportFilter.cxx, in the function MSWorksImportFilter::doImportDocument, which tries to open a dialog asking the user for the file's encoding. If this fails, libwps is called with no encoding, so libwps/src/lib/DosWord.cpp tries to find a reasonable encoding based on the file's codepage (but this is only a heuristic).
Notes:
- I suppose that we can use the utl::MediaDescriptor argument to retrieve the encoding (and, if an encoding is found, use it and not open a dialog), but I am not sure how to do that. Maybe we can take some inspiration from core/writerperfect/source/writer/EBookImportFilter.cxx...
- if someone knows how to modify this function: the same problem exists in writerperfect/source/calc/MSWorksCalcImportFilter.cxx, so MSWorksCalcImportFilter::doImportDocument will have to be modified similarly.
Some more observations:
There are three options to invoke the command (at least in the Windows version):
- soffice.exe
- soffice.com (prints the best output)
- swriter.exe
The popup (which is not wanted at all, as the aim is a batch conversion) only appears if a LibreOffice instance is already running in the background.
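Since the aim is a batch conversion, here is a minimal Python sketch of how the command lines for a whole folder could be assembled. The function names are made up for this illustration. The `--infilter`, `--convert-to` and `--outdir` switches are the ones from the report; adding `--headless` is an assumption about how such a batch would be run and is not part of the original command. Whether the encoding is honoured still depends on the LibreOffice version in use.

```python
from pathlib import Path

def build_convert_command(src, outdir, infilter="DosWord:CP850", soffice="soffice"):
    """Return the argument list for one conversion, without running it."""
    return [
        soffice,
        "--headless",              # assumption: not in the reported command
        f"--infilter={infilter}",  # no shell quotes needed in an argument list
        "--convert-to", "pdf",
        str(src),
        "--outdir", str(outdir),
    ]

def build_batch(folder, outdir, pattern="*.doc"):
    """Build one command per file in folder matching pattern, sorted by name."""
    return [build_convert_command(p, outdir) for p in sorted(Path(folder).glob(pattern))]
```

Each returned list can then be passed to `subprocess.run(cmd, check=True)`, so that a failing conversion raises an exception instead of going unnoticed.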
Created attachment 155755 (proposed patch to retrieve the filter option from the command line)
On OSX, the behaviour also depends on whether the LibreOffice application is already running. If it is, a dialog is opened to let the user choose the encoding; if not, libwps is left to choose the encoding.
So here is a proposed patch that checks whether a FILTEROPTIONS property is present:
- if yes, we use that property as the encoding,
- if not, we revert to the previous behaviour (with the difference that if the dialog cannot be created, we ask libwps to use the encoding proposed in the dialog).
Note:
- I also modified writerperfect/source/calc/MSWorksCalcImportFilter.cxx, which has basically the same behaviour.
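The decision order described in this comment can be modelled roughly as follows. This is a schematic Python sketch, not the C++ patch itself: `choose_encoding`, its parameters and the use of `RuntimeError` to signal "the dialog cannot be created" are all hypothetical names chosen for illustration.

```python
def choose_encoding(filter_options, ask_user, dialog_default):
    """Pick the encoding to hand to the import library.

    filter_options: value of the FILTEROPTIONS property, or None/"" if absent
    ask_user:       callable showing the dialog and returning the chosen encoding;
                    raises RuntimeError if the dialog cannot be created
                    (hypothetical convention for this sketch)
    dialog_default: encoding the dialog would have proposed
    """
    if filter_options:
        # An encoding was given on the command line: use it, open no dialog.
        return filter_options
    try:
        # Previous behaviour: ask the user.
        return ask_user()
    except RuntimeError:
        # Dialog unavailable: fall back to the encoding the dialog would propose.
        return dialog_default
```

For example, `choose_encoding("CP850", ask_user, "CP1252")` returns `"CP850"` without ever calling `ask_user`.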
osnola: great to see that a patch has been proposed! I noticed you've already contributed to LO; would it be possible for you to submit the patch on gerrit? (see https://wiki.documentfoundation.org/Development/gerrit)
OK, I have just done that: https://gerrit.libreoffice.org/#/c/82595/
alonso committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/f14cd1ad62e6f17f2a1e56a7d4dfb8fad8d5375e writerperfect[libwps,tdf#128673]: use the inFilter option in headless mode... It will be available in 6.5.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
A polite ping to alonso: is this bug fixed? If so, could you please close it as RESOLVED FIXED? Otherwise, could you please explain what's missing? Thanks
Yes, this problem should normally be fixed now, thanks. So I am marking it as resolved, ...
alonso committed a patch related to this issue. It has been pushed to "libreoffice-6-4": https://git.libreoffice.org/core/commit/5fc07374dc00f1c35839cb3f2b9fb712a88272e6 writerperfect[libwps,tdf#128673]: use the inFilter option in headless mode... It will be available in 6.4.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.