Bug 128673 - infilter Parameter does not use entered encoding (CP850)
Summary: infilter Parameter does not use entered encoding (CP850)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 6.3.3.2 release
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: osnola
URL:
Whiteboard: target:6.5.0 target:6.4.1
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-08 14:21 UTC by steph-nb
Modified: 2020-01-20 16:00 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
This is an example WordDOS-File (207.00 KB, application/msword)
2019-11-08 15:53 UTC, steph-nb
converted file (314.37 KB, application/pdf)
2019-11-09 10:16 UTC, Julien Nabet
proposition of patch to retrieve filter option in the command line (9.14 KB, patch)
2019-11-12 15:43 UTC, osnola

Description steph-nb 2019-11-08 14:21:30 UTC
Description:
This command does not respect/use the entered encoding:
soffice --infilter="DosWord:CP850" --convert-to pdf "File" --outdir "Folder"
Why?

When I do the same manually with the GUI, then it works (choosing 'Microsoft Word for DOS' and 'Western Europe (DOS/OS2-850)')

Steps to Reproduce:
1. Try to convert a 'Microsoft Word for DOS' file containing special characters like 'ä, ö, ü' with encoding CP850 to PDF
2. Check pdf for correctness
3.

Actual Results:
ä looks like this „
ü looks like this š
ö looks like this ”
...

Expected Results:
ä
ü
ö
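
The characters listed under "Actual Results" are consistent with CP850 bytes being decoded with a Windows-1252-style code page. That is an assumption: the report does not say which encoding was actually applied. A minimal Python sketch of such a mismatch, using only the standard codecs (the name `decode_both` is made up for illustration):

```python
# Bytes as they could appear in a CP850-encoded DOS file.
# Assumption: the wrong decoder is Windows-1252; the report does not confirm this.
RAW = bytes([0x84, 0x94, 0x9A])


def decode_both(raw):
    """Decode the same bytes as CP850 (intended) and as CP1252 (assumed wrong)."""
    return raw.decode("cp850"), raw.decode("cp1252")


intended, garbled = decode_both(RAW)
print(intended)  # text as the author typed it
print(garbled)   # text as it would show up after the wrong decoding
```

Under this assumption, byte 0x84 is 'ä' in CP850 and '„' in Windows-1252, and 0x94 is 'ö' and '”' respectively. Byte 0x9A is 'Ü' in CP850 and 'š' in Windows-1252, so the 'š' above may correspond to an uppercase 'Ü' rather than 'ü'.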


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Julien Nabet 2019-11-08 14:50:54 UTC
Would it be possible for you to attach the Word file so we can try to reproduce the problem?

Before doing so, take a look at https://wiki.documentfoundation.org/QA/Bugzilla/Sanitizing_Files_Before_Submission in case the file contains confidential or private content.
Comment 2 steph-nb 2019-11-08 15:53:09 UTC
Created attachment 155640 [details]
This is an example WordDOS-File

example WordDOS-File
Comment 3 Julien Nabet 2019-11-08 17:52:12 UTC
Steph-nb: since I'm on cc of the bugtracker, I receive a notification each time this bugtracker is modified (attachment/comment added, status change, etc.)
Comment 4 Julien Nabet 2019-11-09 10:16:40 UTC
Created attachment 155653 [details]
converted file

On a Debian x86-64 PC with master sources updated today, I cannot reproduce this.

First, since it's a Word file, I renamed the file "2schoko.txt" to "2schoko.doc".
Then I used your command line.
I noticed that it asked me to confirm the encoding; I clicked "OK", and then it generated the file.
Comment 5 Julien Nabet 2019-11-09 10:19:14 UTC
Argh, I got an error with LO Debian package 6.3.3
Error: Please verify input parameters... (SfxBaseModel::impl_store <file:///tmp/2schoko.pdf> failed: 0x11b(Error Area:Io Class:Abort Code:27))
Comment 6 raal 2019-11-10 07:28:26 UTC
works for me in Version: 6.4.0.0.alpha1+
Build ID: 498c2d3944b666c2f016b65903001920db2cb2a4
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 

I can see letters ü, ä in the result document.

./instdir/program/soffice --infilter="DosWord:CP850" --convert-to pdf "/tmp/2schoko.doc" --outdir "/tmp"


I can also confirm this behaviour: it asks me to confirm the encoding, and after I click "OK" it generates the file.
Comment 7 osnola 2019-11-10 10:52:52 UTC
The current code is in core/writerperfect/source/writer/MSWorksImportFilter.cxx, in the function MSWorksImportFilter::doImportDocument, which tries to open a dialog asking the user for the file's encoding. If this fails, libwps is called with no encoding, and libwps/src/lib/DosWord.cpp then tries to find a reasonable encoding based on the file's codepage (but this is only a heuristic).

Notes:
- I suppose that we can use the utl::MediaDescriptor argument to retrieve the encoding (and, if an encoding is found, use it and do not open a dialog), but I am not sure how to do that. Maybe we can take some inspiration from core/writerperfect/source/writer/EBookImportFilter.cxx...
- If someone knows how to modify this function, note that a similar problem exists in writerperfect/source/calc/MSWorksCalcImportFilter.cxx, so MSWorksCalcImportFilter::doImportDocument will have to be modified in the same way.
Comment 8 steph-nb 2019-11-12 07:44:51 UTC
Some more observations:

There are three ways to invoke the command (at least in the Windows version):
soffice.exe
soffice.com (prints the best output)
swriter.exe


The popup (which is not wanted at all, since the aim is a batch conversion) only appears if a LibreOffice instance is already running in the background.
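
For the batch-conversion use case, the command from the description could be wrapped in a small script. The following is only a sketch under stated assumptions: `soffice` is on the PATH, the installed build already honours the encoding given after the colon in `--infilter`, and `--headless` is acceptable for the job. The names `build_convert_command` and `convert_all` are hypothetical helpers, not part of LibreOffice.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, encoding="CP850", binary="soffice"):
    """Build the argument list for one DosWord-to-PDF conversion."""
    return [
        binary,
        "--headless",  # run without a user interface
        f"--infilter=DosWord:{encoding}",
        "--convert-to", "pdf",
        str(source),
        "--outdir", str(outdir),
    ]


def convert_all(folder, outdir):
    """Convert every *.doc file in `folder`; raises on the first failing call."""
    for source in sorted(Path(folder).glob("*.doc")):
        subprocess.run(build_convert_command(source, outdir), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. On Windows, `binary="soffice.com"` could be passed instead, since that variant is reported above to print the best output.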
Comment 9 osnola 2019-11-12 15:43:20 UTC
Created attachment 155755 [details]
proposition of patch to retrieve filter option in the command line

On OSX, the behaviour also depends on whether the LibreOffice application is already running. If it is, a dialog is opened to let the user choose the encoding; if not, libwps is left to choose the encoding.

So here is a proposed patch that checks whether a FILTEROPTIONS property is present:
- if yes, we use that property as the encoding,
- if not, we revert to the previous behaviour (with the difference that if the dialog cannot be created, we ask libwps to use the encoding proposed in the dialog).
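
The decision order described in this list can be modelled roughly as follows. This is a Python illustration only, not the attached C++ patch: `choose_encoding`, `ask_user` and `dialog_default` are hypothetical names, and a plain mapping with a "FILTEROPTIONS" key stands in for the media descriptor.

```python
from typing import Callable, Mapping, Optional


def choose_encoding(
    media_descriptor: Mapping[str, str],
    ask_user: Optional[Callable[[], Optional[str]]] = None,
    dialog_default: Optional[str] = None,
) -> Optional[str]:
    """Return the encoding name to hand to the import library.

    Hypothetical model of the proposal above; not the real implementation.
    """
    # 1. An explicit filter option (e.g. from --infilter="DosWord:CP850") wins.
    option = media_descriptor.get("FILTEROPTIONS", "").strip()
    if option:
        return option
    # 2. Otherwise fall back to the dialog, when one can be created.
    if ask_user is not None:
        return ask_user()
    # 3. No dialog available: use the encoding the dialog would have proposed.
    return dialog_default
```

For example, `choose_encoding({"FILTEROPTIONS": "CP850"})` returns the option directly, without ever consulting the dialog callback.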

Note:
- I also modified writerperfect/source/calc/MSWorksCalcImportFilter.cxx which has basically the same behaviour.
Comment 10 Julien Nabet 2019-11-12 20:23:23 UTC
osnola: great to see that a patch has been proposed!
I noticed you've already contributed to LO; would it be possible for you to submit the patch on gerrit?
(see https://wiki.documentfoundation.org/Development/gerrit)
Comment 11 osnola 2019-11-13 13:21:55 UTC
OK, I have just done that: https://gerrit.libreoffice.org/#/c/82595/
Comment 12 Commit Notification 2019-12-18 07:05:18 UTC
alonso committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f14cd1ad62e6f17f2a1e56a7d4dfb8fad8d5375e

writerperfect[libwps,tdf#128673]: use the inFilter option in headless mode...

It will be available in 6.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 13 Xisco Faulí 2020-01-20 14:19:16 UTC
A polite ping to alonso:
Is this bug fixed? If so, could you please close it as RESOLVED FIXED?
Otherwise, could you please explain what's missing?
Thanks
Comment 14 osnola 2020-01-20 15:46:58 UTC
Yes, this problem should normally be fixed, thanks. So I am marking it as resolved, ...
Comment 15 Commit Notification 2020-01-20 16:00:36 UTC
alonso committed a patch related to this issue.
It has been pushed to "libreoffice-6-4":

https://git.libreoffice.org/core/commit/5fc07374dc00f1c35839cb3f2b9fb712a88272e6

writerperfect[libwps,tdf#128673]: use the inFilter option in headless mode...

It will be available in 6.4.1.
