Bug 118913 - FILESAVE: convert-to html creates html file with default charset
Summary: FILESAVE: convert-to html creates html file with default charset
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.0.4.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Commandline
 
Reported: 2018-07-24 09:49 UTC by Vladimir
Modified: 2018-09-08 18:22 UTC

See Also:
Crash report or crash signature:


Attachments

Description Vladimir 2018-07-24 09:49:13 UTC
Description:
The --convert-to html command uses the default content charset instead of the one specified on the command line. This can cause text in the document to display incorrectly.

Steps to Reproduce:
1. Open LibreOffice and go to Tools->Options->Load/Save->HTML Compatibility.
2. Choose any value in the Character set combobox (for example euc-jp). This value becomes the default. Then click OK.
3. Run on the command line: soffice --convert-to "html:HTML (StarWriter):UTF8" your-file-to-convert
4. Open the converted html file in a text editor.

Actual Results:
The charset value in the content of the <meta> tag is euc-jp.

Expected Results:
The charset value in the content of the <meta> tag is utf-8.
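
The check in step 4 can also be scripted instead of reading the file by eye. The following is only a sketch: the names MetaCharsetFinder and declared_charset are made up for illustration, and it assumes the exporter declares the charset either as <meta charset=...> or inside the content attribute of a <meta http-equiv="content-type"> tag.

```python
import re
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remembers the first charset declared in a <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Only the first declaration matters; ignore everything else.
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        # Form 1: <meta charset="utf-8">
        if attr.get("charset"):
            self.charset = attr["charset"].lower()
            return
        # Form 2: <meta http-equiv="content-type" content="text/html; charset=utf-8">
        if (attr.get("http-equiv") or "").lower() == "content-type":
            match = re.search(r"charset\s*=\s*([^\s;\"']+)",
                              attr.get("content") or "", re.IGNORECASE)
            if match:
                self.charset = match.group(1).lower()


def declared_charset(html_text):
    """Return the charset declared in the HTML text, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

Because the real encoding of the file may not match what it declares, it is safer to read the converted file as bytes and decode it with decode("ascii", errors="replace") before passing it to declared_charset. With the bug present, the function would report euc-jp for the file produced in step 3.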


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Xisco Faulí 2018-07-25 08:10:24 UTC
You can't confirm your own bugs. Moving it back to UNCONFIRMED until someone
else confirms it.
Comment 2 Buovjaga 2018-09-01 13:57:40 UTC
Repro.
I tried finding the correct command line parameters for version 3.6, but failed.

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 1c59d021b3dd27c8c0255312bd0d53ad25965bab
CPU threads: 8; OS: Linux 4.18; UI render: default; VCL: gtk3_kde5; 
Locale: fi-FI (fi_FI.UTF-8); Calc: threaded
Built on September 1st 2018
Comment 3 himajin100000 2018-09-03 16:56:02 UTC
https://opengrok.libreoffice.org/xref/core/desktop/source/app/cmdlineargs.cxx?r=5ccf8264#553
https://opengrok.libreoffice.org/xref/core/desktop/source/app/cmdlineargs.hxx?r=17ee20b1#116

https://opengrok.libreoffice.org/xref/core/desktop/source/app/app.cxx?r=b57ed763#2125
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1004
I don't know which of the above is actually used.

https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1335
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1304
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1240
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1259
https://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.hxx?r=87a9979c#59
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1304
Note that the member name aPrinterName is not very intuitive for conversion parameters. This member is used in the following code.
https://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.cxx?r=6b51fee8#583

To me, the syntax of PrinterName looks like
PrinterName :=  fileExtension ":" filterName ":" filterOptions ";" path "|" imagefilter
These components are stored in conversionProperties, and then storeToURL is executed as follows.

https://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.cxx?r=6b51fee8#689
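
To make the grammar guessed above concrete, here is a rough sketch of how such a string could be split. It is not the code in dispatchwatcher.cxx: the names ConversionParams and parse_conversion_params are hypothetical, and treating every part after fileExtension as optional is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversionParams:
    """Parts of the --convert-to argument, following the guessed grammar."""
    file_extension: str
    filter_name: Optional[str] = None
    filter_options: Optional[str] = None
    path: Optional[str] = None
    image_filter: Optional[str] = None


def parse_conversion_params(param: str) -> ConversionParams:
    # Assumption: everything after the last "|" is the image filter.
    rest, sep, image_filter = param.rpartition("|")
    if not sep:
        rest, image_filter = param, None

    # Assumption: everything after the last ";" is the output path.
    head, sep, path = rest.rpartition(";")
    if not sep:
        head, path = rest, None

    # What remains is fileExtension[:filterName[:filterOptions]].
    parts = head.split(":", 2)
    return ConversionParams(
        file_extension=parts[0],
        filter_name=parts[1] if len(parts) > 1 else None,
        filter_options=parts[2] if len(parts) > 2 else None,
        path=path,
        image_filter=image_filter,
    )
```

For the reporter's argument, parse_conversion_params("html:HTML (StarWriter):UTF8") gives file_extension "html", filter_name "HTML (StarWriter)" and filter_options "UTF8". Under this reading "UTF8" is passed along as filter options, and whether it has any effect is up to the filter.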

https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#1628
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#2837
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#2851
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#2969
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objserv.cxx?r=d4f7fc2a#308
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2707
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2825
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2807
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2850
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#1112
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#1538

Calc=>HTML case
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#2586
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/impex.cxx?r=1f9f3517#482
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/impex.cxx#Doc2HTML    *

Writer=>HTML case
https://opengrok.libreoffice.org/xref/core/sw/source/uibase/app/docsh.cxx?r=e2a5932d#774
https://opengrok.libreoffice.org/xref/core/sw/source/uibase/app/docsh.cxx?r=e2a5932d#816
https://opengrok.libreoffice.org/xref/core/sw/source/filter/basflt/shellio.cxx?r=fe1b87eb#729
https://opengrok.libreoffice.org/xref/core/sw/source/filter/basflt/shellio.cxx?r=fe1b87eb#854
https://opengrok.libreoffice.org/xref/core/sw/source/filter/writer/writer.cxx?r=4760bc99#245
https://opengrok.libreoffice.org/xref/core/sw/source/filter/html/wrthtml.cxx?r=81fac013#224
https://opengrok.libreoffice.org/xref/core/sw/source/filter/html/wrthtml.cxx?r=81fac013#291    *
https://opengrok.libreoffice.org/search?project=core&q=&defs=&refs=m_bWriteClipboardDoc&path=&hist=&type=

Calc=>CSV case(for comparison)
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#2459
https://opengrok.libreoffice.org/xref/core/sc/source/ui/dbgui/imoptdlg.cxx?r=104b26b2#56
https://opengrok.libreoffice.org/xref/core/sc/source/ui/inc/imoptdlg.hxx?r=2f564d09#64
note: member eCharset is public

https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#2460
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#1979

so in csv's case, properly specifying filteroptions will change the encoding of the output HTML.
Comment 4 himajin100000 2018-09-03 16:57:56 UTC
typo:

so in csv's case properly specifying filteroptions will change the encoding of the output HTML.

=>so in csv's case  properly specifying filteroptions will change the encoding of the output file.
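
As an illustration of the CSV case, the filter options could be assembled like this. This is a sketch under assumptions: the token order (field separator, text delimiter, character set, first line) and the value 76 for UTF-8 are taken from the LibreOffice CSV filter options documentation as I remember it and should be checked against the version in use. The function name csv_filter_options is hypothetical.

```python
def csv_filter_options(field_sep=",", text_delim='"', charset_id=76, first_line=1):
    """Build a CSV filter options string.

    Assumed token order: field separator and text delimiter as character
    codes, then the character set id (76 assumed to mean UTF-8), then the
    number of the first line.
    """
    tokens = (ord(field_sep), ord(text_delim), charset_id, first_line)
    return ",".join(str(token) for token in tokens)


def convert_to_argument(options):
    # "Text - txt - csv (StarCalc)" is the Calc CSV export filter name.
    return "csv:Text - txt - csv (StarCalc):" + options
```

With the defaults, convert_to_argument(csv_filter_options()) yields a string that would be passed to soffice --convert-to in the same way as in step 3 of the report.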
Comment 5 himajin100000 2018-09-06 11:48:42 UTC
As I understand it, the format of FilterOptions depends on each filter's implementation,
and I don't think HTML (StarWriter) uses "UTF8" as a filter option. I didn't find any documentation saying that the filter accepts such a format.

That said, a question came to my mind: "What made the reporter think he could pass such an option?"

I think I found the answer.

https://opengrok.libreoffice.org/xref/core/desktop/source/app/cmdlinehelp.cxx?r=16970001#147

In the command-line help there is a sample command that uses "XHTML Writer File" as the filter, with UTF8 specified as a filter option.

Ironically, the code appears, at least to me, not to accept any filter options at all: it does not parse UTF8 or UTF-8, and always outputs in UTF-8 encoding. No other encoding is accepted, so the option is completely meaningless.

https://opengrok.libreoffice.org/xref/core/filter/source/config/fragments/filters/XHTML_Writer_File.xcu?r=791a8b96#23

https://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/opendoc2xhtml.xsl?r=6f6f57d4#68