Bug 118913 - FILESAVE: convert-to html creates html file with default charset
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.0.4.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Commandline
 
Reported: 2018-07-24 09:49 UTC by Vladimir
Modified: 2019-09-09 05:30 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Description Vladimir 2018-07-24 09:49:13 UTC
Description:
The convert-to html command uses the default content charset instead of the one specified on the command line. This can cause text in the document to be displayed incorrectly.

Steps to Reproduce:
1. Open LibreOffice and go to Tools->Options->Load/Save->HTML Compatibility.
2. Choose any value in the Character set combo box (for example, euc-jp). This value becomes the default. Then click OK.
3. On the command line, run: soffice --convert-to "html:HTML (StarWriter):UTF8" your-file-to-convert
4. Open the converted HTML file in a text editor.

Actual Results:
The charset value in the content attribute of the <meta> tag is euc-jp.

Expected Results:
The charset value in the content attribute of the <meta> tag is utf-8.
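
The check in step 4 can also be done with a small script instead of a text editor. This is only a sketch: the helper name declared_charset is hypothetical, and the sample <meta> markup in the comments is an assumption about what the exported file looks like, not output copied from LibreOffice.

```python
import re
from typing import Optional

# Matches a charset declaration inside a <meta ...> tag, covering both
# content="text/html; charset=..." and <meta charset="...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(html_bytes: bytes) -> Optional[str]:
    """Return the lowercased charset named in the first matching <meta> tag,
    or None if no declaration is found in the first 4096 bytes."""
    match = _META_CHARSET.search(html_bytes[:4096])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Assumed shape of the header line in the converted file (illustrative only).
sample = b'<meta http-equiv="content-type" content="text/html; charset=euc-jp"/>'
print(declared_charset(sample))
```

To apply it to a real conversion result, read the file as bytes (for example with pathlib.Path("your-file.html").read_bytes()) and compare the returned value with the charset requested on the command line.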


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Xisco Faulí 2018-07-25 08:10:24 UTC
You can't confirm your own bugs. Moving it back to UNCONFIRMED until someone
else confirms it.
Comment 2 Buovjaga 2018-09-01 13:57:40 UTC
Repro.
I tried finding the correct command line parameters for version 3.6, but failed.

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 1c59d021b3dd27c8c0255312bd0d53ad25965bab
CPU threads: 8; OS: Linux 4.18; UI render: default; VCL: gtk3_kde5; 
Locale: fi-FI (fi_FI.UTF-8); Calc: threaded
Built on September 1st 2018
Comment 3 himajin100000 2018-09-03 16:56:02 UTC
https://opengrok.libreoffice.org/xref/core/desktop/source/app/cmdlineargs.cxx?r=5ccf8264#553
https://opengrok.libreoffice.org/xref/core/desktop/source/app/cmdlineargs.hxx?r=17ee20b1#116

https://opengrok.libreoffice.org/xref/core/desktop/source/app/app.cxx?r=b57ed763#2125
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1004
I don't know which of the above is actually used.

https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1335
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1304
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1240
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1259
https://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.hxx?r=87a9979c#59
https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx?r=9401c7c2#1304
Note that the member name aPrinterName is not very intuitive for conversion parameters. This member is used in the following code:
https://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.cxx?r=6b51fee8#583

To me, the syntax of PrinterName looks like:
PrinterName :=  fileExtension ":" filterName ":" filterOptions ";" path "|" imagefilter
These components are stored in conversionProperties, and then storeToURL is executed as follows.
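
To make that reading of the syntax concrete, here is a minimal Python sketch that splits a string of this shape into its parts. It follows the grammar as written in this comment, not the actual C++ in dispatchwatcher.cxx. The names ConversionParams and parse_conversion_params are hypothetical, and treating everything after the first component as optional, as well as splitting on the last ";" and the last "|", are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class ConversionParams(NamedTuple):
    file_extension: str
    filter_name: Optional[str]
    filter_options: Optional[str]
    path: Optional[str]
    image_filter: Optional[str]


def parse_conversion_params(value: str) -> ConversionParams:
    """Split 'ext:filter:options;path|imagefilter' into named parts."""
    # Optional trailing "|" imagefilter (assumption: split on the last "|").
    rest, sep, image_filter = value.rpartition("|")
    if not sep:
        rest, image_filter = value, None

    # Optional ";" path (assumption: split on the last ";").
    head, sep, path = rest.rpartition(";")
    if not sep:
        head, path = rest, None

    # At most two ":" splits, so the options part may itself contain ":".
    parts = head.split(":", 2)
    filter_name = parts[1] if len(parts) > 1 else None
    filter_options = parts[2] if len(parts) > 2 else None
    return ConversionParams(parts[0], filter_name, filter_options, path, image_filter)


print(parse_conversion_params("html:HTML (StarWriter):UTF8;/tmp/out"))
```

With the value from the bug report, the third field comes out as "UTF8". Whether a given filter does anything with that string is a separate question, decided by the filter itself.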

https://opengrok.libreoffice.org/xref/core/desktop/source/app/dispatchwatcher.cxx?r=6b51fee8#689

https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#1628
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#2837
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#2851
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx?r=b5867945#2969
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objserv.cxx?r=d4f7fc2a#308
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2707
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2825
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2807
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#2850
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#1112
https://opengrok.libreoffice.org/xref/core/sfx2/source/doc/objstor.cxx?r=6a6774cc#1538

Calc=>HTML case
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#2586
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/impex.cxx?r=1f9f3517#482
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/impex.cxx#Doc2HTML    *

Writer=>HTML case
https://opengrok.libreoffice.org/xref/core/sw/source/uibase/app/docsh.cxx?r=e2a5932d#774
https://opengrok.libreoffice.org/xref/core/sw/source/uibase/app/docsh.cxx?r=e2a5932d#816
https://opengrok.libreoffice.org/xref/core/sw/source/filter/basflt/shellio.cxx?r=fe1b87eb#729
https://opengrok.libreoffice.org/xref/core/sw/source/filter/basflt/shellio.cxx?r=fe1b87eb#854
https://opengrok.libreoffice.org/xref/core/sw/source/filter/writer/writer.cxx?r=4760bc99#245
https://opengrok.libreoffice.org/xref/core/sw/source/filter/html/wrthtml.cxx?r=81fac013#224
https://opengrok.libreoffice.org/xref/core/sw/source/filter/html/wrthtml.cxx?r=81fac013#291    *
https://opengrok.libreoffice.org/search?project=core&q=&defs=&refs=m_bWriteClipboardDoc&path=&hist=&type=

Calc=>CSV case(for comparison)
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#2459
https://opengrok.libreoffice.org/xref/core/sc/source/ui/dbgui/imoptdlg.cxx?r=104b26b2#56
https://opengrok.libreoffice.org/xref/core/sc/source/ui/inc/imoptdlg.hxx?r=2f564d09#64
note: member eCharset is public

https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#2460
https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/docsh.cxx?r=80343351#1979

so in csv's case, properly specifying filteroptions will change the encoding of the output HTML.
Comment 4 himajin100000 2018-09-03 16:57:56 UTC
typo:

so in csv's case properly specifying filteroptions will change the encoding of the output HTML.

=>so in csv's case  properly specifying filteroptions will change the encoding of the output file.
Comment 5 himajin100000 2018-09-06 11:48:42 UTC
To me, the format of FilterOptions depends on each filter's implementation,
and I don't think HTML (StarWriter) uses "UTF8" as a filter option. I didn't find any documentation saying that the filter accepts such a format.

That said, a question came to mind: "What made the reporter think he could pass such an option?"

I think I found the answer.

https://opengrok.libreoffice.org/xref/core/desktop/source/app/cmdlinehelp.cxx?r=16970001#147

In the command-line help there is a sample command that uses "XHTML Writer File" as the filter, with UTF8 specified as a filter option.

Ironically, the code for that filter appears, at least to me, not to accept any filter options at all. It does not even parse UTF8 or UTF-8; it always outputs UTF-8, and no other encoding is accepted. So the option is completely meaningless.

https://opengrok.libreoffice.org/xref/core/filter/source/config/fragments/filters/XHTML_Writer_File.xcu?r=791a8b96#23

https://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/opendoc2xhtml.xsl?r=6f6f57d4#68
Comment 7 QA Administrators 2019-09-09 05:30:13 UTC
Dear Vladimir,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug