Bug 132558 - Headless conversion of HTML to XLS or XLSX not working
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
Depends on:
Reported: 2020-04-30 15:06 UTC by kaikun
Modified: 2020-06-12 14:38 UTC (History)
0 users

See Also:
Crash report or crash signature:
Regression By:

HTML file which crashes when trying to convert it (1.52 MB, text/html)
2020-04-30 15:08 UTC, kaikun

Description kaikun 2020-04-30 15:06:12 UTC
I am trying to batch-convert HTML files to XLS or XLSX Excel files.
In the GUI, importing the HTML and then saving it as either type works fine.
However, from the terminal I run into issues.

I tried following multiple suggestions (https://stackoverflow.com/questions/30349542/command-libreoffice-headless-convert-to-pdf-test-docx-outdir-pdf-is-not, https://stackoverflow.com/questions/22062973/libreoffice-convert-to-not-working, https://stackoverflow.com/questions/52277264/convert-doc-to-docx-using-soffice-not-working), but none of them worked.

The commands I use:

`libreoffice6.4 --headless --convert-to xlsx:"Calc MS Excel 2007 XML" filename.html`, which returns an "Application Error".

The same happens with LibreOffice 6.0:
`soffice --headless --convert-to xlsx:"Calc MS Excel 2007 XML" filename.html`

If I try to convert to xls with any of the specified export filters (https://cgit.freedesktop.org/libreoffice/core/tree/filter/source/config/fragments/filters):

`libreoffice6.4 --headless --convert-to xls:"MS Excel 95" filename.html`
`libreoffice6.4 --headless --convert-to xls:"MS Excel 97" filename.html`
`libreoffice6.4 --headless --convert-to xls:"MS Excel 2003 XML" filename.html`
`libreoffice6.4 --headless --convert-to xls:"MS Excel 2003 XML 0rcus" filename.html`

Each of these fails with "Error: Please verify input parameters... (SfxBaseModel::impl_store [..]"

Other conversions, such as CSV to HTML or CSV to XLSX, work just fine:
`libreoffice6.4 --headless --convert-to html hardware_tutorial_gold.csv`
`libreoffice6.4 --headless --convert-to xlsx hardware_tutorial_gold.csv`

I have additionally tried the input filter option `--infilter="HTML Document"`, reinstalling LibreOffice, updating it, and several other hints. Nothing has solved it. The problem seems to be related to the HTML input format. However, since the conversion works in the GUI, it should also be possible from the command line.

Steps to Reproduce:
1. Download an HTML file. I have uploaded one for reproduction here: https://ufile.io/zh40nkdb, but for me it happens with any HTML file.
2. Open a terminal in the folder where the file is stored
3. Run `libreoffice6.4 --headless --convert-to xlsx:"Calc MS Excel 2007 XML" hardware_tutorial_gold.html`

Actual Results:
The console prints "Application Error".

Expected Results:
A new file containing the HTML file converted to xlsx is created in the current directory, just as when using the GUI.

Reproducible: Always

User Profile Reset: No

Additional Info:
LibreOffice should convert the file and not crash.
Comment 1 kaikun 2020-04-30 15:08:07 UTC
Created attachment 160135
HTML file which crashes when trying to convert it

HTML file that fails to convert to xlsx or xls
Comment 2 kaikun 2020-05-02 12:45:04 UTC
Okay, this can be closed. I finally found the solution: the command additionally needs the `--calc` flag. However, the error messages really need to be improved.

Thanks to: https://stackoverflow.com/questions/34362464/libreoffice-converting-html-to-xls-or-xlsx
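
For anyone landing here with the same batch-conversion goal, a minimal sketch of how the fix could be applied to a whole directory. The helper name `convert_html_to_xlsx` and the `SOFFICE` variable are hypothetical, not part of LibreOffice; the sketch also assumes that the output file takes the input's base name with an `.xlsx` extension. Because the error output is not very informative, it checks for the output file rather than relying on console messages.

```shell
# Assumption: SOFFICE names the LibreOffice launcher on your system
# (this report uses both libreoffice6.4 and soffice).
SOFFICE="${SOFFICE:-soffice}"

# convert_html_to_xlsx IN_DIR OUT_DIR   (hypothetical helper)
# Converts every *.html file in IN_DIR to .xlsx in OUT_DIR.
# Returns non-zero if any expected output file is missing.
convert_html_to_xlsx() {
    in_dir=$1
    out_dir=$2
    failed=0
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.html; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # --calc is the flag reported above as the fix
        "$SOFFICE" --headless --calc --convert-to xlsx --outdir "$out_dir" "$f"
        # Assumed output name: input base name plus .xlsx
        out="$out_dir/$(basename "$f" .html).xlsx"
        if [ ! -f "$out" ]; then
            echo "conversion failed: $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Usage would look like `convert_html_to_xlsx . ./converted`, or `SOFFICE=libreoffice6.4` set beforehand to pick a specific install.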
Comment 3 Timur 2020-05-02 16:17:21 UTC
NotABug; this indicates that the documentation should be improved.
There is a help page,
https://help.libreoffice.org/6.4/en-US/text/shared/guide/start_parameters.html but it is not clear about headless use, and the Stack Overflow answer is better.