Bug 132558

Summary: Headless conversion of HTML to XLS or XLSX not working
Product: LibreOffice
Reporter: kaikun <kaktus.w.h>
Component: LibreOffice
Assignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED NOTABUG    
Severity: normal    
Priority: medium    
Version: 6.0.0.3 release   
Hardware: All   
OS: Linux (All)   
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=89739
https://bugs.documentfoundation.org/show_bug.cgi?id=63324
Whiteboard:
Crash report or crash signature:
Regression By:
Attachments: HTML file which crashes when trying to convert it

Description kaikun 2020-04-30 15:06:12 UTC
Description:
I am trying to batch-convert HTML files to XLS or XLSX Excel files.
In the GUI, importing the HTML file and saving it as any of these types works fine.
From the terminal, however, the conversion fails.

I tried following multiple suggestions (https://stackoverflow.com/questions/30349542/command-libreoffice-headless-convert-to-pdf-test-docx-outdir-pdf-is-not, https://stackoverflow.com/questions/22062973/libreoffice-convert-to-not-working, https://stackoverflow.com/questions/52277264/convert-doc-to-docx-using-soffice-not-working), but it does not work.

The commands I use:

`libreoffice6.4 --headless --convert-to xlsx:"Calc MS Excel 2007 XML" filename.html`, which returns an "Application Error".

Similarly for libreoffice 6.0: 
`soffice --headless --convert-to xlsx:"Calc MS Excel 2007 XML" filename.html`

If I try to convert to xls with any of the specified export filters (https://cgit.freedesktop.org/libreoffice/core/tree/filter/source/config/fragments/filters):

`libreoffice6.4 --headless --convert-to xls:"MS Excel 95" filename.html`
`libreoffice6.4 --headless --convert-to xls:"MS Excel 97" filename.html`
`libreoffice6.4 --headless --convert-to xls:"MS Excel 2003 XML" filename.html`
`libreoffice6.4 --headless --convert-to xls:"MS Excel 2003 XML 0rcus" filename.html`

It will throw "Error: Please verify input parameters... (SfxBaseModel::impl_store [..]"

Other conversions, such as CSV to HTML or CSV to XLSX, work just fine:
`libreoffice6.4 --headless --convert-to html hardware_tutorial_gold.csv`
`libreoffice6.4 --headless --convert-to xlsx hardware_tutorial_gold.csv`

I have additionally tried the input filter option `--infilter="HTML Document"`, reinstalling LibreOffice, updating it, and several other hints. Nothing has solved it. The problem seems to be related to the HTML input format. However, since the conversion works in the GUI, it should also be possible from the command line.


Steps to Reproduce:
1. Download an HTML file. I have uploaded one for reproduction here: https://ufile.io/zh40nkdb, but for me it happened with every HTML file I tried.
2. Open a terminal in the folder where the file is stored
3. Run `libreoffice6.4 --headless --convert-to xlsx:"Calc MS Excel 2007 XML" hardware_tutorial_gold.html`

Actual Results:
The console prints "Application Error".

Expected Results:
A new file containing the HTML file converted to XLSX is created in the current directory, just as when saving from the GUI.


Reproducible: Always


User Profile Reset: No



Additional Info:
The conversion should complete without crashing.
Comment 1 kaikun 2020-04-30 15:08:07 UTC
Created attachment 160135
HTML file which crashes when trying to convert it

HTML file that cannot be converted to XLSX or XLS
Comment 2 kaikun 2020-05-02 12:45:04 UTC
Okay, this can be closed. I finally found the solution: the command additionally needs the `--calc` flag. However, the error messages really need to be improved.

Thanks to: https://stackoverflow.com/questions/34362464/libreoffice-converting-html-to-xls-or-xlsx
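
For reference, a minimal sketch of how the fix could be applied to a batch of files. The function name `convert_html_to_xlsx` and the `SOFFICE` variable are hypothetical and only used for illustration, and placing `--calc` directly after `--headless` is an assumption; `--outdir` is the standard option for choosing the output directory.

```shell
# Sketch only: convert_html_to_xlsx and SOFFICE are illustrative names,
# not part of LibreOffice.
# Usage: convert_html_to_xlsx OUTDIR FILE.html [FILE.html ...]
convert_html_to_xlsx() {
  local outdir="$1"
  shift
  local f
  for f in "$@"; do
    # --calc is the flag reported above as the fix; it is assumed here to
    # make LibreOffice load the HTML file in Calc rather than Writer.
    "${SOFFICE:-soffice}" --headless --calc \
      --convert-to 'xlsx:Calc MS Excel 2007 XML' \
      --outdir "$outdir" "$f" || return 1
  done
}
```

It might be called as `SOFFICE=libreoffice6.4 convert_html_to_xlsx ./converted *.html`. The function stops at the first file for which the command returns a non-zero exit status.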
Comment 3 Timur 2020-05-02 16:17:21 UTC
NotABug indicates that the documentation should be improved.
There is a page,
https://help.libreoffice.org/6.4/en-US/text/shared/guide/start_parameters.html, but it is not clear about headless conversion, and Stack Overflow explains it better.