Bug 134681 - headless convert-to generate empty html from pdf
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.4.4.2 release
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
Reported: 2020-07-09 08:18 UTC by b_b
Modified: 2020-07-09 17:17 UTC

Description b_b 2020-07-09 08:18:05 UTC
Hi, we have a problem with LO headless PDF-to-HTML conversion in the Etherpad project, refs:

https://github.com/ether/etherpad-lite/issues/4151
https://github.com/ether/etherpad-lite/issues/4152

With the test file attached to the first issue, this command generates an empty output:

soffice --headless --convert-to html:"XHTML Writer File:UTF8" install.pdf

This command works well, but lacks UTF-8 support:

soffice --headless --convert-to html install.pdf

Also tested with LO 7 as mentioned in https://github.com/ether/etherpad-lite/issues/4152#issuecomment-655531583

Thx for reading :)
Comment 1 Maxim Monastirsky 2020-07-09 09:25:56 UTC
(In reply to b_b from comment #0)
> With the test file attached to the first issue, this command generates an
> empty output:
> 
> soffice --headless --convert-to html:"XHTML Writer File:UTF8" install.pdf
This command is wrong, and I'm actually surprised it doesn't throw an error. PDF files are associated by default with Draw, but "XHTML Writer File" is an export filter of Writer. Obviously you can't use a Writer filter when exporting a file loaded using Draw...

The solution here is to explicitly set the Writer pdf import filter with --infilter="writer_pdf_import".
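
As a minimal sketch of that fix, the original command can be wrapped with the input filter set explicitly. The function name `pdf_to_xhtml_writer` and the `SOFFICE` override variable are hypothetical helpers for illustration, not part of LibreOffice; the flags themselves are the ones named in this report plus the standard `--outdir` option.

```shell
# Hypothetical helper: convert a PDF to XHTML through Writer.
# SOFFICE may point at a different binary; it defaults to "soffice".
pdf_to_xhtml_writer() {
    input=$1
    outdir=${2:-.}
    # --infilter forces the Writer PDF import filter, so the document is
    # loaded in Writer and the Writer export filter can be applied to it.
    "${SOFFICE:-soffice}" --headless \
        --infilter="writer_pdf_import" \
        --convert-to 'html:XHTML Writer File:UTF8' \
        --outdir "$outdir" \
        "$input"
}
```

Usage would look like `pdf_to_xhtml_writer install.pdf out`, which should write `out/install.html` if the conversion succeeds.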

For other generic formats (like HTML, RTF or plain text) you can just add "--writer" to the command line, without specifying the input filter name, but this doesn't work for PDF. This can also be reproduced from the UI: trying to open a PDF file from inside Writer will still open it in Draw.
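
For comparison, a rough sketch of the "--writer" variant for a non-PDF input. The function name `doc_to_xhtml_writer`, the `SOFFICE` override variable and the sample file name `notes.rtf` are assumptions made for illustration only.

```shell
# Hypothetical helper: convert a generic text document (HTML, RTF,
# plain text) to XHTML by asking for Writer with --writer.
# This is the route that does NOT work for PDF input.
doc_to_xhtml_writer() {
    input=$1
    "${SOFFICE:-soffice}" --headless --writer \
        --convert-to 'html:XHTML Writer File:UTF8' \
        "$input"
}
```

For example, `doc_to_xhtml_writer notes.rtf` needs no `--infilter` because the input is opened in Writer directly.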

> This command works well, but lacks UTF-8 support:
> 
> soffice --headless --convert-to html install.pdf
Try --convert-to html:"XHTML Draw File".
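
Spelled out, that alternative keeps the default Draw import and pairs it with a Draw export filter. This is only a sketch: the function name `pdf_to_xhtml_draw` and the `SOFFICE` override variable are hypothetical, and whether the result meets the UTF-8 requirement has to be checked against the actual file.

```shell
# Hypothetical helper: convert a PDF to XHTML through Draw.
# No --infilter is needed because PDF files open in Draw by default.
pdf_to_xhtml_draw() {
    input=$1
    outdir=${2:-.}
    "${SOFFICE:-soffice}" --headless \
        --convert-to 'html:XHTML Draw File' \
        --outdir "$outdir" \
        "$input"
}
```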
Comment 2 b_b 2020-07-09 17:15:20 UTC
Thx for answering, I think we've got all the info needed to fix the bug in Etherpad :)

Feel free to close this ticket.