Hi,

We have a problem with LO headless PDF-to-HTML conversion in the Etherpad project. References:
https://github.com/ether/etherpad-lite/issues/4151
https://github.com/ether/etherpad-lite/issues/4152

With the test file attached to the first issue, this command generates an empty output:

soffice --headless --convert-to html:"XHTML Writer File:UTF8" install.pdf

This command works well, but lacks UTF-8 support:

soffice --headless --convert-to html install.pdf

Also tested with LO 7, as mentioned in https://github.com/ether/etherpad-lite/issues/4152#issuecomment-655531583

Thanks for reading :)
(In reply to b_b from comment #0)
> With the test file attached to the first issue, this command generates an
> empty output:
>
> soffice --headless --convert-to html:"XHTML Writer File:UTF8" install.pdf

This command is wrong, and I'm actually surprised it doesn't throw an error. PDF files are associated by default with Draw, but "XHTML Writer File" is an export filter of Writer. You can't use a Writer filter when exporting a file that was loaded using Draw.

The solution here is to explicitly set the Writer PDF import filter with --infilter="writer_pdf_import". For other generic formats (like HTML, RTF or plain text) you can just add "--writer" to the command line without specifying the input filter name, but this doesn't work for PDF. This can also be reproduced from the UI: trying to open a PDF file from inside Writer will still open it in Draw.

> This command works well, but lacks UTF-8 support:
>
> soffice --headless --convert-to html install.pdf

Try --convert-to html:"XHTML Draw File".
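For reference, the suggestion above (forcing the Writer PDF import filter so that the Writer export filter applies) could be wrapped roughly as follows. This is an untested sketch: `pdf_to_xhtml` is a hypothetical helper name and `SOFFICE` is an assumed override variable for the binary path; only the `--headless`, `--infilter` and `--convert-to` arguments come from this thread.

```shell
# Hypothetical helper (not part of LibreOffice): convert a PDF to XHTML
# by loading it with the Writer PDF import filter instead of Draw.
# SOFFICE is an assumed override for the binary; it defaults to "soffice".
pdf_to_xhtml() {
    # $1: path to the input PDF
    "${SOFFICE:-soffice}" --headless \
        --infilter="writer_pdf_import" \
        --convert-to 'html:XHTML Writer File:UTF8' \
        "$1"
}

# Usage: pdf_to_xhtml install.pdf
```

Setting `SOFFICE=echo` prints the arguments instead of running LibreOffice, which may help when checking the quoting of the filter string.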
Thanks for answering. I think we have all the info needed to fix the bug in Etherpad :) Feel free to close this ticket.