Bug 157640 - Error message "source file could not be loaded" could be more explicit when using import filter calc_pdf_addstream_import on non-hybrid PDF
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: 7.6.2.1 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks: PDF
Reported: 2023-10-06 15:07 UTC by ruslanik55
Modified: 2023-10-09 19:32 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
pdf for testing purposes (15.07 KB, application/pdf)
2023-10-06 15:07 UTC, ruslanik55
hybrid PDF with embedded ODS (19.56 KB, application/pdf)
2023-10-09 19:32 UTC, Stéphane Guillou (stragu)

Description ruslanik55 2023-10-06 15:07:56 UTC
Created attachment 190068
pdf for testing purposes

libreoffice --headless --infilter="calc_pdf_addstream_import" --convert-to xlsx:"Calc MS Excel 2007 XML" Untitledspreadsheet-Sheet1.pdf

Error: source file could not be loaded
Comment 1 m_a_riosv 2023-10-07 00:49:17 UTC
I think that conversion is not available.
Comment 2 ruslanik55 2023-10-07 08:57:07 UTC
So there is no command-line tool that can do this, even with a double or triple conversion?
Comment 4 Stéphane Guillou (stragu) 2023-10-09 14:11:47 UTC
But the filter "calc_pdf_addstream_import" seems to exist, even though it is not exposed in the GUI.

In sdext/source/pdfimport/config/pdf_import_filter.xcu, it has the flags "NOTINFILEDIALOG NOTINCHOOSER"

I only see it used here:

https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/filterdet.cxx?r=9dd0af94#357
Comment 5 ruslanik55 2023-10-09 15:08:53 UTC
I didn't find any other PDF import filter for Calc in the documentation.
In any case, I tried different combinations, and none of them converted the PDF to XLSX.
Comment 6 Stéphane Guillou (stragu) 2023-10-09 19:31:25 UTC
As was mentioned by Mike Kaganski on IRC, the filter is specifically for hybrid PDFs, as shown in line https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/config/pdf_import_filter.xcu?r=0f613adb#197

I think the error message should be more informative than "source file could not be loaded". Something like "input file is not a hybrid PDF" would be much clearer.
Comment 7 Stéphane Guillou (stragu) 2023-10-09 19:32:14 UTC
Created attachment 190099
hybrid PDF with embedded ODS

You can test with this PDF as input to see that the filter does work when the right format is supplied.
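
Until the message is improved, a wrapper script can check the input before calling the filter and print the clearer error suggested in comment 6. The following is only a rough sketch. It assumes that a hybrid PDF written by LibreOffice carries an "/AdditionalStreams" entry that is visible as plain bytes in the file, and it uses a simple byte search rather than a real PDF parser, so it can misjudge unusual files. The function names are hypothetical; the command line is the one from the description.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: hybrid PDFs exported by LibreOffice contain this key in plain
# bytes. This is a heuristic byte search, not a proper PDF parse.
HYBRID_MARKER = b"/AdditionalStreams"


def looks_like_hybrid_pdf(data: bytes) -> bool:
    """Return True if the bytes look like a PDF that embeds a source document."""
    # The PDF header is allowed to be preceded by a few stray bytes.
    has_pdf_header = b"%PDF-" in data[:1024]
    return has_pdf_header and HYBRID_MARKER in data


def build_convert_command(pdf_path: str) -> list[str]:
    """Build the same conversion command that was used in the description."""
    return [
        "libreoffice",
        "--headless",
        "--infilter=calc_pdf_addstream_import",
        "--convert-to",
        "xlsx:Calc MS Excel 2007 XML",
        pdf_path,
    ]


def convert_hybrid_pdf(pdf_path: str) -> int:
    """Run the conversion only if the input appears to be a hybrid PDF."""
    data = Path(pdf_path).read_bytes()
    if not looks_like_hybrid_pdf(data):
        print(
            f"Error: {pdf_path} does not appear to be a hybrid PDF; "
            "calc_pdf_addstream_import needs an embedded source document",
            file=sys.stderr,
        )
        return 1
    return subprocess.run(build_convert_command(pdf_path)).returncode
```

With this sketch, convert_hybrid_pdf("Untitledspreadsheet-Sheet1.pdf") would be expected to stop with the explicit message for attachment 190068 and to run the conversion for attachment 190099, provided the marker assumption holds for both files.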