Bug 123476 - Detect 0-byte files based on extension (esp. for MS Office and ODF formats)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
Blocks: FormatDetection
Reported: 2019-02-15 05:19 UTC by Aron Budea
Modified: 2020-06-04 07:54 UTC (History)

Description Aron Budea 2019-02-15 05:19:04 UTC
Currently, all empty (0-byte) files are detected as HTML/Web files. (This is a change in 6.2.0.3 compared to 6.1.0.3, where they were still detected as text files.) Since an empty file carries no other identifying information, it should be detected based on its extension.

The current behavior is a problem because on Windows, with MS Office installed, right click -> New -> <various MS Office formats> tends to create 0-byte files. Opening and editing such a file in LibreOffice can cause confusion, and potentially data loss if the user doesn't notice the wrong format before saving the document.
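
To illustrate the request, here is a minimal sketch in Python (not LibreOffice code) of an extension-based fallback for 0-byte files. The function name `detect_empty_file` and the mapping table are hypothetical. The filter names are the export filter names commonly used with LibreOffice, but treat the table as illustrative rather than complete.

```python
import os

# Hypothetical mapping: extension -> (module, export filter name).
# Only a few formats are listed; a real table would cover more.
EXTENSION_MAP = {
    ".docx": ("Writer", "MS Word 2007 XML"),
    ".xlsx": ("Calc", "Calc MS Excel 2007 XML"),
    ".pptx": ("Impress", "Impress MS PowerPoint 2007 XML"),
    ".odt": ("Writer", "writer8"),
    ".ods": ("Calc", "calc8"),
    ".odp": ("Impress", "impress8"),
}


def detect_empty_file(path):
    """Return (module, filter) for a 0-byte file with a known extension.

    Returns None for non-empty files (content-based detection should
    handle those) and for extensions missing from the table.
    """
    if os.path.getsize(path) != 0:
        return None
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_MAP.get(ext)
```

With this sketch, an empty "New Microsoft Word Document.docx" would map to Writer with a DOCX save format instead of falling through to HTML/Web.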
Comment 1 Mike Kaganski 2019-02-15 05:48:17 UTC
Well - then not only detection should be changed, but also something needs to be done with document initialization as well - because *reading* from such a file using detected filter would be impossible.
Comment 3 Mike Kaganski 2019-02-15 07:57:39 UTC
Regarding the initialization: should something specific be done with the new document depending on the filter? E.g., for a 0-byte .docx, should we simply create a new Writer document (using default template) and set its filter to DOCX, or should we initialize it as if that is a DOCX - which would mean different default fonts, compatibility options, etc. (whatever is done in DOCX importer when a valid DOCX is imported, before actual DOCX data is read)? Should all filters be modified to be able to do that then? Would that require to have own default templates for all filters, or should the one default template for the module be used anyway, with application of filter-specific modifications (with a risk of the resulting new document to differ from the template as used in normal new document creation)?
Comment 4 Aron Budea 2019-02-15 08:13:53 UTC
(In reply to Mike Kaganski from comment #1)
> Well - then not only detection should be changed, but also something needs
> to be done with document initialization as well - because *reading* from
> such a file using detected filter would be impossible.
Sure, the point is not to read from an empty file, but to correctly initialize one.
Comment 5 Aron Budea 2019-02-15 12:27:35 UTC
(In reply to Mike Kaganski from comment #3)
> Regarding the initialization: should something specific be done with the new
> document depending on the filter? E.g., for a 0-byte .docx, should we simply
> create a new Writer document (using default template) and set its filter to
> DOCX, or should we initialize it as if that is a DOCX - which would mean
> different default fonts, compatibility options, etc. (whatever is done in
> DOCX importer when a valid DOCX is imported, before actual DOCX data is
> read)? Should all filters be modified to be able to do that then? Would that
> require to have own default templates for all filters, or should the one
> default template for the module be used anyway, with application of
> filter-specific modifications (with a risk of the resulting new document to
> differ from the template as used in normal new document creation)?
All very good questions. I'd say just create an empty document/spreadsheet/presentation, set the export type to the identified one, and do an export+import cycle. The last of these steps is optional if it would be a larger task; the most important thing is to start in the correct application and set the correct save format.

While I think the above method would be applicable to most formats, it's really relevant for the formats that can show up as 0-byte files in real life, i.e. the ones that can be created by MS Office via the Explorer context menu.
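
The steps proposed above can be sketched roughly as follows. This is a toy model in Python, not LibreOffice code: `Document`, `initialize_from_empty` and the `round_trip` callback are hypothetical names, and the export+import cycle is represented only by an optional function that the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Document:
    module: str       # application to start in, e.g. "Writer"
    save_filter: str  # format that a plain "Save" would write


def initialize_from_empty(
    module: str,
    save_filter: str,
    round_trip: Optional[Callable[[Document], Document]] = None,
) -> Document:
    # Step 1: create an empty document in the correct application.
    doc = Document(module=module, save_filter="")
    # Step 2: set the export type to the one identified from the extension.
    doc.save_filter = save_filter
    # Step 3 (optional): an export+import cycle, so that filter-specific
    # defaults get applied. Skipped when no callback is given.
    if round_trip is not None:
        doc = round_trip(doc)
    return doc
```

Calling `initialize_from_empty("Writer", "MS Word 2007 XML")` covers the two essential steps. Passing a `round_trip` function adds the optional third one without changing the rest.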