Created attachment 115148 [details]
file-5.14/magic/Magdir/msooxml

DOCX and XLSX files saved with MS Office 2007 / 2010 are detected correctly by the UNIX file command. The same formats saved with LO are detected as standard Zip files.

$ file *.*x
SavedWith2007.docx:       Microsoft Word 2007+
SavedWith2007.xlsx:       Microsoft Excel 2007+
SavedWithLO.docx:         Zip archive data, at least v2.0 to extract
SavedWithLOLinux.docx:    Zip archive data, at least v2.0 to extract
SavedWithLOLinux.xlsx:    Zip archive data, at least v2.0 to extract
SavedWithLO.xlsx:         Zip archive data, at least v2.0 to extract

Tested with LO 4.4.2.2 on Windows and Linux.

My guess is that the files within the ZIP created by LO are not in the same order as in those created by MS Office.

Attached is the relevant source file from Ubuntu's 14.04 release (file 5.14), which details the detection logic of the file command.
Hello,
this works correctly in Ubuntu 15.10 with LO
Version: 5.0.0.0.alpha1+
Build ID: f0edb677f09ad338e22ac3b5d91497b4479e0b3c
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2015-04-27_00:34:40

LO437.xlsx:   Microsoft OOXML
LO5.xlsx:     Microsoft OOXML
off2010.xlsx: Microsoft Excel 2007+

$ file --version
file-5.19

Files saved from LO are detected as Microsoft OOXML. Closing as WORKSFORME.
Hi Raal,

Thanks for your quick reply and your efforts in further testing. However, I disagree with your conclusion.

I have looked at the version of the file command you used, and the newer file command (partially) works around the deficiency in LibreOffice. It still shows a discrepancy between MS and LO files, which is causing interoperability problems: as your testing shows, the newer file command still cannot determine the specific type of an LO-saved file (i.e. Word or Excel).

The new version of the "magic" msooxml file was downloaded from http://archive.ubuntu.com/ubuntu/pool/main/f/file/file_5.19.orig.tar.xz and I have attached it for convenience.

Of particular note is the comment near the top of the file:

# archive. The first member file is normally "[Content_Types].xml".
# but some libreoffice generated files put this later. Perhaps skip
# the "[Content_Types].xml" test?

While I'm sure we both agree this is a relatively minor issue, it does show a deficiency in LibreOffice. It leaves me wondering whether it would be an easy fix, but I have no knowledge of the LO codebase.

Either way, I would like the status changed from RESOLVED/WORKSFORME, as I believe the bug is still valid. Whether it's worth fixing isn't for me to say, but clearly there is a problem.
Created attachment 115171 [details] file-5.19/magic/Magdir/msooxml
Setting to UNCONFIRMED so that others can comment. The main point, as I understand it, is:

# archive. The first member file is normally "[Content_Types].xml".
# but some libreoffice generated files put this later.

Does the OOXML file format specify the order of files in the zip container?
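To see which member a given package actually starts with, independent of the file command, here is a minimal Python sketch. It reads the name stored in the first local file header of the zip (the name length is at byte offset 26 and the name itself starts at offset 30 in the standard zip layout). This is only an illustration, not a port of the magic rules in the attachments; the helper names and the two sample member orders are made up for the demonstration and are not taken from real MS Office or LO files.

```python
import io
import struct
import zipfile

ZIP_LOCAL_HEADER_SIG = b"PK\x03\x04"


def first_member_name(data: bytes):
    """Return the member name in the first local file header, or None.

    Looks at the raw bytes at the start of the file rather than the
    central directory, since that is where a magic-based sniffer looks.
    """
    if len(data) < 30 or data[:4] != ZIP_LOCAL_HEADER_SIG:
        return None
    # Offset 26: 2-byte little-endian file name length; name begins at 30.
    (name_len,) = struct.unpack_from("<H", data, 26)
    return data[30:30 + name_len].decode("utf-8", errors="replace")


def build_zip(names):
    """Hypothetical helper: build an in-memory zip with members in this order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, "<x/>")
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up member orders, for illustration only.
    samples = {
        "content types first": build_zip(["[Content_Types].xml", "_rels/.rels"]),
        "content types later": build_zip(["word/document.xml", "[Content_Types].xml"]),
    }
    for label, data in samples.items():
        print(label, "->", first_member_name(data))
```

For a file on disk, passing the first few hundred bytes (for example `open(path, "rb").read(512)`) should be enough, as long as the first member name fits in that window.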
Hi Raal, thanks for changing the status :)

I'm a little out of my depth now, but I had a look at ECMA-376 4th edition Part 2, available here:
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Fourth%20Edition,%20Part%202%20-%20Open%20Packaging%20Conventions.zip

On page 107 (PDF page 115), ID S2.5 is possibly relevant:
"The package implementer should store pieces in their natural order for optimal efficiency."
although this is only a recommendation.

While Microsoft rarely sets the benchmark in terms of efficiency, I would tend to agree in this isolated incident that the file describing the other files in an archive should appear first :)

On page 112 (PDF page 120), ID O3.1 also seems somewhat relevant.
I can reproduce the problem with Word .docx files saved with the latest dev build, libreoffice-5-0~2015-11-16_13.06.56_LibreOfficeDev_5.0.4.0.0_Linux_x86-64, on Debian 8.0:

$ file *x
Excel.xlsx:      Microsoft OOXML
PowerPoint.pptx: Microsoft PowerPoint 2007+
Word.docx:       Zip archive data, at least v2.0 to extract

Excel and PowerPoint files do not seem to be affected in the same way: they are at least recognized as OOXML.
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present on a currently supported version of LibreOffice (5.1.6 or 5.2.3): https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System.

Please DO NOT:
  Update the version field
  Reply via email (please reply directly on the bug tracker)
  Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:

1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3): http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to "inherited from OOo";
4b. If the bug was not present in 3.3 - add "regression" to keyword

Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug-20170103
Dear Dan,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT:
  Update the version field
  Reply via email (please reply directly on the bug tracker)
  Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:

1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug
A version of this problem exists for .pptx files generated by the current LO 7.0.1.2, but not for the .docx and .xlsx files I quickly generated and tested.

On my up-to-date Arch Linux system, the file command recognizes LO-generated .pptx files as OOXML, but file -i (which uses different data / a different algorithm) returns application/octet-stream. Consequently, xdg-mime returns application/octet-stream as well.

While this may well be a bug in file rather than in LO, it appears to be within LO's power to fix (or work around) it by making the order of the top-level entries in the zip match that of MS-Office-generated files.

I have done this successfully on my test .pptx files by unzipping each one and, from inside the extracted directory, recomposing it like this:

zip -rD file.pptx '[Content_Types].xml' _rels ppt docProps

For the resulting file.pptx, file -i and thus xdg-mime return the correct MIME type.
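The same reordering can be done without unpacking to disk. The following is a rough Python sketch of the workaround above, under a few assumptions: the function name `reorder_package` and the `PPTX_ORDER` tuple are made up here, the top-level order is simply copied from the zip command in this comment, directory entries are skipped (as `zip -D` does), and every member is recompressed with deflate. It has not been checked against real LO output.

```python
import io
import zipfile

# Top-level order taken from the zip command in the comment above.
PPTX_ORDER = ("[Content_Types].xml", "_rels", "ppt", "docProps")


def reorder_package(src_bytes: bytes, top_level_order=PPTX_ORDER) -> bytes:
    """Return a new zip whose members are grouped by top-level entry.

    Members whose top-level name is not listed keep their relative
    order and are written after the listed ones.
    """
    def rank(info: zipfile.ZipInfo) -> int:
        top = info.filename.split("/", 1)[0]
        if top in top_level_order:
            return top_level_order.index(top)
        return len(top_level_order)

    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        # Skip explicit directory entries, mirroring zip's -D option.
        members = [i for i in src.infolist() if not i.is_dir()]
        # sorted() is stable, so the order within each group is preserved.
        for info in sorted(members, key=rank):
            dst.writestr(info.filename, src.read(info))
    return out.getvalue()


if __name__ == "__main__":
    import sys

    # Usage: python reorder.py input.pptx output.pptx
    with open(sys.argv[1], "rb") as f:
        fixed = reorder_package(f.read())
    with open(sys.argv[2], "wb") as f:
        f.write(fixed)
```

Note that this rewrites timestamps and compression settings of every member, so the result is a new container with the same content, not a byte-for-byte edit of the original.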
FWIW, I just encountered a .docx, apparently generated by LibreOffice 6.0.7.3, that suffers the same problem and is solved in the same fashion.
This bug (wrong order of [Content_Types].xml, _rels, and data in .docx, .pptx, ...) is still present in LO 7.3, LO 7.4 and LO 7.5.

Compare the zipinfo output of a genuine MS Word file with that of a .docx or .pptx exported by LO. You can see not only a different order, but also a different compression algorithm and other differences.

Note: I wrote a script for extracting some embedded data, and this bug causes me a lot of pain.
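For a quick side-by-side comparison without zipinfo, a small Python sketch along these lines lists each member's name and compression method in directory order. The function names are made up for this example, and it reports only those two properties, not everything zipinfo shows. `infolist()` follows the zip central directory, which usually, but not necessarily, mirrors the physical order of the members in the file.

```python
import io
import zipfile

METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflated",
}


def describe_entries(data: bytes):
    """Return [(member name, compression method)] in central-directory order."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            (info.filename, METHOD_NAMES.get(info.compress_type, str(info.compress_type)))
            for info in zf.infolist()
        ]


def print_side_by_side(path_a: str, path_b: str) -> None:
    """Print the entry lists of two packages next to each other."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = describe_entries(fa.read()), describe_entries(fb.read())
    for i in range(max(len(a), len(b))):
        left = "%s (%s)" % a[i] if i < len(a) else ""
        right = "%s (%s)" % b[i] if i < len(b) else ""
        print(f"{left:<45} | {right}")


if __name__ == "__main__":
    import sys

    # Usage: python compare.py from_msoffice.docx from_lo.docx
    print_side_by_side(sys.argv[1], sys.argv[2])
```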