Bug 60889 - FILEOPEN: Incorrect opening XLSX file (that happens to be not valid according to OOXML, may be produced by software like 1C:Enterprise)
Summary: FILEOPEN: Incorrect opening XLSX file (that happens to be not valid according...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.0.0.3 release
Hardware: Other All
: high major
Assignee: Not Assigned
URL:
Whiteboard: target:4.2.0
Keywords: difficultyBeginner, easyHack, skillCpp
: 67037 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-02-15 10:21 UTC by Draga
Modified: 2015-12-15 16:48 UTC (History)
9 users (show)

See Also:
Crash report or crash signature:


Attachments
Incorrect XLSX file and screenshots in PDF (150.66 KB, application/x-zip-compressed)
2013-02-15 10:21 UTC, Draga
Details
Renamed to sharedStrings.xml - works fine (15.14 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-02-19 10:15 UTC, Thomas Arnhold
Details
Initial patch (908 bytes, patch)
2013-08-02 01:09 UTC, rfw2nd
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Draga 2013-02-15 10:21:29 UTC
Created attachment 74864 [details]
Incorrect XLSX file and screenshots in PDF

Problem description: 
I have an XLSX-file which opens incorrectly.
Absent text data in cells, numbers are displayed. MSO Excel 2007 opens the file correctly.
I don't know the creator application.
Screenshots are in PDF-file.

Steps to reproduce:
1. Open test.xlsx

Current behavior:
See test.pdf page 2

Expected behavior:
See test.pdf page 1

Operating System: All
Version: 4.0.0.3 release
Comment 1 Thomas Arnhold 2013-02-15 12:40:36 UTC
Right, when opening your original xlsx file with Libreoffice all texts are suppressed at import. MS Excel 2010 does fine.

But: Open the file with Excel 2010 and write something in a cell. Save the document and open it in Libreoffice. Voila! All text ist properly imported. Maybe the original document is corrupted?!

Tested with: Version 4.0.0.3 (Build ID: 7545bee9c2a0782548772a21bc84a9dcc583b89) on Windows 7.
Comment 2 Draga 2013-02-19 07:46:07 UTC
OK, I checked it. But...
Excel opens this file but LibO - not. I don't know document is corrupted or not (I don't know XLSX specification). 
Now I know what some file may be opened without some data. This is not a test without Excel. It's bug.
Formatting errors, it is not a problem. Data loss is a problem.
Comment 3 Thomas Arnhold 2013-02-19 10:14:53 UTC
The solution couldn't be more simple:

In the xlsx file there is a file named sharedStrings.xml. In your file it is uppercase SharedStrings.xml, so libo doesn't recognize it. MS Office seems to be more generous with this, so opening it with Excel is no problem. I renamed the file to the lowercase variant and zipped the folder. Voila! It opens correctly with libo.

And you're right, it may be a random case, but we should be as generous as Excel is.

Kohei, Markus, what's your opinion to this?
Comment 4 Thomas Arnhold 2013-02-19 10:15:31 UTC
Created attachment 75102 [details]
Renamed to sharedStrings.xml - works fine
Comment 5 Draga 2013-02-19 10:55:58 UTC
Сonfirm
Verified with LibO 4.0.0.3 and OOOpro 3.2.1
Also I verified original file (without changing the text data to the word 'test').
Comment 6 Markus Mohrhard 2013-02-20 22:02:22 UTC
Lets start with the easy part. The file is not valid according to OOXML.

In xl/_rels/workbook.xml.rels it references sharedStrings.xml in contrast to the SharedStrings.xml that it actually contains. I checked that we respect the value in the file so we are conformant with the standard in this matter.

Now to the question if we should support these broken documents. I think it would be relatively easy to check for files named any version of sharedstrings.xml when the import of the file mentioned in the relationship file fails.

However I would give this a low priority for now and make it an easy hack.

The place in the code where we try to load the shared strings table is:
http://opengrok.libreoffice.org/xref/core/sc/source/filter/oox/workbookfragment.cxx#214

We would need to change the code that if we don't find the file in this case we check if any of variation of sharedStrings.xml is found. Anybody interested in this task can ask me for more details.
Comment 7 Efe Gürkan YALAMAN 2013-02-25 01:37:55 UTC
Hi,

I want to work on this bug. 

I couldn't find which part of the code checks the path of "sharedStrings.xml". I assume the code checks the path case sensitive and we can make is case insensitive. But this is not a proper solution I think. Proper solution is checking variations only in case we couldn't find the "sharedStrings.xml" as Markus said.

Where should i look at for the solution?

Thanks and sorry for my possible grammar mistakes :)
Comment 8 Thomas Arnhold 2013-02-25 04:55:03 UTC
Here's an entry point: http://opengrok.libreoffice.org/search?project=core&q=sharedStrings
Comment 9 Markus Mohrhard 2013-02-25 05:43:56 UTC
(In reply to comment #7)
> Hi,
> 
> I want to work on this bug. 

Great.

> 
> I couldn't find which part of the code checks the path of
> "sharedStrings.xml". I assume the code checks the path case sensitive and we
> can make is case insensitive. But this is not a proper solution I think.
> Proper solution is checking variations only in case we couldn't find the
> "sharedStrings.xml" as Markus said.
> 
> Where should i look at for the solution?
> 
> Thanks and sorry for my possible grammar mistakes :)

So it is more difficult than that. The shared strings don't have to be saved into sharedStrings.xml or any other variation of that string. OOXML defines relationships for this and stores their the name of the file. As long as the name of the file and entry in the relation are equal the file is valid. What we are now trying to do is to accept that there are OOXML generators out there that produce invalid OOXML files which most likely use a variation sharedStrings.xml.

So at the place that I mentioned in Comment 6 we use the path saved in the relations table to finally import the file and we need to check there if this file exists. So you should start by starting Libreoffice in a debugger and set a breakpoint at that place and inspect the stacktrace to see if there is a way to get a list of all the files.

Please feel free to ask for more details if you need them.
Comment 10 Cao Cuong Ngo 2013-03-27 16:45:30 UTC
Should we fix only the case of sharedStrings or all the relationships?
Comment 11 Markus Mohrhard 2013-03-27 17:26:18 UTC
(In reply to comment #10)
> Should we fix only the case of sharedStrings or all the relationships?

Please start with sharedStrings. As long as we don't get any bug reports for other relationships I don't want to add more complexity for them.
Comment 12 Markus Mohrhard 2013-05-09 18:47:12 UTC
*** Bug 49840 has been marked as a duplicate of this bug. ***
Comment 13 Urmas 2013-05-10 05:33:45 UTC
These ones come apparently from some accounting software like 1C:Enterprise.
Comment 14 Draga 2013-07-11 05:52:59 UTC
Probably, incorrect XLSX file (attached) was created by 1C-Enterprise. 1C-Enterprise has a error when exporting to XLSX. I will try to inform the 1C about this error.

Is there a progress in correction this bug in LibO?
Comment 15 Emerson Prado 2013-07-11 20:28:25 UTC
(In reply to comment #6)
> Lets start with the easy part. The file is not valid according to OOXML.
> 
...
> 
> Now to the question if we should support these broken documents. I think it
> would be relatively easy to check for files named any version of
> sharedstrings.xml when the import of the file mentioned in the relationship
> file fails.
> 
...

Technically speaking, the software which generated an invalid string is at fault and should be fixed instead. But users just need to open such files, most of the times not knowing they are corrupted or which app generated them. That, of course, to a certain extent - no software should be asked to open garbage. A comprehensive - but hard - way of doing it would be to check for the correct string first. Then, if not found, search for obvious variations, maybe case-insensitive. if found, inform the user that the file is corrupt and ask what to do - abort, open anyway or fix it. This would ruin the "easy hack" game, I know, just wanted to add my 2¢. For now, "just open" should work.
Comment 16 ign_christian 2013-07-18 12:18:25 UTC
*** Bug 67037 has been marked as a duplicate of this bug. ***
Comment 17 Vitaly Bevsky 2013-07-19 07:29:01 UTC
If the file is not according to OOXML, then why LO opens it without any messages?
IMHO the best way is show the message while loading that file is corrupted and offer the user to fix it.
Comment 18 rfw2nd 2013-08-01 01:18:35 UTC
Just taking a wild guess here... feel free to shoot this comment down at will :).

If the error is only in the case sensitivity (SharedStrings.xml vs sharedStrings.xml), could it be that case-sensitivity wasn't a design concern for OOXML? (Windows filesystems, the registry afaik are also case insensitive)  In which case, there could be (and likely are) other implementations of OOXML that produce files incompatible w/ LO in this regard.
Comment 19 rfw2nd 2013-08-02 01:09:02 UTC
Created attachment 83494 [details]
Initial patch

Here's an initial patch that just fixes it for this particular case.
Comment 20 Draga 2013-08-27 09:38:07 UTC
When LO will accept this patch?
Comment 21 Thomas Arnhold 2013-08-27 12:39:38 UTC
Draga, rfw2nd: Please submit this patch to gerrit, so developers can recognize this. Within bugzilla it's doomed to get lost in the noise...
Comment 22 Juliette Dorleans 2013-09-01 17:56:52 UTC
Hi,

it's not lost in the noise yet ;-)
I reproduced the bug and then applied the patch. I confirm that it fixes the issue. Could it be possible to review this patch ?
Comment 23 Thomas Arnhold 2013-09-02 20:39:42 UTC
I've uploaded it to https://gerrit.libreoffice.org/#/c/5763/

rfw
Comment 24 Thomas Arnhold 2013-09-02 20:41:26 UTC
rfw2nd: Please mail me if you want your real name in your patch (at gerrit). Please follow the license statement procedure at https://wiki.documentfoundation.org/Development/Developers so we can push that fix to master.

Thanks!
Comment 25 rfw2nd 2013-09-02 20:49:32 UTC
Licence statement sent. :)


On Mon, Sep 2, 2013 at 1:41 PM, <bugzilla-daemon@freedesktop.org> wrote:

>   *Comment # 24 <https://bugs.freedesktop.org/show_bug.cgi?id=60889#c24>on bug
> 60889 <https://bugs.freedesktop.org/show_bug.cgi?id=60889> from Thomas
> Arnhold <thomas@arnhold.org> *
>
> rfw2nd: Please mail me if you want your real name in your patch (at gerrit).
> Please follow the license statement procedure athttps://wiki.documentfoundation.org/Development/Developers so we can push that
> fix to master.
>
> Thanks!
>
>  ------------------------------
> You are receiving this mail because:
>
>    - You are on the CC list for the bug.
>
>
Comment 26 Commit Notification 2013-09-23 02:22:44 UTC
Raymond Wells committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=571f49ffed792095fd41e2d07dbe30befa99a5b8

fdo#60889: FILEOPEN: Incorrect opening XLSX file (sharedStrings.xml)



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 28 Thomas Arnhold 2013-09-29 09:03:23 UTC
Draga, you tested the 4-1 branch. Works fine on master and should be included in the next 4.2 release.

Works fine with

Version: 4.2.0.0.alpha0+
Build ID: 164b6ce7b27c0a9ec19019e7b078b9f8f382007d
TinderBox: Win-x86@39, Branch:master, Time: 2013-09-28_16:39:42

So marking this as fixed.
Comment 29 Draga 2013-09-30 05:52:52 UTC
WinXp SP3 
http://dev-builds.libreoffice.org/daily/master/Win-x86@39/current/
master~2013-09-30_01.41.28_LibreOfficeDev_4.2.0.0.alpha0_Win_x86.msi 30-Sep-2013 03:45

Start "C:\Program Files\LibreOfficeDev 4\program\soffice.exe"
Error message:
... not found msvcr110.dll ...

I installed the Visual C++ Runtime VS2012 (vcredist_x86.exe)

Now new error message (my translate from Russian):
 
soffice.exe - Entry point not found
The procedure entry point GetTickCount64 could not be located in the DLL KERNEL32.DLL

What else should I install?
Comment 30 Draga 2013-09-30 06:26:12 UTC
WinXp SP3 
http://dev-builds.libreoffice.org/daily/master/Win-x86@39/current/
master~2013-09-30_01.41.28_LibreOfficeDev_4.2.0.0.alpha0_Win_x86.msi 30-Sep-2013 03:45

Start "C:\Program Files\LibreOfficeDev 4\program\soffice.exe"
Error message:
... not found msvcr110.dll ...

I installed the Visual C++ Runtime VS2012 (vcredist_x86.exe)

Now new error message (my translate from Russian):
 
soffice.exe - Entry point not found
The procedure entry point GetTickCount64 could not be located in the DLL KERNEL32.DLL

What else should I install?


http://msdn.microsoft.com/en-us/library/windows/desktop/ms724411%28v=vs.85%29.aspx
MSDN about GetTickCount64: Minimum supported client - Windows Vista
LO 4.2:  Minimum supported client - Windows Vista ?
Comment 31 Urmas 2013-09-30 07:08:42 UTC
Master builds are not compatible with XP. That's OK.
Comment 32 Robinson Tryon (qubit) 2015-12-15 16:48:03 UTC
Migrating Whiteboard tags to Keywords: (EasyHack DifficultyBeginner SkillCpp)
[NinjaEdit]