Bug 87618 - Invalid sheet dimension written in XLSX format
Summary: Invalid sheet dimension written in XLSX format
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.3.5.2 release
Hardware: Other Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 88106
Depends on:
Blocks:
 
Reported: 2014-12-22 22:58 UTC by jmorrison
Modified: 2016-09-23 13:22 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Original Excel 2007 file (70.65 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-12-23 20:53 UTC, jmorrison
xlsx file saved by LibreOffice (58.61 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-12-23 20:55 UTC, jmorrison
Excel 2007 file input test (17.81 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-12-23 21:00 UTC, jmorrison
Libreoffice xlsx with problems (86.38 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-12-23 21:04 UTC, jmorrison

Description jmorrison 2014-12-22 22:58:51 UTC
1. The original xlsx file from Excel is about ten times smaller than the same file after it is saved in LibreOffice:

-rwx------ 1 jmorrison None  150748 Dec 22 14:21 test-original.xlsx
-rwx------ 1 jmorrison None 1439545 Dec 22 14:36 test-saved.xlsx*



2. The original xlsx file can be read by the Python script xlsx2csv, which can be installed with:
git clone https://github.com/dilshod/xlsx2csv
pip install xlsx2csv


xlsx2csv test-original.xlsx 

stuff from spreadsheet,,,,,,,,,,,,,,,,

xlsx2csv test-saved.xlsx

Traceback (most recent call last):
  File "/usr/bin/xlsx2csv", line 847, in <module>
    xlsx2csv.convert(outfile, sheetid)
  File "/usr/bin/xlsx2csv", line 178, in convert
    self._convert(sheetid, outfile)
  File "/usr/bin/xlsx2csv", line 247, in _convert
    sheet.to_csv(writer)
  File "/usr/bin/xlsx2csv", line 558, in to_csv
    self.parser.ParseFile(self.filehandle)
  File "/usr/bin/xlsx2csv", line 660, in handleStartElement
    startCol = start.group(1)
AttributeError: 'NoneType' object has no attribute 'group'


Researching the Python error, it seems that Unicode is being returned where UTF-8 is expected:

https://stackoverflow.com/questions/15232832/python-regex-attributeerror-nonetype-object-has-no-attribute-groups
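The last traceback frame calls `.group(1)` on the result of a regular-expression match. In Python, `re.match` returns `None` when the pattern does not match, and calling `.group()` on `None` raises exactly this `AttributeError`. A minimal sketch, assuming (hypothetically; this is not xlsx2csv's actual code or regex) that the parser expects a letters-then-digits cell reference at the start of a range:

```python
import re

# Hypothetical pattern for illustration: column letters followed by row digits.
# This is an assumption, not the regex xlsx2csv actually uses.
CELL_REF = re.compile(r"([A-Z]+)(\d+)")

def start_column(ref):
    """Return the column letters of the first cell in a range such as 'A1:Q15'."""
    start = CELL_REF.match(ref.split(":")[0])
    # re.match returns None when the pattern does not match, and None has no
    # .group() method, so a ref with no column letters raises AttributeError.
    return start.group(1)

print(start_column("A1:Q15"))  # prints A
try:
    start_column("1:15")  # whole-row reference: "1" has no column letters
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'group'
```

Under this assumption, any range written without column letters makes the parser fail before it reads a single cell.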
Comment 1 Urmas 2014-12-23 06:50:47 UTC
Please attach both files.
Comment 2 jmorrison 2014-12-23 20:53:34 UTC
Created attachment 111238 [details]
Original Excel 2007 file

Smaller test input file.
Comment 3 jmorrison 2014-12-23 20:55:30 UTC
Created attachment 111239 [details]
xlsx file saved by LibreOffice

I had to redact the original file. The version saved by LibreOffice still cannot be read by xlsx2csv.
Comment 4 jmorrison 2014-12-23 21:00:51 UTC
Created attachment 111240 [details]
Excel 2007 file input test

Removed hidden sheets.
Comment 5 jmorrison 2014-12-23 21:04:57 UTC
Created attachment 111241 [details]
Libreoffice xlsx with problems

This LibreOffice xlsx file cannot be parsed with xlsx2csv, while the original can be.

I added more lines to the Excel file, and the file saved by LibreOffice is 5x larger.

In a large spreadsheet with hundreds of lines, the file size difference is noticeable: the Excel xlsx file was 140 KB, while the LibreOffice file was 1.4 MB.
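An xlsx file is a ZIP package of XML parts, so one way to see where the extra size comes from is to compare the uncompressed size of each part in the two files. A minimal sketch using only the standard-library `zipfile` module; `part_sizes` and `growth` are hypothetical helper names, not part of any existing tool:

```python
import io
import zipfile

def part_sizes(source):
    """Map each part of an xlsx package to its uncompressed size in bytes.

    `source` may be a filesystem path or the raw bytes of the file.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}

def growth(original, saved):
    """List (part, original size, saved size) rows, largest increase first.

    A part missing from one package is counted as 0 bytes there.
    """
    a, b = part_sizes(original), part_sizes(saved)
    rows = [(name, a.get(name, 0), b.get(name, 0)) for name in set(a) | set(b)]
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)
```

For example, `growth("test-original.xlsx", "test-saved.xlsx")[:5]` would list the five parts that grew the most between the two files.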
Comment 6 Urmas 2014-12-24 05:41:13 UTC
The problem is caused by this element in sheet 1:

<dimension ref="1:15"/>

As for the file size, there just has to be a duplicate bug somewhere.
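Whether a file contains this kind of `<dimension>` element can be checked by reading the worksheet XML directly. A minimal sketch that reports every worksheet whose `ref` is not an A1-style cell or range such as `A1:Q15`; it assumes worksheet parts live under `xl/worksheets/` (the usual layout, although strictly the package relationships decide), and the pattern is a simplified approximation that does not enforce Excel's row and column limits. `bad_dimensions` is a hypothetical name:

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by <worksheet> and its children.
SML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Simplified A1-style cell or range ("A1", "A1:Q15"); rejects whole-row
# ("1:15") and whole-column ("A:Q") notation.
A1_RANGE = re.compile(r"^[A-Z]{1,3}[1-9]\d*(?::[A-Z]{1,3}[1-9]\d*)?$")

def bad_dimensions(source):
    """Return {worksheet part: ref} for each <dimension> whose ref is not A1-style.

    `source` may be a filesystem path or the raw bytes of an xlsx file.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    bad = {}
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            # Assumption: worksheets are stored as xl/worksheets/*.xml.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            dim = root.find(f"{{{SML_NS}}}dimension")
            if dim is not None and not A1_RANGE.match(dim.get("ref", "")):
                bad[name] = dim.get("ref")
    return bad
```

Run against the attached LibreOffice file, a check like this would be expected to flag sheet 1 with the ref `1:15`.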
Comment 7 Markus Mohrhard 2015-05-09 22:30:38 UTC
@Eike: Do we have a way to produce OOXML range strings without whole column/whole row references?

According to section 18.3.1.35 of the spec, this element is described as:

The row and column bounds of all cells in this worksheet. Corresponds to the range that would contain all c elements written under sheetData. Does not support whole column or whole row reference notation.
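One way to satisfy this rule is to always write the bounding box of the occupied cells as two explicit corners. A minimal Python sketch of that idea (LibreOffice itself is written in C++, and this is not its code); `column_letters` and `dimension_ref` are hypothetical names, and returning `A1` for a sheet with no cells is an assumption:

```python
def column_letters(col):
    """Convert a 1-based column index to letters: 1 -> A, 27 -> AA, 16384 -> XFD."""
    if col < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while col:
        # Bijective base-26: shift to 0-based before each division.
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def dimension_ref(cells):
    """Bounding A1-style range of 1-based (row, col) pairs, e.g. 'A1:Q15'.

    Always emits explicit cell corners, never whole-row ('1:15') or
    whole-column ('A:Q') notation. Returns 'A1' when there are no cells
    (an assumption about what to write for an empty sheet).
    """
    cells = list(cells)
    if not cells:
        return "A1"
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    top_left = f"{column_letters(min(cols))}{min(rows)}"
    bottom_right = f"{column_letters(max(cols))}{max(rows)}"
    # A single occupied cell is written as a plain cell reference.
    return top_left if top_left == bottom_right else f"{top_left}:{bottom_right}"
```

For the sheet in this report, cells spanning rows 1 to 15 and columns 1 to 17 would give `A1:Q15` instead of `1:15`.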
Comment 8 QA Administrators 2016-09-20 09:42:53 UTC Comment hidden (obsolete)
Comment 9 Bartosz 2016-09-23 13:18:35 UTC
The issue was resolved in LibreOffice 5.3.
Comment 10 Bartosz 2016-09-23 13:22:34 UTC
*** Bug 88106 has been marked as a duplicate of this bug. ***