When importing a document (an HTML file named .XLS or .HTML) with many rows (more than 15000), OpenOffice Calc truncates the data without showing any error or warning. The issue can be reproduced by importing the attached file into Calc. On my machine it imports just 13635 of the 20000 rows.
Created attachment 44977 [details] 20000 rows as HTML table
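For anyone who wants to build a comparable test file instead of downloading the attachment, a minimal sketch follows. The function names, the default of five columns, and the cell text are assumptions made for illustration; they are not taken from the attached file.

```python
def build_html_table(rows: int, cols: int = 5) -> str:
    """Return a minimal HTML document holding a rows x cols table.

    Each cell is labelled R<row>C<col> so truncation is easy to spot
    after import. The column count is an assumption, not a property
    of the attachment in this report.
    """
    lines = ["<html>", "<body>", "<table>"]
    for r in range(1, rows + 1):
        cells = "".join(f"<td>R{r}C{c}</td>" for c in range(1, cols + 1))
        lines.append(f"<tr>{cells}</tr>")
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines)


def write_test_file(path: str, rows: int = 20000, cols: int = 5) -> None:
    """Write the generated table to path (for example a file ending in .html or .xls)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_html_table(rows, cols))
```

After importing the generated file into Calc, checking whether the last row still shows its label (R20000C1 with the defaults) tells whether the data was truncated.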
The effect is reproducible with the reporter's sample document and "LibreOffice 3.3.2 – WIN7 Home Premium (64bit) German UI [OOO330m19 (Build:202 / tag 3.3.2.2)]". I have seen a lot of documents with the name extension .xls that have nothing to do with an Excel spreadsheet; the user or the application only used that name because the contents are somehow table-like. To be honest, I do not know much about Excel HTML documents, except that they are a mess to work with. IMHO that is an Excel problem; Excel should create documents with correct syntax. The reporter's sample is not correct HTML, although the source text pretends to be HTML. At the least, the HTML type information is missing. It is also not an Excel-type spreadsheet; MS Excel Viewer will not open the document. Some other observations: OOo 3.1.1 (from an open Writer document) will by default open the document as a Writer-HTML document in Writer, with a correct table view until "A12800"; then the table view stops and the strings from the table are shown as one endless line of plain text. I can force OOo to open the document as HTML-Calc; then it opens the document as a spreadsheet, and "E13105" is the last content shown correctly before the table formatting breaks. Exactly the same happens with OOo-dev 3.4. My result: my aversion to such documents has nothing to do with the reported problem; LibO should either reject the document or open it correctly (maybe with a warning message). Low priority: important data should be exported to a document with correct syntax; that is a problem of the application creating such documents. @Marco: From what application do you get such documents?
Although the "html" code is completely different, I see something similar to the reported problem with the attachment of OOo Bug 111579 - Opening large html excel document from SAS <http://openoffice.org/bugzilla/show_bug.cgi?id=111579>. Opening that document with LibO Calc (from Windows Explorer), the last correctly shown cell, 'F6712', has the contents "PXXX09.001.AAAA.BBBB 1728". The next cell is broken and no further contents are shown; the table ends with the date 15/09/2009. Renaming the document to .html and opening it with SeaMonkey shows that there is much more content after "15/09/2009".
(In reply to comment #3) Yes, I agree; it seems to be the same issue.
(In reply to comment #3) The .XLS extension is used for the users' convenience, as those extensions are associated with LibreOffice or MS Excel by default. MS Excel 2010 imports the example file without a problem; it just shows a warning that it is not an Excel file. Such files are generated by applications that cannot create native .XLS (or .XLSX). I created the example file manually to demonstrate the issue. However, the main issue I see here is that LibreOffice cannot import huge HTML tables. It should either import all of the data or show a warning message.
I can confirm this bug in LibreOffice 3.4.2 too. It happens for me on somewhat smaller tables with around 3000 rows. The interesting thing is that the borders of the table are rendered down to the last row, but the data is truncated somewhere in the middle, at a different point in each file.
[This is an automated message.] This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it started right out as NEW without ever being explicitly confirmed. The bug is changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases. Details on how to test the 3.5.0 beta1 can be found at: http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1 more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
The issue is still open and reproducible with "3.5.0 beta2".
Issue is still reproducible under v3.5.7.2 (Ubuntu v10.04 x86_64) and v4.0.1.2 (Win7).
Working on this. The limit is about 64k data cells, imposed by some underlying structures used during import.
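As a rough plausibility check of the 64k figure against the earlier observation that "E13105" was the last cell shown correctly, the cell count can be worked out as below. This is only a sketch: it assumes the sample table has exactly five columns (A through E) and that cells are counted row by row, neither of which is stated in this report, and it takes "64k" to mean 65536. The function names are made up for illustration.

```python
from typing import Tuple

CELL_LIMIT = 64 * 1024  # 65536; reading "64k" as exactly this value is an assumption


def cells_up_to(row: int, col: int, cols_per_row: int) -> int:
    """Count cells in row-major order up to and including (row, col), 1-based."""
    return (row - 1) * cols_per_row + col


def last_cell_within(limit: int, cols_per_row: int) -> Tuple[int, int]:
    """Return the 1-based (row, col) of the limit-th cell in row-major order."""
    row, col = divmod(limit - 1, cols_per_row)
    return row + 1, col + 1


# E13105 is column 5 of row 13105 under the five-column assumption.
print(cells_up_to(13105, 5, 5))         # 65525
print(last_cell_within(CELL_LIMIT, 5))  # (13108, 1)
```

Under these assumptions E13105 is the 65525th cell, a few cells short of 65536, which is roughly in line with a limit of about 64k cells but does not pin down the exact cutoff.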
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=2af1f5691e8d64afd5246d245d7876b5a2cd5cd8 resolved fdo#35756 import more than 64k HTML table cells The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
*** Bug 64168 has been marked as a duplicate of this bug. ***
*** Bug 64572 has been marked as a duplicate of this bug. ***
*** Bug 60354 has been marked as a duplicate of this bug. ***
Backport pending review for 4-0 as https://gerrit.libreoffice.org/4368
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-4-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=da11528150df545a31df3c9863bd4c3925ccdf21&h=libreoffice-4-0 resolved fdo#35756 import more than 64k HTML table cells It will be available in LibreOffice 4.0.5.