Bug 66342 - FILTER: Data > Filter > Standard Filter doesn't work for large (> 2^14) data sets
Summary: FILTER: Data > Filter > Standard Filter doesn't work for large (> 2^14) data ...
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 3.5.7.2 release (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-28 19:54 UTC by slogger
Modified: 2015-03-06 21:34 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
ODS and screenshot of standard filter dialog from v3572 and v4142. (99.83 KB, application/zip)
2014-01-21 05:32 UTC, Owen Genat (retired)

Description slogger 2013-06-28 19:54:57 UTC
Problem description: 
You can't filter more than 2^14 rows correctly.

We've reproduced this on the current Ubuntu package (3.5.7.2) and on a Mac version.

Also noticed here:
http://milospjanic.blogspot.com/2011/10/how-to-remove-duplicates-in-libreoffice.html?showComment=1342671113599#c2441335020298867469

Steps to reproduce:
Create a document (CSV, etc.) with more than 2^14 rows; a single column of 20k dictionary words is fine. Drag it into your spreadsheet and use Data > Filter > Standard Filter to try to reduce it with the conditions Column Not Empty and No Duplications. Even if every word is unique, you'll end up with 2^14 rows.
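The test data can be generated with a short script. This is a minimal sketch, assuming synthetic unique tokens ("word00000", "word00001", ...) are an acceptable stand-in for the 20k dictionary words; the output file name `unique_words.csv` is hypothetical.

```python
import csv
import io

# Assumption: synthetic unique tokens replace the dictionary words from the report.
ROW_COUNT = 20_000   # comfortably above the reported limit
THRESHOLD = 2 ** 14  # 16384, the row count the filter result is cut to


def make_rows(count):
    """Return `count` distinct, non-empty values for a single column."""
    return [f"word{i:05d}" for i in range(count)]


def rows_to_csv(rows):
    """Serialize the values as a one-column CSV with no header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for value in rows:
        writer.writerow([value])
    return buf.getvalue()


if __name__ == "__main__":
    rows = make_rows(ROW_COUNT)
    # Hypothetical file name; open it in Calc and apply the Standard Filter.
    with open("unique_words.csv", "w", newline="") as f:
        f.write(rows_to_csv(rows))
```

Because every generated value is unique, a correct "No duplications" filter should leave all 20,000 rows in place.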

Current behavior:
Even if every word is unique, you end up with 2^14 (16,384) rows, truncating a lot of valid data.

Expected behavior:
You retain all 20k rows.
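The expected result can be stated as a small model: keep the first occurrence of every non-empty value, in the original order, with no cap on the row count. This is an illustrative sketch of the filter semantics described above, assuming empty cells are represented as `""` or `None`; it is not Calc's implementation.

```python
THRESHOLD = 2 ** 14  # 16384, where the reported truncation happens


def standard_filter_expected(values):
    """Model 'Column not empty' + 'No duplications': drop empty cells,
    keep the first occurrence of each value, preserve order, no row limit."""
    seen = set()
    kept = []
    for v in values:
        if v is None or v == "":
            continue  # fails the "not empty" condition
        if v in seen:
            continue  # duplicate of an earlier row
        seen.add(v)
        kept.append(v)
    return kept


if __name__ == "__main__":
    values = [f"word{i:05d}" for i in range(20_000)]
    result = standard_filter_expected(values)
    print(len(result), THRESHOLD)  # prints "20000 16384"
```

With 20,000 unique values the model keeps all of them, which is more than the 16,384 rows the affected versions return.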
              
Operating System: All
Version: 3.5.7.2 release
Comment 1 Owen Genat (retired) 2014-01-21 05:32:25 UTC
Created attachment 92502 [details]
ODS and screenshot of standard filter dialog from v3572 and v4142.

I believe this limitation was fixed in all versions after v3.5.7.2, i.e., the 3.6+ series. I have tested the behaviour under Ubuntu 10.04 x86_64 using:

- v3.3.0.4 OOO330m19 Build: 6
- v3.4.6.2 OOO340m1 Build: 602
- v3.5.7.2 Build ID: 3215f89-f603614-ab984f2-7348103-1225a5b
- v3.6.7.2 Build ID: e183d5b
- v4.0.6.2 Build ID: 2e2573268451a50806fcd60ae2d9fe01dd0ce24
- v4.1.4.2 Build ID: 0a0440ccc0227ad9829de5f46be37cfb6edcf72

The attached ODS is simply a list of the numbers 1 to 65537 in column A. Highlighting this range, selecting Data > Filter > Standard Filter..., and then pulling down the Value drop-down list gives two different results depending on the version.

Versions 3.3.0.4 through 3.5.7.2 only list entries up to 16385 (as shown in the v3572.png screenshot), while versions 3.6.7.2 through 4.1.4.2 only list four distinct entries: 65535, 65536, 1, and 65537 (as shown in the v4142.png screenshot).
Comment 2 Owen Genat (retired) 2014-01-21 05:52:32 UTC
(In reply to comment #0)
> Create a document (CSV, etc.) with > 2^14 rows, a single column of 20k
> dictionary words is fine. Drag it in to your spreadsheet, and use Data >
> Filter > Standard filter to try and reduce to Column Not Empty and No
> Duplication. Even if every word was unique, you'll end up with 2^14 rows.

To be clear, I cannot reproduce this under v3.6+. As per comment #1, I suppose this limitation can be confirmed for earlier versions, so I am setting the Status to NEW. I am not sure, though, whether this is still a valid bug, given that it only appears in earlier versions. It may be possible to RESOLVE it as FIXED.
Comment 3 raal 2015-03-06 21:34:41 UTC
Tested with LO 4.3.3 and Version: 4.5.0.0.alpha0+
Build ID: 2c0e1917c18711d6762e12042794b745f08cf62f
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2015-03-05_17:29:19

It's fixed, closing as worksforme.