Bug 142986 - In "Data - Statistics - Sampling" options, not possible to use a sampling size greater than 100
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 6.3 all versions (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: target:7.3.0 target:7.2.3
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Data-Statistics
 
Reported: 2021-06-22 14:18 UTC by Antonio J Caba
Modified: 2021-11-18 16:41 UTC

See Also:
Crash report or crash signature:


Attachments
Sample of the results. (21.05 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-06-22 14:23 UTC, Antonio J Caba

Description Antonio J Caba 2021-06-22 14:18:25 UTC
Description:
When using Data - Statistics - Sampling, I select an input range (for example Sheet1.$A$2:$A$1001) and request a sample of 218 rows. Version 6.1.5.2 returns all 218 rows. Version 6.4.0.3 and later return only 100 rows.

Steps to Reproduce:
1. Create a source sheet with 1000 rows of data.
2. Create a destination sheet to hold the result.
3. Select the input range (all 1000 rows).
4. Select the destination (the first cell of the destination sheet).
5. Choose the Random sampling method and enter a sample size of 218.

Actual Results:
On version 6.1.5.2 I obtain 218 rows
On version 6.4.0.3 I obtain 100 rows

Expected Results:
The number of rows I requested (218), not just 100.


Reproducible: Always


User Profile Reset: No



Additional Info:
Works fine here:
Version: 6.1.5.2
Build ID: 90f8dcf33c87b3705e78202e3df5142b201bd805
CPU threads: 4; OS: Windows 6.1; UI render: default; 
Locale: es-ES (es_ES); Calc: CL

Does not work here:
Version: 6.4.0.3 (x86)
Build ID: b0a288ab3d2d4774cb44b62f04d5d28733ac6df8
CPU threads: 4; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: default; VCL: win; 
Locale: es-ES (es_ES); UI language: es-ES
Calc: threaded
Comment 1 Antonio J Caba 2021-06-22 14:23:33 UTC
Created attachment 173087
Sample of the results.

Sheets 2 and 3 should both contain 218 rows of data.
Sheet 1 - Source data
Sheet 2 - 218 rows with version 6.1.5.2
Sheet 3 - 100 rows with version 6.4.0.3 (x86)
Comment 2 Xisco Faulí 2021-06-22 15:37:34 UTC
Regression introduced by:

author	Eike Rathke <erack@redhat.com>	2018-12-22 22:06:02 +0100
committer	Eike Rathke <erack@redhat.com>	2018-12-22 23:28:11 +0100
commit 2c5c20b19c349a4b7f6d78d69d8d57f9af5c351c (patch)
tree 3547c192bffd1cdf638d203df50db55fc5dbd99f
parent 28a1ae3285aad77e238c698bd8d496006c881f6d (diff)
Rework Data -> Statistics dialog, add WithReplacement and KeepOrder

https://cgit.freedesktop.org/libreoffice/core/commit/?id=2c5c20b19c349a4b7f6d78d69d8d57f9af5c351c

Bisected with: bibisect-linux64-6.3

Adding Cc: to Eike Rathke
Comment 3 Stéphane Guillou (stragu) 2021-06-22 22:24:10 UTC
Regardless of the settings used (as long as there are enough values to pick from), any sample size greater than 100 is reset to 100.

Minimal reproducible steps from scratch:

1. Open Calc
2. Fill a range with numbers (for example the sequence 1 to 10 in range A1:A10)
3. Select range with numbers
4. Menu: Data > Statistics > Sampling...
5. The "Input range" should be the one previously selected
6. Results to: B1
7. Sampling method: Random
8. Tick "With replacement" (so numbers in the sample can be reused)
9. Input a Sample size greater than 100 (for example 105)
10. Click out of the Size field (by clicking into another field, or clicking OK) 

Actual results:
The sample size is reset to a maximum of 100.

Expected results:
Any number is kept, as "With replacement" is ticked and therefore the sample size is not limited.
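
To illustrate why "With replacement" should lift the limit, here is a minimal Python sketch of the two sampling modes. It is not LibreOffice code: the function names are hypothetical, and it only mirrors the steps above (values 1 to 10, sample size 105).

```python
import random

# The sequence 1 to 10, standing in for the values in A1:A10.
population = list(range(1, 11))


def sample_with_replacement(values, size, seed=None):
    """Draw `size` values; each value may be picked more than once."""
    rng = random.Random(seed)
    return rng.choices(values, k=size)


def sample_without_replacement(values, size, seed=None):
    """Draw `size` distinct positions; fails if size > len(values)."""
    rng = random.Random(seed)
    return rng.sample(values, k=size)  # raises ValueError when size is too large


# With replacement, 105 draws from 10 values is a valid request.
sample = sample_with_replacement(population, 105, seed=0)
print(len(sample))  # 105
```

Without replacement, the same request cannot be satisfied (Python's `random.sample` raises `ValueError`), so only in that mode does it make sense to cap the sample size at the number of input values.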

Confirmed with:

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: e3086b58eb5427d520b86c185f9d911bb6f7a3a0
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-21_15:37:11
Calc: threaded

And:

Version: 7.2.0.0.beta1 / LibreOffice Community
Build ID: c6974f7afec4cd5195617ae48c6ef9aacfe85ddd
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 4 Justin L 2021-11-10 07:23:12 UTC
Easy fix. Must have slipped past Eike's radar.
http://gerrit.libreoffice.org/c/core/+/124948
Comment 5 Justin L 2021-11-10 08:01:47 UTC
My minimum steps to reproduce:
1.) open Muestreo.ods and switch to Hoja1 tab
2.) Press Ctrl+Shift+Home to select the range from A1171 to A1
3.) Data - Statistics - Sampling and select Random sampling method
4.) type 10,000 in the sample size.
5.) click in "Results to:" and see what happens to sample size.

It was changing to 100.  With the patch in place it changes to 1171.
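
The behavior described in this report can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual dialog code: `limit_sample_size`, `widget_max` and `DEFAULT_WIDGET_MAX` are hypothetical names, and the value 100 is taken from the "default (100)" observed in this bug.

```python
# Assumed default upper bound of the sample size field (hypothetical name).
DEFAULT_WIDGET_MAX = 100


def limit_sample_size(requested, population_size, with_replacement, widget_max=None):
    """Return the sample size that is kept after the field loses focus.

    widget_max models an extra upper bound imposed by the input field itself;
    None means the field imposes no bound of its own.
    """
    if requested < 1:
        return 1
    # Without replacement the sample cannot exceed the population.
    limit = None if with_replacement else population_size
    if widget_max is not None:
        limit = widget_max if limit is None else min(limit, widget_max)
    return requested if limit is None else min(requested, limit)


# Behavior reported before the patch: the field's default bound wins.
print(limit_sample_size(10000, 1171, False, widget_max=DEFAULT_WIDGET_MAX))  # 100
# Behavior reported with the patch: only the population size limits the value.
print(limit_sample_size(10000, 1171, False))  # 1171
```

With "With replacement" ticked, the same sketch keeps any requested size, which matches the expected result in comment 3.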
Comment 6 Commit Notification 2021-11-11 12:59:29 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/eb50d356ffbe5bd2e3de9ac574ddf28ce4e034ad

tdf#142986 sc sampling: allow more than default (100) samples

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Commit Notification 2021-11-11 15:06:26 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/fd06044c0178bb7724a735c233aa698e2dcea096

tdf#142986: sc: Add UItest

It will be available in 7.3.0.

Comment 8 Commit Notification 2021-11-11 15:27:51 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-7-2":

https://git.libreoffice.org/core/commit/62d82a9a4108f715cf1a2bf15205adc28e9fd31a

tdf#142986 sc sampling: allow more than default (100) samples

It will be available in 7.2.4.

Comment 9 Commit Notification 2021-11-13 05:35:22 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-7-2-3":

https://git.libreoffice.org/core/commit/b89853d017f862de9dfadcf9efdbcf7d68feb945

tdf#142986 sc sampling: allow more than default (100) samples

It will be available in 7.2.3.
