Bug 56910 - FILEOPEN Calc text import dialog preview with non-default separators initializes too long
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 3.6.0.4 release (earliest affected)
Hardware: All All
Importance: medium major
Assignee: Eike Rathke
URL:
Whiteboard: target:6.2.0 target:6.1.0.1
Keywords: bibisected, haveBacktrace, perf, regression
Depends on:
Blocks: CSV-Dialog
 
Reported: 2012-11-09 07:26 UTC by Jesus
Modified: 2021-08-16 23:37 UTC

Attachments
.csv file 1.5 mb (1.41 MB, text/csv)
2012-11-09 07:26 UTC, Jesus
Callgrind output from 6.0 (3.45 MB, application/x-xz)
2017-09-25 18:40 UTC, Buovjaga

Description Jesus 2012-11-09 07:26:40 UTC
Created attachment 69789
.csv file 1.5 mb

I have a normal .csv file that imports instantly and correctly in LibreOffice 3.5.5 (the default on Mint 13 64-bit) and 3.5.7. I tested both versions.

In versions 3.6.2 and 3.6.3, the same .csv takes 10 times as long to open and the data is imported incorrectly (fields are all wrong and scattered).

The .csv is attached.
Comment 1 Urmas 2012-11-11 05:54:12 UTC
Corrected description.

Also, the preview area looks buggy, showing corrupted records from somewhere in the middle of the file.
Comment 2 m_a_riosv 2012-11-20 00:49:52 UTC
Win7x64 Ultimate
Version 3.6.4.1 (Build ID: a9a0717)

Opens without any problem with:
Space as separator.
The first four columns as text.
The last column as number US.
Comment 3 Jesus 2012-11-20 06:19:44 UTC
I think it might be an issue with the Linux version, seeing that it works fine on Windows.
Comment 4 Urmas 2012-11-28 22:41:50 UTC
No, this is a cross-platform problem.
Also, Space is not selected as a separator by default, and that is what causes this.
Comment 5 QA Administrators 2015-07-18 17:43:00 UTC Comment hidden (obsolete)
Comment 6 Buovjaga 2015-10-16 20:18:03 UTC
Confirmed long initialization of preview.

Win 7 Pro 64-bit Version: 5.1.0.0.alpha1+
Build ID: 186f32f63434e16ff5776251657f902d5808ed3d
TinderBox: Win-x86@39, Branch:master, Time: 2015-10-16_09:42:47
Locale: en-US (fi_FI)
Comment 7 Robinson Tryon (qubit) 2015-12-09 18:08:33 UTC Comment hidden (obsolete)
Comment 8 QA Administrators 2017-09-01 11:16:12 UTC Comment hidden (obsolete)
Comment 9 Xavier Van Wijmeersch 2017-09-17 10:02:50 UTC
Confirmed long initialization of preview with

Version: 5.4.2.0.0+
Build ID: 61d85c4e7c30ea0f5242d927b7456190020b4fbe
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: kde4; 
Locale: nl-BE (en_US.UTF-8); Calc: group

but not with

Version: 6.0.0.0.alpha0+
Build ID: 41b7713334351d7cc455eef5241bd3988b9d1e94
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-09-13_22:56:21
Locale: nl-BE (en_US.UTF-8); Calc: group
Comment 10 Buovjaga 2017-09-25 18:40:55 UTC
Created attachment 136530
Callgrind output from 6.0

Still reproducible even with master. I compared 3.5 vs. master on Windows and 3.5 was definitely faster.

If you reproduce this multiple times with the same version, note that it remembers the chosen separator after import. The delay only occurs when Space is not selected at the time the import dialog opens.

Callgrind taken with:

Arch Linux 64-bit, KDE Plasma 5
Version: 6.0.0.0.alpha0+
Build ID: a102f56123b5209a7dfaf33ba001433ec39d279f
CPU threads: 8; OS: Linux 4.12; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on September 25th 2017
Comment 11 Buovjaga 2018-06-28 11:22:13 UTC
Bibisect with Linux 43all points to range https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=d31997559adac6f03d932cb6c5819149c38c1398...1856186951a70a0bcac4e0c3632ca4afe68c05e3

Out of which Eike's commit stands out: https://cgit.freedesktop.org/libreoffice/core/commit/?id=7928b651965f747b02593d2a9fc73fac7c86dbf5

commit 7928b651965f747b02593d2a9fc73fac7c86dbf5 (patch)
tree 079cabd464d84456fc63a44849c402fee1ccd65b
parent 95cc5de63b20c5986fe8f3913da86002eabd7cb1 (diff)
resolved fdo#48621 better handling of broken CSV files
* non-escaped (not doubled) quotes in quoted strings are regarded as broken
  representation and are taken literally, only a quote followed by a separator
  ends a field. If not being a separator themselves, trailing blanks between
  the ending quote and the separator are ignored, complementary to leading
  blanks between a separator and a quote.
* quotes in a non-quoted string are taken literally
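
The rules in this commit message can be sketched roughly as below. This is only an illustration in Python, not the actual Calc import code; the function name `split_fields` and the sample strings are made up for the example.

```python
def split_fields(line: str, sep: str = ",", quote: str = '"') -> list[str]:
    """Split one CSV line using relaxed quoting rules (illustrative only)."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Ignore leading blanks between a separator and an opening quote,
        # unless the blank is itself the separator.
        j = i
        while sep != " " and j < n and line[j] == " ":
            j += 1
        if j < n and line[j] == quote:
            i = j + 1
            buf = []
            while i < n:
                c = line[i]
                if c == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        buf.append(quote)  # doubled quote: escaped literal quote
                        i += 2
                        continue
                    # Look past trailing blanks for a separator or end of line.
                    k = i + 1
                    while sep != " " and k < n and line[k] == " ":
                        k += 1
                    if k >= n or line[k] == sep:
                        i = k  # closing quote: the field ends here
                        break
                    # Lone quote not followed by a separator: taken literally.
                buf.append(c)
                i += 1
            fields.append("".join(buf))
        else:
            # Non-quoted field: quotes are ordinary characters.
            k = line.find(sep, i)
            if k == -1:
                k = n
            fields.append(line[i:k])
            i = k
        if i >= n:
            break
        i += 1  # step over the separator
    return fields
```

For example, `split_fields('"5" tall",x')` keeps the stray quote inside the first field, and `split_fields('"a"  ,b')` drops the blanks between the closing quote and the comma. A quoted field that never closes simply runs to the end of the line in this sketch.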

Maybe a trade-off between speed and quality? :) Eike is already in CC
Comment 12 Eike Rathke 2018-06-28 14:00:25 UTC
So yes, with Space not being set as separator for this data there's no field separation, and the "relaxed quoting rules" try to combine lines, as the "one field" is regarded as broken/mis-quoted CSV data. Analyzing that mess takes more time than if the correct separator was already selected.
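
A rough sketch of that effect, assuming a reader that keeps appending physical lines while a quoted field is still open. The helper names `ends_inside_quotes` and `read_records` and the two sample lines are invented for illustration; they are not taken from the attached file or from the Calc sources.

```python
def ends_inside_quotes(text: str, sep: str, quote: str = '"') -> bool:
    """Return True if `text` ends while a quoted field is still open."""
    in_quotes = False
    at_field_start = True
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == quote:
                if i + 1 < n and text[i + 1] == quote:
                    i += 2  # doubled quote: escaped literal quote
                    continue
                if i + 1 == n or text[i + 1] == sep:
                    in_quotes = False  # quote followed by separator or end closes
                # otherwise a stray quote, taken literally
        elif c == quote and at_field_start:
            in_quotes = True
        at_field_start = (not in_quotes) and c == sep
        i += 1
    return in_quotes


def read_records(lines, sep):
    """Combine physical lines into records while a quoted field stays open."""
    records, pending = [], ""
    for line in lines:
        pending = pending + "\n" + line if pending else line
        if not ends_inside_quotes(pending, sep):
            records.append(pending)
            pending = ""
    if pending:
        records.append(pending)
    return records


sample = ['"abc" 1 2', '"def" 3 4']  # hypothetical space-separated data
print(len(read_records(sample, " ")))  # Space selected
print(len(read_records(sample, ",")))  # Space not selected
```

With Space selected, each closing quote is followed by the separator and every line is its own record. With only a comma selected, no quote is ever followed by a separator, so the lines are merged into one ever-growing field.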

Maybe some heuristic analysing the first two lines to detect a possible separator if none encountered could help.
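
One possible shape for such a heuristic, as a minimal sketch only. The function name `detect_space_separator`, its parameters, and the exact conditions are assumptions made for this example and do not describe the fix that was eventually committed.

```python
def detect_space_separator(lines, selected_seps):
    """Return True if Space looks like a plausible separator that was not selected.

    `lines` is the start of the file; only the first two lines are inspected.
    `selected_seps` is the collection of separator characters currently selected.
    """
    sample = [ln for ln in lines[:2] if ln.strip()]
    if not sample or " " in selected_seps:
        return False
    # If any selected separator already occurs, leave the user's choice alone.
    if any(sep in ln for ln in sample for sep in selected_seps):
        return False
    # Every sampled line must contain at least one interior blank.
    return all(" " in ln.strip() for ln in sample)
```

For instance, `detect_space_separator(["a b c 1", "d e f 2"], {",", ";"})` would suggest Space, while data that already contains one of the selected separators is left untouched.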
Comment 13 Commit Notification 2018-07-02 14:24:10 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c807e7ea7a0725a4d8375eda07d6f70870e0d50a

Resolves: tdf#56910 detect a Space (blank) separator if not selected

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Eike Rathke 2018-07-02 14:25:09 UTC
Pending Jenkins for 6-1 https://gerrit.libreoffice.org/56814
Comment 15 Buovjaga 2018-07-02 15:51:15 UTC
Preview is now fast, cheers.

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: c807e7ea7a0725a4d8375eda07d6f70870e0d50a
CPU threads: 8; OS: Linux 4.17; UI render: default; VCL: gtk3; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group threaded
Built on July 2nd 2018
Comment 16 Commit Notification 2018-07-02 15:56:44 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-6-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=33b7319e2e08812a2f7d3126e4b1ec90875d6165&h=libreoffice-6-1

Resolves: tdf#56910 detect a Space (blank) separator if not selected

It will be available in 6.1.0.1.
