FILEOPEN I'm dealing with large CSV files (200-500 MB) that were created by exporting from Postgres. If there are no CRLFs in the cells, Calc is a little slower than Excel (3:00 min vs 2:28 min for a 315 MB file). But if there are CRLFs in the fields, Calc can take 3-4 times as long as Excel to open the file (6:35 min vs 1:40 min for a 182 MB file). Win 7 Pro Service Pack 1, LO 5.0.1.2 (although I was experiencing the same issue with version 4).
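For anyone who wants to reproduce the comparison without the original export, here is a minimal Python sketch that writes a CSV whose quoted cells contain embedded CRLFs. The function name, cell text, and column count are assumptions for illustration only; they do not reflect the structure of the reporter's files.

```python
import csv
import io


def write_multiline_csv(stream, rows, cols=6, multiline=True):
    """Write `rows` rows of `cols` quoted cells to `stream`.

    When `multiline` is True, every cell contains an embedded CRLF,
    which is the case reported as slow to open.
    """
    # QUOTE_ALL wraps every cell in " so embedded line breaks stay inside the cell.
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    for r in range(rows):
        text = "line one\r\nline two" if multiline else "line one line two"
        writer.writerow([f"{text} {r}:{c}" for c in range(cols)])
```

To write to disk, open the target file with `newline=""` so Python does not translate the CRLFs, then call `write_multiline_csv(f, rows=...)` once with `multiline=True` and once with `multiline=False` to get a comparable pair of files.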
Looks a bit like bug 84246 or even bug 82605
(In reply to MM from comment #1)
> Looks a bit like bug 84246 or even bug 82605

Not like bug 84246, because there is no crash; it just takes really long. Not like bug 82605, because I'm using " as the delimiter and the default delimiter is already ". The file eventually loads up just fine and as expected, it just takes a REALLY long time. The CSV file in question actually has several cells per row that may have CRLFs.
@john do you have a test file? Since it's a big one, you should upload it to some web hosting space.
I have set the bug's status to 'NEEDINFO', so please do change it back to 'UNCONFIRMED' once you have attached a document.
I've uploaded a sample file here: https://www.dropbox.com/s/9mcnlf367yr1nns/02_MWR_SXx.csv?dl=0 This is a mailing list, typical of what I have to produce for my company. I've overwritten the personal info in the file.
Created attachment 119361 [details] test file
I waited for 17 minutes and then got bored and killed it. Win 7 Pro 64-bit, Version: 5.0.2.2 (x64) Build ID: 37b43f919e4de5eeaca9b9755ed688758a8251fe Locale: fi-FI (fi_FI)
I've not tried reporting a bug to LibreOffice before, so I have no idea what to expect. Is this officially recognized as a bug now? Will someone be looking at fixing it? Is there a timeline on it? I have users who are complaining to me about it and I'd like to be able to tell them something.
(In reply to john cantin from comment #8)
> I've not tried reporting a bug to LibreOffice before, so I have no idea
> what to expect.

Welcome on board :-)

> Is this officially recognized as a bug now?

Yes. When you report a bug, the status is UNCONFIRMED. When another user is able to reproduce it, the status is set to NEW, which means that the bug is confirmed.

> Will someone be looking at fixing it? Is there a timeline on it?

There's no timeline yet. What you can do to speed up the fix is to test the bug with older releases, to find out whether the issue has always been present or whether it's a regression (it worked fine in a previous release and broke in a newer one). So I suggest going to this page: http://sourceforge.net/projects/winpenpack/files/X-LibreOffice/releases/ and downloading some older LibO portable versions to retest. I suggest testing the last version of each branch (i.e. 4.4.5, 4.3.5, 4.2.6) until you find the first version that doesn't show the bug, then moving forward and testing the first release of each branch (i.e. 4.4.0, 4.3.0, 4.2.0) to find the first release that did show the bug. If you find the regression point, it will be easier to identify the root of the issue and get a fix.
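The search for the regression point boils down to finding the first release, in an ordered list, that shows the problem. A minimal sketch in Python follows. Note that it swaps in a binary search instead of the branch-by-branch walk described above, and that `is_bad` is a hypothetical callback standing in for "open the file in that portable build by hand and see whether it is slow". It also assumes that once the bug appears, every later release has it too.

```python
from typing import Callable, Optional, Sequence


def first_bad_release(versions: Sequence[str],
                      is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the first entry of `versions` (oldest first) for which is_bad is True.

    Assumes every release after the first bad one is also bad.
    Returns None when no release shows the problem.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid        # the first bad release is at mid or earlier
        else:
            lo = mid + 1    # the first bad release is after mid
    return versions[lo] if lo < len(versions) else None
```

With six releases to check, this needs at most three manual tests instead of up to six.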
Migrating Whiteboard tags to Keywords: (perf)
I confirm, this still happens (5.0.4.2 release) with a rather simple 80 MB CSV file (6 columns). I tried to open it (selecting just 3 columns instead of all 6) and for 5 minutes nothing happened. I had to kill soffice.bin as it was consuming all the processor power. I don't understand why most programs, including LibreOffice, are just plain stupid about opening large files. Instead of buffering just some lines (1 MB?), a principle known and used in the 1980s in most word processors, they try to load the whole thing and are not even able to do that, crashing or needing to be killed.
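To illustrate the buffering idea in general terms, here is a minimal Python sketch that reads a CSV in fixed-size batches of rows, so that only one batch is held in memory at a time. The function name and batch size are assumptions for illustration. This is not how Calc imports files, and a spreadsheet may need the whole document loaded for other reasons.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_row_batches(stream: TextIO,
                     batch_size: int = 10_000) -> Iterator[List[List[str]]]:
    """Yield lists of at most `batch_size` parsed rows from a CSV text stream."""
    reader = csv.reader(stream)
    while True:
        # islice pulls only the next batch_size rows from the underlying stream.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch
```

A caller would open the file with `newline=""` and loop over `iter_row_batches(f)`, processing or displaying each batch before the next one is read.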
No need to change version field.
This needs to be retested in current master.
(In reply to raal from comment #6)
> Created attachment 119361 [details]
> test file

Both 5.1 and 5.2 take about 2 min 20 sec to open the file on my very fast computer.

Arch Linux 64-bit, KDE Plasma 5
Version: 5.2.0.0.alpha0+
Build ID: 96c1ae1d8e78ae8b9bd7d4001645cad24d62b720
CPU Threads: 8; OS Version: Linux 4.4; UI Render: default; Locale: fi-FI (fi_FI.UTF-8)
Built on April 1st 2016

64-bit, KDE Plasma 5
Build ID: 5.1.1.3 Arch Linux build-2
CPU Threads: 8; OS Version: Linux 4.4; UI Render: default; Locale: fi-FI (fi_FI.UTF-8)
The slowdown in the multiline file is related to the row height calculation. We can't skip that so currently I see no way to get that to acceptable performance.
How about, as part of the open CSV dialog, giving the user the option to skip the automatic row height calculations? The delay makes it completely impossible to use Calc on these files.
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:

1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug
Created attachment 145104 [details] Callgrind output from master Took a callgrind in case it is of any help. Arch Linux 64-bit Version: 6.2.0.0.alpha0+ Build ID: 0ffa7a733d834647dfd59b864c52a015028822b6 CPU threads: 8; OS: Linux 4.18; UI render: default; VCL: gtk3_kde5; Locale: fi-FI (fi_FI.UTF-8); Calc: threaded Built on September 21st 2018
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/c47d0174f2c6c3ebcb3b33276d0277e7aceac330%5E%21 tdf#94677 Calc is slow opening large CSV, avoid reset SetUpdateMode It will be available in 6.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/2b58bb92b3d5da97290a0f273125ebc34fc5082b%5E%21 tdf#94677 Calc is slow opening large CSV, avoid std::shared_ptr It will be available in 6.4.0.
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/31589bf0239679d73417902655045c48c4868016%5E%21 tdf#94677 Calc is slow opening large CSV, improve tools::Fraction It will be available in 6.4.0.
It takes

real 6m28,995s
user 4m26,173s
sys 0m4,305s

in

Version: 6.4.0.0.alpha0+
Build ID: a294457eb95e60028539b6783abac78b56561fe2
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

while in

Version: 6.3.0.0.beta2+
Build ID: e17e30dceb110e780a7e7e89c2ede854d4bc38a7
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

it takes

real 10m24,053s
user 7m18,434s
sys 0m5,371s

Note: I was compiling LibreOffice when I measured it.
Build from 23 June took 1min 15s (stopwatch time) to open, so we already had a ~1min improvement since my 2min 20s comment 14 in 2016 (probably thanks to other patches by Noel). With a fresh build just now, it takes 43 seconds. So the opening time is ~31% of the time in 2016. A large file is a large file and I think these are pretty substantial improvements, so I would be happy to close this as fixed, unless Noel has other ideas.
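The ~31% figure can be checked with a couple of lines of Python. The helper name is made up for illustration; the inputs are the stopwatch times quoted above (2 min 20 s in 2016, 1 min 15 s for the 23 June build, 43 s for the fresh build).

```python
def percent_of_baseline(new_seconds: float, baseline_seconds: float) -> float:
    """Return new_seconds as a percentage of baseline_seconds."""
    return 100.0 * new_seconds / baseline_seconds


baseline_2016 = 2 * 60 + 20   # 2 min 20 s = 140 s
june_build = 1 * 60 + 15      # 1 min 15 s = 75 s
fresh_build = 43              # seconds

june_share = percent_of_baseline(june_build, baseline_2016)    # a bit under 54
fresh_share = percent_of_baseline(fresh_build, baseline_2016)  # a bit under 31
```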
I've just checked again without compiling at the same time and it takes

real 5m35,114s
user 5m25,324s
sys 0m5,297s

I don't know why it takes sooo long for me compared to buovjaga's measurement...
Import time might be slower since last measurement after https://cgit.freedesktop.org/libreoffice/core/commit/?id=c47d0174f2c6c3ebcb3b33276d0277e7aceac330 got reverted in https://cgit.freedesktop.org/libreoffice/core/commit/?id=0e12a4055de19271e8756a323df684c0985c8e3a
(In reply to Xisco Faulí from comment #24)
> I've just checked again without compiling at the same time and it takes
>
> real 5m35,114s
> user 5m25,324s
> sys 0m5,297s
>
> don't know why it takes sooo long for me compares to buovjaga's
> measurement...

It took 8 min 05 sec for me in

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 931e264590100c555580c413556e229a0f03316a
CPU threads: 4; OS: Mac OS X 10.16; UI render: default; VCL: osx
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded
(In reply to Roman Kuznetsov from comment #26)
> It took 8 min 05 sec for me in
>
> Version: 7.2.0.0.alpha0+ / LibreOffice Community
> Build ID: 931e264590100c555580c413556e229a0f03316a
> CPU threads: 4; OS: Mac OS X 10.16; UI render: default; VCL: osx
> Locale: ru-RU (ru_RU.UTF-8); UI: en-US
> Calc: threaded

and I got wrong data in Calc: only 83774 rows instead of more than 1 million.
Luboš Luňák committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/a85b647f7bb6cb869abf22ab9ecce419ad5083e0 compress calls to AdjustRowHeight() to just one call (tdf#94677) It will be available in 7.4.0.
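The commit subject describes collapsing many row height adjustments into one call. The general pattern can be sketched in Python as below. This is only an illustration of batching under stated assumptions, not the actual LibreOffice change: `RowHeightBatcher` and the `adjust_fn(first_row, last_row)` callback are hypothetical names, and only `AdjustRowHeight()` itself is mentioned in the commit message.

```python
from typing import Callable, Optional


class RowHeightBatcher:
    """Remember which rows were touched and adjust their heights once, at the end."""

    def __init__(self, adjust_fn: Callable[[int, int], None]) -> None:
        # Hypothetical callback that recalculates heights for rows first..last.
        self._adjust = adjust_fn
        self._first: Optional[int] = None
        self._last: Optional[int] = None

    def mark(self, row: int) -> None:
        """Record that `row` needs its height recalculated; does no work yet."""
        if self._first is None or self._last is None:
            self._first = self._last = row
        else:
            self._first = min(self._first, row)
            self._last = max(self._last, row)

    def flush(self) -> None:
        """Issue a single adjustment covering every marked row, then reset."""
        if self._first is not None and self._last is not None:
            self._adjust(self._first, self._last)
            self._first = self._last = None
```

An importer following this pattern would call `mark(row)` for each imported row and `flush()` once after the last row, so the expensive calculation is requested a single time for the whole range.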
Luboš Luňák committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/673a210b73716cf9ceb7b104b38e39987d0515af use SalLayoutGlyphsCache in EditEngine/SvxFont (tdf#94677) It will be available in 7.4.0.
1 min 10 sec (very cool time for opening a 170 MB file!) in

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 28de720bc088a4afd3b2f28c5136a3478af5d22a
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: threaded

Intel Core i7-10510U CPU @ 1.80GHz here