When I open a big CSV file (about 500MB) and try to save it as Excel 2007 (xlsx), Calc's RAM usage grows fast, CPU sits at almost 100%, the save progress bar doesn't move, and the whole system slows down until it freezes completely. It seems Calc is trying to convert and compress the whole file in memory, and only then write it out. The process should instead work in steps, on smaller chunks of the data.
We need a test file to reproduce.
Created attachment 107355 [details]
Simple bash script to generate a large CSV

This script writes out records like:

> 1,2014-10-05 19:15:28.174189280+11:00,7jStH8bW5iMk...

i.e., a sequence number, a date-time stamp, and 4096 bytes of random data, base64-encoded. The script takes a single number (iterations/records) as a parameter: 10000 generates a file of ~52MB, while 100000 generates a file of ~525MB. It is no doubt horribly inefficient, but effective.

Testing with files created by this script makes clear that there are limits to the ability to handle large CSV files. v4.3.2.2 crashes trying to load a CSV with 100000 records. The same version eventually loads a CSV with 90000 records (~470MB) but then crashes trying to save the loaded data as XLSX. Around 2.5GB of RAM is used by LO during this process.
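For the record, a minimal sketch of what such a generator looks like. This is a reconstruction, not the attachment itself; the real script may differ, and gen-csv.sh is just a hypothetical name:

  #!/bin/bash
  # Hypothetical sketch of attachment 107355; the real script may differ.
  # Usage: ./gen-csv.sh NUM_RECORDS > big.csv
  n=${1:?usage: $0 NUM_RECORDS}
  for ((i = 1; i <= n; i++)); do
      # Sequence number, RFC 3339 nanosecond timestamp, then 4096 random
      # bytes base64-encoded (about 5.5kB per record).
      printf '%s,%s,%s\n' "$i" "$(date --rfc-3339=ns)" \
          "$(head -c 4096 /dev/urandom | base64 -w 0)"
  done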
I am not sure whether there is anything the developers can do about handling CSV files of this magnitude, but for now I have confirmed that there is an issue. Status set to NEW. Component set to Spreadsheet. Summary amended for clarity.
@Marcelo: How much RAM do you have on your machine? Best regards. JBF
I have 4GB RAM and an Intel Core i5-3230M 2.60GHz processor, running elementary OS 0.2.1 64-bit.
(In reply to Owen Genat from comment #2)
> Around 2.5GB of RAM is used by LO during this process.

This was a generalisation.

(In reply to Marcelo from comment #5)
> I have 4GB RAM,

I think the machine is likely running out of RAM. Further test results using the provided script under GNU/Linux with v4.2.6.3:

On a system with 3708MB RAM, no swap:

  CSV records/MB   XLSX MB   Peak RAM VIRT/RES[1]
  --------------   -------   --------------------
  30000/~157       ~119      2046/1.1
  40000/~210       ~159      2378/1.6
  50000/~262       ~200      2871/2.0
  60000/~315       ~238      3268/2.3
  70000/~367       ~278      3563/2.6
  75000/~394       ~298      3750/2.8
  80000/~420       N/A       3943/2.9[2]

On a system with 7941MB RAM, no swap:

  CSV records/MB   XLSX MB   Peak RAM VIRT/RES[1]
  --------------   -------   --------------------
  100000/~525      ~397      4816/3.7

[1] Virtual (MB) and resident (GB) usage as displayed by the top command.
[2] At this point Calc crashes.
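If anyone wants to repeat these measurements without babysitting top, GNU time can report the peak resident set size directly. A minimal sketch, assuming /usr/bin/time is the GNU implementation; note that a headless conversion may not behave identically to a GUI save:

  # Drive the XLSX export from the command line and report peak memory.
  # Requires GNU time ("/usr/bin/time"); the shell builtin lacks -v.
  /usr/bin/time -v soffice --headless --convert-to xlsx big.csv 2>&1 \
      | grep -E 'Maximum resident|Exit status'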
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:
- Test to see if the bug is still present on a currently supported version of LibreOffice (5.0.1 or preferably 5.0.2.2 or later): https://www.libreoffice.org/download/
- If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior.
- If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and operating system.

Please DO NOT:
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help, you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3, unless your bug pertains to a feature added after 3.3): http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug.
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to "inherited from OOo";
4b. If the bug was not present in 3.3 - add "regression" to keywords.

Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for your help!

-- The LibreOffice QA Team

This NEW Message was generated on: 2015-10-14
*** Bug 103267 has been marked as a duplicate of this bug. ***
I'm not sure what people expect. Just storing the 500 MB of string data would require us to use 500 MB of memory, and that is before anything additional we allocate for the data, like cell information. If you need to process 500 MB CSV files in a spreadsheet, you need quite a lot of memory. There is no way around it.
This IS still a bug. I have the same issue with a 237MB CSV file on xubuntu 16.04 LTS 64-bit, running on an Intel Core i7 with 8GB RAM. Trying to save that file as an M$ Excel 2010 .xlsx ended up exhausting memory (full use of RAM AND the 5GB SWAP PARTITION!) and, after 30 minutes, a crashed LibreOffice Calc (v5.1.2.x). Btw: xubuntu with LibreOffice Calc loaded and an empty sheet needs approx. 250MB of RAM and no use of the swap partition. Following what Markus Mohrhard mentioned above, I expected a normal RAM consumption of (temporarily) 2x the document size plus the needed additional spreadsheet data. In my case that would be something around 500MB, NOT 12.5GB! So, this bug still exists.
Created attachment 128996 [details]
Random value CSV file ~237MB (unpacked)

I have tested exporting the attached CSV to .ODS and .XLSX. Both formats need about 25GB of memory to save. The difference is that during export to .ODS the hard disc is used (you can see free disc space decreasing) while the RAM usage of LibreOffice stays constant (about 1200MB); exporting to .ods generally succeeds. Unfortunately, during export to .xlsx only RAM is used, with no disc usage at all, so the amount of free RAM decreases dramatically until Calc crashes. In my opinion the desired solution for the .xlsx export is to spool data to disc, as is already done for .ods.
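A rough way to verify this spooling difference yourself is to watch the process's memory and its temp files side by side while each export runs. A sketch; the /tmp/lu* pattern for LibreOffice's temporary files is an assumption, so check where your build actually spools:

  # Poll LO's memory (RSS/VSZ in KB) and the total size of what look
  # like its temp files, every 2 seconds, while the export runs.
  watch -n 2 '
      ps -C soffice.bin -o rss=,vsz=
      du -sch /tmp/lu* 2>/dev/null | tail -n 1
  '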
I tried to export the attached csv file to .ods with LO 5.4.1.0.0+ built at home under Ubuntu 16.04 x86-64, but it failed with an I/O error message. Loading the csv file was no problem; it took a long time but it worked. The PC has 8 GB of RAM and a 512 GB SSD. System Monitor showed that the soffice.bin process used 1.1 GB of RAM. Best regards. JBF
I'm seeing a similar bug. I read a 233MB CSV file into Calc without problems, then attempted to write it out as CSV with slightly different options ("quote everything"), and it was half done after an hour, about 2/3 done after an hour and a half. Progress continues. Calc is in the "greyed out" state, not processing mouse clicks. Since this is on Linux, it doesn't affect other processes much, and I'm letting this run to see if it finishes. It's not hung; the progress bar advances slowly.

CPU utilization is 100% of 1 CPU. Very little disk I/O, maybe one disk write every 10 seconds. Memory not full. Not thrashing.

This looks like there's something in writing out a CSV file whose cost per record is worse than O(1), so it chokes on large files. Somebody with a debug build should profile this while writing out a large CSV file and see where it's spending all the CPU time.

Ubuntu 16.04 LTS x64, 8GB. Calc version: libreoffice-calc 1:5.1.6-rc2.0ubuntu1-xenial2
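In case it helps whoever picks this up, a sketch of one way to grab such a profile on Linux while the save is running. This assumes perf is installed and that soffice.bin has usable debug symbols:

  # Sample the running soffice.bin for 60 seconds mid-save, then list
  # the hottest symbols. Run the "perf record" line while the progress
  # bar is crawling.
  perf record -g -p "$(pgrep -x soffice.bin)" -- sleep 60
  perf report --stdio --sort symbol | head -n 40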
Update: after over 2 hours, the writing of the CSV file finished. The output file is valid. Calc is back to its normal operating state. I strongly suggest profiling and looking for something that's O(N^2) in file exporting.
File load time for the 233MB file is only 30 seconds, and scrolling around in the file works fine. It's only Save that's slow. Save as .ods is also extremely slow, so it's not just .csv and not the CSV exporter. About 5 minutes into that now. As before, 100% of 1 CPU in use, very little I/O, no thrashing, not out of memory.
Update: saving as .ods took about 1.5 hours, but finished. Opening the file thus saved took about 5 minutes.
(In reply to Bartosz from comment #12)
> Created attachment 128996 [details]
> Random value CSV file ~237MB (unpacked)

Still crashing upon saving to xlsx. I have 32 GB of memory; it crashed at around 8GB used. Loading is no problem.

Arch Linux 64-bit
Version: 6.4.0.0.alpha0+
Build ID: c2cb467a1e5194c56bb65706b7965fb2c9241b8f
CPU threads: 8; OS: Linux 5.1; UI render: default; VCL: gtk3;
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 29 June 2019
It takes me no more than 20 seconds to save the CSV file generated using Owen Genat's bash script (550MB) to ODS, and RAM usage peaks at 2.6GB during the whole process (I am using an SSD drive). Could someone retest? It may have been fixed somewhere.

Fedora 32 x64 with 8GB RAM, LibreOffice 7.1 branch.

I set this to NEEDINFO.
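For anyone retesting, an end-to-end sketch using the generator sketched in comment 2 (gen-csv.sh is the hypothetical name from that sketch, and a headless save may not match GUI timing exactly):

  # Generate ~525 MB of CSV, then time a headless save to ODS.
  ./gen-csv.sh 100000 > big.csv
  time soffice --headless --convert-to ods --outdir /tmp big.csv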
Indeed, similar result as Kevin when saving to XLSX.

Arch Linux 64-bit
Version: 7.0.4.2
Build ID: 00(Build:2)
CPU threads: 8; OS: Linux 5.9; UI render: default; VCL: kf5
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
7.0.4-2
Calc: threaded