Bug 124982 - Thread the Calc CSV parser
Summary: Thread the Calc CSV parser
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 6.2.3.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks: CSV-Import CPU-AT-100%
Reported: 2019-04-26 12:12 UTC by Owen Savill
Modified: 2021-12-16 08:52 UTC

See Also:
Crash report or crash signature:


Description Owen Savill 2019-04-26 12:12:51 UTC
On opening a 16M CSV file in Calc, I wondered why it was taking so long and why the PC's fans started blowing like crazy.

On reviewing the output from top I could see that LO was running just one of the eight available cores at 100%, which is presumably why the fans kicked in. All the other cores were basically idle.

Is there some way of making this more efficient? Is LO single threaded?
Comment 1 m_a_riosv 2019-04-26 20:38:00 UTC
Have you enabled the CPU threading settings under Menu/Tools/Options/LibreOffice Calc/Calculate?
Comment 2 Owen Savill 2019-04-27 09:22:18 UTC
Yes, it's on. Is calc the only part that offers multi CPU support?
Comment 3 Roman Kuznetsov 2019-05-09 15:52:25 UTC
(In reply to Owen Savill from comment #2)
> Yes, it's on. Is calc the only part that offers multi CPU support?

Calc uses multithreading only for calculating many identical formulas. It is not used for parsing CSV. In general, LO is single-threaded.
Comment 4 Xisco Faulí 2019-05-14 09:35:13 UTC
Hi Michael,
Do we use the fastparser for parsing CSV files?
Comment 5 Michael Meeks 2019-05-14 16:37:42 UTC
Calc has a single-threaded CSV parser - that is correct. Given the general simplicity of the CSV file format, it seems likely that profiling it with kcachegrind and optimizing it further would yield more of a win than trying to thread it - but of course, it's possible that a parse/insert separation would be helpful.
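
For illustration only, a parse/insert separation of the kind mentioned above could look roughly like the sketch below. It is not taken from the LibreOffice or orcus sources: `RowQueue`, `parseLine`, `importCsvPipelined`, and the `Row`/`Sheet` types are hypothetical names, the "sheet" is a plain vector rather than the real document model, and the line splitter ignores quoting. One thread parses lines into rows while the calling thread inserts them.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical types for illustration; none of these are LibreOffice APIs.
typedef std::vector<std::string> Row;
typedef std::vector<Row> Sheet;

// Minimal thread-safe queue that hands parsed rows from the parser to the inserter.
class RowQueue
{
public:
    void push(Row row)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_rows.push(std::move(row));
        }
        m_cond.notify_one();
    }

    // Signal that no more rows will arrive.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_cond.notify_all();
    }

    // Blocks until a row is available. Returns false once the queue is closed and drained.
    bool pop(Row& out)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_rows.empty() || m_closed; });
        if (m_rows.empty())
            return false;
        out = std::move(m_rows.front());
        m_rows.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<Row> m_rows;
    bool m_closed = false;
};

// Naive split on commas. Real CSV also needs quoting, escapes and embedded newlines.
Row parseLine(const std::string& line)
{
    Row cells;
    std::size_t start = 0;
    while (true)
    {
        std::size_t comma = line.find(',', start);
        if (comma == std::string::npos)
        {
            cells.push_back(line.substr(start));
            break;
        }
        cells.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return cells;
}

// Parse on a worker thread while the calling thread inserts rows into the sheet.
Sheet importCsvPipelined(const std::string& text)
{
    RowQueue queue;
    std::thread parser([&queue, &text] {
        std::istringstream in(text);
        std::string line;
        while (std::getline(in, line))
            queue.push(parseLine(line));
        queue.close();
    });

    Sheet sheet;
    Row row;
    while (queue.pop(row))
        sheet.push_back(std::move(row));
    parser.join();
    return sheet;
}
```

Whether such a split pays off depends on where the time actually goes. If insertion dominates, the parser thread simply fills the queue and waits, which is why profiling first is the safer bet.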

It is also possible that some column iterators for data insertion might help optimize the mdds side.

Can you paste a few rows of your CSV file? How homogeneous is it? Are there large gaps in lots of the columns - or is it uniform?

Anyhow - an enhancement here - someone might like to hack on that:

sc/source/filter/orcus/orcusfiltersimpl.cxx
bool ScOrcusFiltersImpl::importCSV(ScDocument& rDoc, SfxMedium& rMedium) const

Looks like a good place to poke. I believe much of the CSV parsing is inside the standalone orcus library which should be easy to hack on - but whether that's where the slowness is, it's hard to say without profiling =)

HTH.
Comment 6 Kevin Suo 2021-11-02 10:11:16 UTC
Owen Savill: Is this still an issue? If yes, would you please provide your comments to Michael Meeks's requests in comment 5?
Comment 7 Michael Meeks 2021-11-02 11:56:05 UTC
Hi Noel, I thought this might interest you =) Importing huge CSVs is probably something you've looked at; it's a big spreadsheet use-case. Did you do any work on that since 2019? =)
Comment 8 Kohei Yoshida 2021-12-16 04:00:04 UTC
Calc uses its own CSV import filter, not the one from orcus.  Removing it from the "orcus bugs" meta issue.
Comment 9 Owen Savill 2021-12-16 08:45:52 UTC
> Is this still an issue? If yes, would you please provide your comments to Michael Meeks's

Hi, sorry for not responding earlier. I've just tried a 37Mb CSV file and it opens very quickly. Did you do something to the code?
Comment 10 Kevin Suo 2021-12-16 08:52:41 UTC
> I've just tried a 37Mb CSV file and it opens very quickly

Good, then let's mark this as RESOLVED WORKSFORME since we do not know which commit fixed this.