Bug 124982 - Thread the Calc CSV parser
Summary: Thread the Calc CSV parser
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 6.2.3.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks: CSV-Import CPU-AT-100%
Reported: 2019-04-26 12:12 UTC by Owen Savill
Modified: 2021-01-06 07:09 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments

Description Owen Savill 2019-04-26 12:12:51 UTC
While opening a 16M CSV file in Calc, I wondered why it was taking so long and why the PC's fans started blowing like crazy.

On reviewing the output from top I could see that LO was running just one of the eight available cores at 100%, which is presumably why the fans kicked in. All the other cores were basically idle.

Is there some way of making this more efficient? Is LO single threaded?
Comment 1 m.a.riosv 2019-04-26 20:38:00 UTC
Do you have CPU threading enabled under Menu/Tools/Options/LibreOffice Calc/Calculate - CPU threading settings?
Comment 2 Owen Savill 2019-04-27 09:22:18 UTC
Yes, it's on. Is calc the only part that offers multi CPU support?
Comment 3 Roman Kuznetsov 2019-05-09 15:52:25 UTC
(In reply to Owen Savill from comment #2)
> Yes, it's on. Is calc the only part that offers multi CPU support?

Calc uses multithreading only for calculating many instances of the same formula. It is not used for parsing CSV. In general LO is single-threaded.
Comment 4 Xisco Faulí 2019-05-14 09:35:13 UTC
Hi Michael,
Do we use the fastparser for parsing CSV files?
Comment 5 Michael Meeks 2019-05-14 16:37:42 UTC
Calc has a single-threaded CSV parser - that is correct. Given the general simplicity of the CSV file format, it seems likely that profiling it with kcachegrind and optimizing it further would yield more of a win than trying to thread it - but of course, it's possible that a parse/insert separation would be helpful.
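
To make the parse/insert separation idea concrete, here is a minimal C++ sketch. It is not LibreOffice or orcus code: BatchQueue, split_line and import_csv are hypothetical names invented for the illustration, the splitter ignores quoting and escaping, and a plain vector of rows stands in for the real cell storage. One thread parses lines into batches of rows while the calling thread inserts them.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;
using Batch = std::vector<Row>;

// Hypothetical hand-off queue between the parser thread and the inserter.
class BatchQueue {
public:
    void push(Batch batch) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batches_.push(std::move(batch));
        }
        ready_.notify_one();
    }
    // Signals that no more batches will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
    }
    // Blocks until a batch is available; returns false once closed and drained.
    bool pop(Batch& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !batches_.empty(); });
        if (batches_.empty())
            return false;
        out = std::move(batches_.front());
        batches_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Batch> batches_;
    bool closed_ = false;
};

// Naive splitter (assumption): no quoting or escaping, unlike a real CSV parser.
Row split_line(const std::string& line, char sep = ',') {
    Row fields;
    std::string field;
    for (char c : line) {
        if (c == sep) {
            fields.push_back(field);
            field.clear();
        } else {
            field.push_back(c);
        }
    }
    fields.push_back(field);
    return fields;
}

// Parses on a worker thread and inserts on the calling thread.
std::vector<Row> import_csv(const std::string& text, std::size_t batch_size = 1024) {
    assert(batch_size > 0);
    BatchQueue queue;
    std::thread parser([&] {
        std::istringstream in(text);
        std::string line;
        Batch batch;
        while (std::getline(in, line)) {
            batch.push_back(split_line(line));
            if (batch.size() == batch_size) {
                queue.push(std::move(batch));
                batch.clear();  // reset the moved-from vector before reuse
            }
        }
        if (!batch.empty())
            queue.push(std::move(batch));
        queue.close();
    });

    std::vector<Row> document;  // stand-in for the real cell storage
    Batch batch;
    while (queue.pop(batch)) {
        for (Row& row : batch)
            document.push_back(std::move(row));
    }
    parser.join();
    return document;
}
```

Handing over rows in batches rather than one at a time keeps lock traffic low. Whether this beats a well-optimized single-threaded parser depends on how the cost splits between parsing and insertion, which only profiling can tell.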

It is also possible that some column iterators for data insertion might help optimize the mdds side.

Can you paste a few rows of your CSV file? How homogeneous is it? Are there large gaps in lots of the columns, or is it uniform?

Anyhow - an enhancement here - someone might like to hack on that:

sc/source/filter/orcus/orcusfiltersimpl.cxx
bool ScOrcusFiltersImpl::importCSV(ScDocument& rDoc, SfxMedium& rMedium) const

Looks like a good place to poke. I believe much of the CSV parsing is inside the standalone orcus library, which should be easy to hack on - but whether that's where the slowness is, it's hard to say without profiling =)

HTH.