Bug 150712 - Format->Conditional->Condition 100% CPU
Summary: Format->Conditional->Condition 100% CPU
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.5.0.0 alpha0+
Hardware: All Linux (All)
: medium normal
Assignee: Noel Grandin
URL:
Whiteboard: target:7.5.0 target:7.4.2
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-31 02:44 UTC by Pierre Fortin
Modified: 2022-09-13 21:09 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
AA.tsv (zip'ed) (5.20 MB, application/zip)
2022-08-31 13:08 UTC, Pierre Fortin
Details
Flamegraph (296.51 KB, image/svg+xml)
2022-08-31 16:59 UTC, Julien Nabet
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Pierre Fortin 2022-08-31 02:44:13 UTC
Load a smallish 230+MB .tsv file (154161x501); I will be creating Jumbo files; but need this to work before wasting time on the bigger files.. Except for Row 1 (6-digit YYMMDD date headers) and column A (ints), all cells are 2-character codes for which I want to apply background colors based on the contents.  The conditions work fine on a test sheet of about 25x4 cells. 

Format->Conditional->Condition gives the dialog window as expected; but... if select the entire sheet first, I get the dislog window with the contents from the sheet, and calc goes 100% CPU. I've let it run for a long time and it never returns; has to be killed. 

When the full sheet of the 154161x501 range is selected, I never see the normal contents of the "Conditional Formatting for A1:xxxx" window.

Also happens in version 7.4.0.3 which is hung as I write this report.
Comment 1 Pierre Fortin 2022-08-31 03:34:16 UTC
Some testing shows that it takes a while to get the dialog window to present the formatting panel:
  5000x500 cells selected ==  6 seconds
 10000x500 cells selected -- 17 seconds
 25000x500 cells selected -- 8 minutes 45 seconds
No need for further tests as the processing time is clearly non-linear.

Sigh...  this kills my plans of using Calc. My largest sheets of this type are around  950,000x500.
Comment 2 Pierre Fortin 2022-08-31 07:09:26 UTC
I hate giving up...   Here's a workaround in case that helps someone...

Discovered that while a tad slow, though only a few seconds; I could select one column and apply 8 conditions to that column's 151K+ rows. Then, I edited the  range from $B$2:$B$154161 to $B2:$SG$154161 which applied quickly across the entire range.

I'll try this same workaround tomorrow on the 99 other sheets which are being created overnight (some expected to be 950Kx501)...  

Actually, the workaround should be even faster by applying the initial formatting to just a few cells, then edit the range...  That said, it becomes obvious Calc loads the file, then applies the formatting before displaying the sheet -- that is quite slow too.
Comment 3 Julien Nabet 2022-08-31 09:20:49 UTC
Would it be possible you attach a file with minimum size just to reproduce the pb? (by using this link https://bugs.documentfoundation.org/attachment.cgi?bugid=150712&action=enter)
The goal is to retrieve a perf/Flamegraph so devs may pinpoint the bottleneck.
Comment 4 Pierre Fortin 2022-08-31 13:08:07 UTC
Created attachment 182114 [details]
AA.tsv (zip'ed)

Sure. The full 100 files got created overnight; they range in size:
       5827354 Aug 31 02:56 EE/EE.tsv     (3870 rows)
    1793058309 Aug 31 02:41 CW/CW.tsv  (1189571 rows)
The one I started with is:
     232277885 Aug 31 01:58 AA/AA.tsv  (154161 rows)
All files have 501 columns.

I'm also including the final .ods file so you can see the result I need to produce on these 100 files; this was created using the previously mentioned workaround.

The .ods file:
- Loading is reasonable speed.
- Adapt row height is slow compared to load.
- The file takes right at 3 minutes to load.

Generating a .pdf is also very slow; as I write this, it's taking about 7 minutes for 10% on the progress bar. Generating the .pdf as a single page if that works...
Comment 5 Pierre Fortin 2022-08-31 14:29:42 UTC
I was hoping to create a reasonable .pdf for my team to avoid the long load time. It took about 70 minutes to created the pdf of 132,174 pages with 48 rows by 12 columns. The most important part of this data is to be able to view the rows left to right; scrolling left to right. Scrolling top to bottom is barely useful. So it appears my only way to distribute these results is the .ods file itself.

Calc may the wrong tool for these reports; but that's my challenge...  There are more codes than the current colors depict; and it's already challenging for the other people to recall what the code means.  It's not easy summarizing terabytes of data...  Thanks for the 7.5 builds!
Comment 6 Julien Nabet 2022-08-31 16:59:07 UTC
Created attachment 182121 [details]
Flamegraph

Here's a Flamegraph retrieved on pc Debian x86-64 with master sources updated today.

Here what I did:
- load the tsv file
- select the whole sheet
=> start perf recording
- Menu Format/Conditional/Condition
=> Wait about 30 secs (still not finished)
=> stop recording
Comment 7 Julien Nabet 2022-08-31 17:05:59 UTC
Pierre: I understand that there are slowdowns at different moment but let's focus on the initial description, ie Menu Format/Conditional/Condition after having selected a whole sheet.

Noel: any idea if this part may be optimized?
Comment 8 Julien Nabet 2022-08-31 17:21:20 UTC
I gave a try with this patch:
diff --git a/svx/source/dialog/fntctrl.cxx b/svx/source/dialog/fntctrl.cxx
index 58b412f1e822..04af6deae0b6 100644
--- a/svx/source/dialog/fntctrl.cxx
+++ b/svx/source/dialog/fntctrl.cxx
@@ -124,15 +124,9 @@ void setFont(const SvxFont& rNewFont, SvxFont& rImplFont)
  */
 bool CleanAndCheckEmpty(OUString& rText)
 {
-    bool bEmpty = true;
-    for (sal_Int32 i = 0; i < rText.getLength(); ++i)
-    {
-        if (0xa == rText[i] || 0xd == rText[i])
-            rText = rText.replaceAt(i, 1, u" ");
-        else
-            bEmpty = false;
-    }
-    return bEmpty;
+    rText = rText.replaceAll("\xa", u" ");
+    rText = rText.replaceAll("\xd", u" ");
+    return rText.isEmpty();
 }
 } // end anonymous namespace


and I didn't wait for 30 secs to have the results
Comment 9 Pierre Fortin 2022-09-01 02:26:16 UTC
To confirm, this workaround is super fast (to help anyone needing a fast solution) to conditionally format an large|jumbo sheet:
- using a previously formatted [sample] sheet:
  Copy a cell which has desired conditional formatting
- On sheet needing format: 
  - locate bottom-right of HUGE range (note cell)
  - return to top-left of range
  - select a few top-left cells to be formatted
  - Edit->Paste->Paste Special...  Click Formats Only
  - Format->Conditional->Manage...
  - Selected range is highlighted
  - click Edit...
  - Range: (at bottom) Change to cover the huge range you need formatted
    (e;g.,  B2:SG320976 (yes, that's 160,488,000 cells)
  - Click OK
  - Click OK
Formatting is applied instantly.

A few more steps than the expected manner; but this is FAST w/o 100% CPU.
Comment 10 Noel Grandin 2022-09-01 07:13:52 UTC
(In reply to Julien Nabet from comment #8)
> I gave a try with this patch:

Patch looks good to me
Comment 11 Julien Nabet 2022-09-01 10:10:13 UTC
(In reply to Noel Grandin from comment #10)
> (In reply to Julien Nabet from comment #8)
> > I gave a try with this patch:
> 
> Patch looks good to me

Thank you for the feedback, I submitted this patch https://gerrit.libreoffice.org/c/core/+/139152 as you may have seen.
Comment 12 Julien Nabet 2022-09-01 10:11:42 UTC
off-topic: just wonder if, more generally, some other locations containing a loop which includes replaceAt could be optimized.
Comment 13 Commit Notification 2022-09-01 11:43:13 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/076be52655f2cf4e331b344e0f6497f41b37fa8c

tdf#150712: Format->Conditional->Condition 100% CPU

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Commit Notification 2022-09-01 12:18:28 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e4d9d522fec5770dd25f99384ae7a012636662cb

Revert "tdf#150712: Format->Conditional->Condition 100% CPU"

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Julien Nabet 2022-09-01 12:19:04 UTC
I reverted the previous patch since I had changed the logic
Comment 16 Julien Nabet 2022-09-01 14:07:41 UTC
The critical function was created in a fix I had made for tdf#113657 (added in See also). I'll take a look if instead of space we can replace by empty string.
Comment 17 Julien Nabet 2022-09-01 14:15:13 UTC
Reading git history, the space was already there, in fact it's there from: c6726f343e8a944fc9f1031ca39520c1527b05e6
INTEGRATION: CWS sw016 (1.14.106); FILE MERGED
in 2003
Comment 18 Commit Notification 2022-09-01 17:52:24 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/bb7747f627fe7e909fca7264fd42fe6cb7aef0b5

tdf#150712 use more string_view

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Commit Notification 2022-09-02 07:35:49 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d8326fc81f7225047178e1b8a7605d32109fa08c

tdf#150712: Format->Conditional->Condition 100% CPU (2nd try)

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Commit Notification 2022-09-02 08:58:29 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b24eceb5178fda17fcc4b20c5286e8d27c5ae474

tdf#150712 limit the dialog to only a sample of the data

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Commit Notification 2022-09-02 09:24:03 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/ab436b8cd80e100314832ffc2ef54a91d843859f

tdf#150712: Format->Conditional->Condition 100% CPU (2nd try)

It will be available in 7.4.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Commit Notification 2022-09-02 09:57:27 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b392d0a25e229f305497ab5f9f084b71b24cc387

tdf#150712 reduce cost of SharedString::getString

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 23 Julien Nabet 2022-09-02 10:35:45 UTC
On pc Debian x86-64 with master sources updated today, the dialog appears right away.
It's thanks to Noel's patch so unassign myself and put Noel instead.

Thank you Noel!

I cherry-picked for review on 7.4 branch these:
https://gerrit.libreoffice.org/c/core/+/139186
https://gerrit.libreoffice.org/c/core/+/139210

Let's put this one to FIXED now.
Comment 24 Commit Notification 2022-09-13 21:09:01 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/4cef3204e011b235b807d26ddb3c901690b633ff

tdf#150712 limit the dialog to only a sample of the data

It will be available in 7.4.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.