Bug 126710 - fileopen: Calc can not open xlsx, consuming tens gigs of RAM (memory leak)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 4.3 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, haveBacktrace, perf, regression
Depends on:
Blocks: Memory
Reported: 2019-08-05 14:09 UTC by Michal
Modified: 2024-03-30 20:21 UTC (History)
CC List: 11 users

See Also:
Crash report or crash signature:


Attachments
file which causes problem for me (4.52 MB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2019-08-05 14:10 UTC, Michal
perf flamegraph (921.74 KB, image/svg+xml)
2019-08-05 20:04 UTC, Julien Nabet
tail of terminal output from bibisect in 44max repo (3.77 KB, text/plain)
2019-08-18 17:30 UTC, Terrence Enger
bt from one of many allocations of size 0x98 (18.58 KB, text/plain)
2019-08-30 17:33 UTC, Terrence Enger

Description Michal 2019-08-05 14:09:46 UTC
Description:
When I try to open the attached xlsx file, LO hangs and consumes tens of gigabytes of memory.

Windows 7 Professional, 16 GB RAM; LO 5.x and LO 6.2.5.2.

Steps to Reproduce:
1. Try to open the attached file (I hope I will be able to attach it).


Actual Results:
Calc hangs and starts to consume tens of gigabytes of RAM.

Expected Results:
Open the xlsx file (it is a very small file).


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.2.5.2 (x64)
Build ID: 1ec314fa52f458adc18c4f025c545a4e8b22c159
CPU threads: 4; OS: Windows 6.1; UI render: GL; VCL: win;
Locale: cs-CZ (cs_CZ); UI-Language: cs-CZ
Calc: threaded

I did not try resetting my profile because the problem happens on any computer with LibreOffice, so I believe it is not profile-related.
I also do not know the earliest affected LO version.
Comment 1 Michal 2019-08-05 14:10:50 UTC
Created attachment 153143
file which causes problem for me
Comment 2 Xisco Faulí 2019-08-05 14:58:23 UTC
I can reproduce it in

Version: 6.4.0.0.alpha0+
Build ID: 620fff54ca9cd04459cc5d963ef94d4438129fe4
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

Version: 5.2.0.0.alpha1+
Build ID: 5b168b3fa568e48e795234dc5fa454bf24c9805e
CPU Threads: 4; OS Version: Linux 4.15; UI Render: default; 
Locale: ca-ES (ca_ES.UTF-8)

Version: 4.3.0.0.alpha1+
Build ID: c15927f20d4727c3b8de68497b6949e72f9e6e9e

For the test I used a timeout=120.

in

Version: 4.1.0.0.alpha1+
Build ID: a2c9d4f8bbde97f175bae4df771273a61251f40

(oldest commit in bibisect-42max)

it takes

real	1m29,586s
user	1m20,011s
sys	0m1,822s

and in

Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)

( oldest commit in bibisect-41max)

it takes

real	0m24,596s
user	0m11,709s
sys	0m0,832s
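
For anyone who wants to repeat timings like these across builds, here is a minimal Python sketch of one way to do it. It is not the script used above: the headless conversion command line, the helper name time_command, and reading timeout=120 as seconds are all assumptions.

```python
import subprocess
import sys
import tempfile
import time


def time_command(cmd, timeout_s=120.0):
    """Run cmd and return (elapsed_seconds, timed_out)."""
    start = time.monotonic()
    try:
        # subprocess.run kills the direct child when the timeout expires.
        subprocess.run(cmd, timeout=timeout_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return time.monotonic() - start, timed_out


if __name__ == "__main__":
    # Assumed usage: python time_open.py /path/to/soffice /path/to/file.xlsx
    soffice, document = sys.argv[1], sys.argv[2]
    # Assumption: a headless conversion forces a full import of the file.
    cmd = [soffice, "--headless", "--convert-to", "ods",
           "--outdir", tempfile.gettempdir(), document]
    elapsed, timed_out = time_command(cmd)
    print(f"elapsed={elapsed:.1f}s timed_out={timed_out}")
```

If soffice is a launcher script that starts a separate soffice.bin process, the timeout may only kill the launcher, so it is worth checking for leftover processes after a timed-out run.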
Comment 3 Julien Nabet 2019-08-05 20:04:30 UTC
Created attachment 153146
perf flamegraph

On pc Debian x86-64 with master sources updated today, I retrieved a Flamegraph perf.
Comment 4 Terrence Enger 2019-08-18 17:30:31 UTC
Created attachment 153490
tail of terminal output from bibisect in 44max repo

Working on debian-buster in bibisect-44max repository I find that the
growth in memory usage started between:

          commit    s-h       date
          --------  --------  -------------------
    good  d6ae68ec  bfaf4401  2014-08-29 22:42:31
    bad   41de0160  3b856f02  2014-08-29 22:24:32

The commit message is (rewrapped)

    commit 3b856f028735d292c9b02168704d4a07e2f43cd5
    Author: Kohei Yoshida <kohei.yoshida@collabora.com>
    Date:   Fri Aug 29 17:18:09 2014 -0400

        Use the source dimension name when searching for a dimension.
    
        Otherwise we might miss the right dimension object.  This
        fixes the bug where the subtotal function of the second data
        field was not set correctly when importing from xlsx.
    
        Change-Id: Id6ecb07b86cf6803a3f6f7604267ce2f5f9a4067

I am removing keyword bibisectRequest, adding keyword bisected, and
adding Kohei to cc.

The increase in CPU usage happened somewhere before bibisect-44max
version oldest.
Comment 5 Terrence Enger 2019-08-30 17:33:24 UTC
Created attachment 153762
bt from one of many allocations of size 0x98

Over the course of about two hours I collected mtrace output from LO
opening calc-memory.xls before LO crashed.  The mtrace output is 13
GB, some 130 million lines.  Toward the end of the trace file, there
are millions of pairs of lines like (but with changing, apparently
increasing locations allocated) (lines rewrapped):

        @ /usr/lib/x86_64-linux-gnu/libstdc++.so.6:(_Znwm+0x18)
            [0x7fac2dc80fd8] + 0x557ef01fdb60 0x98
        @ /usr/lib/x86_64-linux-gnu/libstdc++.so.6:(_Znwm+0x18)
            [0x7fac2dc80fd8] + 0x557ef01fdc10 0x30

with only rare other heap operations.
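
A trace this large is easier to summarize by script than by eye. The following is a minimal Python sketch, not the tool used for the analysis above, that streams an mtrace file and counts allocations by size. It assumes each record is on one line (the examples above were rewrapped) and that an allocation record ends in "+ <address> <size>"; the function name tally_allocation_sizes is hypothetical.

```python
import re
import sys
from collections import Counter

# Tail of an mtrace allocation record: "+ <address> <size>", both in hex.
# The whitespace required after "+" keeps "_Znwm+0x18" from matching.
ALLOC_RE = re.compile(r"\+\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s*$")


def tally_allocation_sizes(lines):
    """Count allocation records by requested size (in bytes)."""
    counts = Counter()
    for line in lines:
        match = ALLOC_RE.search(line)
        if match:
            counts[int(match.group(2), 16)] += 1
    return counts


if __name__ == "__main__":
    # Assumed usage: python tally_mtrace.py /path/to/mtrace.out
    # The file object is iterated lazily, so a multi-GB trace is not
    # loaded into memory at once.
    with open(sys.argv[1], "r", errors="replace") as trace:
        counts = tally_allocation_sizes(trace)
    for size, count in counts.most_common(10):
        print(f"size {size:#x}: {count} allocations")
```

Frees ("- <address>") and realloc records are ignored here, so the counts show which sizes dominate the allocations, not which ones are still live.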

The present attachment is a backtrace from one of the allocations of
size 0x98.  An allocation of size 0x30 is similar, the location in
ScDPResultDimension::AddMember having advanced from dptabres.cxx:3959
to line 3963 and there being 6 more call levels between there and
operator new.

LibreOffice in this case is a local build of source hash 7dcb5c65,
2019-08-19, configured:

    CC=ccache /usr/bin/gcc
    CXX=ccache /usr/bin/g++
    CCFLAGS=-Wshadow
    --with-jdk-home=/usr/lib/jvm/default-java
    --enable-split-debug
    --enable-gdb-index
    --enable-ld=gold
    --enable-option-checking=fatal
    #--enable-dbgutil
    --enable-debug
    --without-system-postgresql
    --without-myspell-dicts
    --with-extra-buildid
    --without-doxygen
    --with-external-tar=/home/terry/lo_hacking/git/src
    --without-package-format

built and running on debian-buster.  The installed RAM and swap space
allowed LO opening the same file in other tests to grow to 10 or 11 GB
before crashing.

I am removing keyword wantBacktrace and adding haveBacktrace.
Comment 6 Buovjaga 2021-05-03 09:45:13 UTC
Still reproducible, though it ends with a general error dialog

Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 9c930c4f3109d123c0831d0fcecf9c8b32e5bbc7
CPU threads: 2; OS: Windows 10.0 Build 19042; UI render: default; VCL: win
Locale: fi-FI (fi_FI); UI: en-US
Calc: threaded
Comment 7 Roman Kuznetsov 2022-05-11 12:57:58 UTC
Still reproducible in

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 3a05acb8f0d94728ea6cbfd7a69dac6ffa7ffc68
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Vulkan; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded
Comment 8 Tex2002ans 2024-03-03 02:18:42 UTC
Yep, still reproducible in:

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: db4def46b0453cc22e2d0305797cf981b68ef5ac
CPU threads: 8; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

- - -

Opening attachment 153143 used up this much RAM:

LibreOffice 24.2.1:

- >25 GB
   - (My computer has 32 GB, so I cancelled loading soon after reaching that point.)

Microsoft Excel (2401, Build 17231.20236):

- ~148 MB
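
Since reproducing this can exhaust a machine's RAM, one option on Linux is to start the process under an address-space cap so it fails early instead of swapping. Below is a minimal Python sketch under stated assumptions: it is Unix-only, the helper name run_with_memory_cap and the 8 GiB figure are hypothetical, and RLIMIT_AS limits virtual address space, which is usually larger than the resident memory a task manager shows.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes, timeout_s=None):
    """Run cmd with its address space capped at max_bytes.

    Returns the exit code, or None if cmd was still running after timeout_s.
    """
    def apply_cap():
        # Runs in the child just before exec, so only the child is limited.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    try:
        proc = subprocess.run(cmd, preexec_fn=apply_cap, timeout=timeout_s,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode


if __name__ == "__main__":
    # Assumed usage: python capped_open.py /path/to/soffice /path/to/file.xlsx
    soffice, document = sys.argv[1], sys.argv[2]
    cap = 8 * 1024 ** 3  # hypothetical 8 GiB cap; adjust to the machine
    code = run_with_memory_cap([soffice, "--headless", document], cap,
                               timeout_s=600)
    print("timed out" if code is None else f"exit code {code}")
```

A cap set too low may prevent LibreOffice from starting at all, so a failure under the cap only points at this bug if a small document still opens with the same limit.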
Comment 9 Pavel Kysilka 2024-03-29 20:46:16 UTC
I am able to open the document. LibreOffice consumes about 185 GB of RAM.


The application log contains this interesting line:

warn:sc.core:115048:115048:sc/source/core/data/dptabsrc.cxx:2590: ScDPMember::GetItemData: what data? nDim 18, mnDataId 0


Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 454da7750db671c1f82fec4706de9a44c29b3e2e
CPU threads: 128; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded