Bug 155232

Summary: LibreOffice and LanguageTool extension: LibreOffice doesn't free RAM for special interface XFlatParagraph
Product: LibreOffice
Reporter: Marco A.G.Pinto <marcoagpinto>
Component: Writer
Assignee: Mike Kaganski <mikekaganski>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: 7.5.3.2 release
Hardware: All
OS: All
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=155647
Whiteboard: target:7.6.0 target:7.5.4
Crash report or crash signature:
Regression By:
Attachments: StarterProject.oxt
Source code of the extension from comment 1
A reproducer for ever-growing memory consumption

Description Marco A.G.Pinto 2023-05-10 12:13:01 UTC
Hello!

Here is a comment by the developer of the LanguageTool extension for LibreOffice:
https://github.com/languagetool-org/languagetool/issues/8012#issuecomment-1540640790

Can some sort of fix be implemented?

Thanks!
Comment 1 Mike Kaganski 2023-05-11 16:13:13 UTC
Created attachment 187207
StarterProject.oxt

Here is a comment by a LanguageTool developer:

===

I built a dummy proofreader as an LO extension. It does no real proofreading and no marking. It gets all XFlatParagraphs from XFlatParagraphIterator, using the method "getNextPara" to get an initial paragraph and then "getParaBefore" and "getParaAfter", and stores them in an ArrayList. The whole procedure runs in a loop. After the whole document has been stored as XFlatParagraphs, the list is emptied and the XFlatParagraphs are retrieved and stored again. The loop runs 10000 times per paragraph.
You should disable or remove all grammar checkers from your LO installation and then install the OXT. After restarting LO, load a document containing several hundred paragraphs.
In my tests the Java heap space doesn't exceed 800 MB, while the memory used by LO grows steadily.
Here is the file.
Comment 2 Mike Kaganski 2023-05-12 09:00:08 UTC
Created attachment 187220
Source code of the extension from comment 1
Comment 3 Mike Kaganski 2023-05-12 09:06:18 UTC
Created attachment 187221
A reproducer for ever-growing memory consumption

Repro with the attached document.

It has several hundred lorem ipsum paragraphs, and a macro to produce the problem:

===

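Sub oneLoop(iter)
    ' Get an initial paragraph, then walk backwards until there are no more paragraphs
    start = iter.getNextPara()
    Dim paragraphs(0 To 25350) ' Just the number of paragraphs in this document
    Dim para As Object
    para = start
    n = 0
    Do While Not IsNull(para)
      paragraphs(n) = para
      n = n + 1
      para = iter.getParaBefore(para)
    Loop
    ' Then walk forwards from the initial paragraph until there are no more paragraphs
    para = iter.getParaAfter(start)
    Do While Not IsNull(para)
      paragraphs(n) = para
      n = n + 1
      para = iter.getParaAfter(para)
    Loop
    ' The local array is released when the sub returns, so the macro itself keeps no references
End Sub

Sub testOOM
  doc = ThisComponent
  iter = doc.getFlatParagraphIterator(com.sun.star.text.TextMarkupType.PROOFREADING, True)
  ' Repeat the full traversal with the same iterator many times
  For i = 0 To 1000000
    oneLoop(iter)
  Next i
End Sub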

===

Running 'testOOM' results in slowly but steadily growing memory consumption, and eventually runs out of memory (OOM).

The problem is the m_aFlatParaList member in the implementation of XFlatParagraphIterator [1]. It was added in the initial commits that introduced the API: [2] [3]. It is not used anywhere else, and using it to hold references to objects whose lifetime is managed by the UNO reference-counting mechanism is questionable: the paragraphs handed out by the iterator accumulate in the list instead of being freed.

The easyhack is to just drop the list.

[1] https://git.libreoffice.org/core/+/master/sw/source/core/inc/unoflatpara.hxx#127
[2] https://git.libreoffice.org/core/+/677eba2322d2753951024c688d59553182bf2fbd%5E%21/
[3] https://git.libreoffice.org/core/+/ba76230f6f677774b0d333da946a7e487acbeb0b%5E%21/
Comment 4 Mike Kaganski 2023-05-12 20:12:08 UTC
I see it's a bit pressing for the participants; let's not wait for an easyhacker.

https://gerrit.libreoffice.org/c/core/+/151712
Comment 5 Commit Notification 2023-05-13 05:23:50 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a7ce722b476c4bb0c9a113ae0c2759181edfe48f

tdf#155232: drop m_aFlatParaList from SwXFlatParagraphIterator

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2023-05-15 09:02:56 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/193c0f20fc1f8f836ebdabac0d8a1065162653a7

tdf#155232: drop m_aFlatParaList from SwXFlatParagraphIterator

It will be available in 7.5.4.
