Bug 67538 - XTypeDetection::queryTypeByDescriptor poor performance
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: framework
Version: 4.0.3.3 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.4.0 target:6.3.0.1
Keywords: perf
Depends on:
Blocks:
 
Reported: 2013-07-30 14:10 UTC by Grigory
Modified: 2019-06-04 07:36 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
C++ source reproducing the problem (12.74 KB, text/plain)
2013-07-30 14:10 UTC, Grigory
Description Grigory 2013-07-30 14:10:04 UTC
Created attachment 83303
C++ source reproducing the problem

I am using Debian and upgraded to LibreOffice from wheezy-backports (version 1:4.0.3-2~bpo70+1 in Debian notation).

I can successfully connect to LibreOffice using the API, but when I try to get the type of any document from an InputStream using XTypeDetection::queryTypeByDescriptor, the call appears to hang. For very short inputs (about 1000 bytes) the function returns the type (generic_Text or encoded in my tests) correctly and quite fast, but if I load even a 50000-byte file, it takes more than a minute to get the type. During that whole time I can see one thread of my app consuming 50% of the CPU and one thread of the LibreOffice process consuming 50% of the CPU.

I tried attaching gdb to the process, but the only thing I could work out is that execution is inside cppu_threadpool::JobQueue::enter and does not leave it until the end.

I used OpenOffice from Debian squeeze before and my code worked fine, so I suppose this is a bug.

I attach a simple test application to reproduce the problem.
Comment 1 Stephan Bergmann 2013-07-31 09:32:00 UTC
LibreOffice's document type detection code is notoriously slow, trying lots of filters one by one until it finds a good match, and seeking and reading the same data over and over again from the given input stream.  So, if that stream is only made available across URP, this can easily cause lots of delay, esp. if the stream's data is in a "bogus" format for which no matching filter can be found (so the search needs to go through all of them).

That said, if you say things were considerably faster with an old OpenOffice.org version (which exactly was that?), it might be a performance regression in the rewritten binary URP bridge hinted at <https://issues.apache.org/ooo/show_bug.cgi?id=116038#c14> "rewrite binary URP bridge."
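
To make the cost of that access pattern concrete, here is a minimal sketch, not LibreOffice code. CountingStream, probeNaively and probeWithSharedHeader are hypothetical names invented for this illustration, and the assumption is that every seek or read on the stream costs one cross-process call. It compares detectors that each rewind and re-read their own header with detectors that share one local copy of the header.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream whose every call crosses a process
// boundary (as with a stream exported over URP). It only counts calls.
class CountingStream {
public:
    explicit CountingStream(std::string data) : data_(std::move(data)) {}

    void seek(std::size_t pos) {
        ++calls_;
        pos_ = std::min(pos, data_.size());
    }

    std::string readBytes(std::size_t n) {
        ++calls_;
        std::string out = data_.substr(pos_, n);
        pos_ += out.size();
        return out;
    }

    std::size_t calls() const { return calls_; }

private:
    std::string data_;
    std::size_t pos_ = 0;
    std::size_t calls_ = 0;
};

// Each detector rewinds and reads its own header: two remote calls per detector.
std::size_t probeNaively(CountingStream& s,
                         const std::vector<std::size_t>& headerSizes) {
    for (std::size_t n : headerSizes) {
        s.seek(0);
        s.readBytes(n);
    }
    return s.calls();
}

// Read the largest header once, then let every detector use the local copy.
std::size_t probeWithSharedHeader(CountingStream& s,
                                  const std::vector<std::size_t>& headerSizes) {
    if (headerSizes.empty()) return s.calls();
    const std::size_t largest =
        *std::max_element(headerSizes.begin(), headerSizes.end());
    s.seek(0);
    const std::string header = s.readBytes(largest);
    for (std::size_t n : headerSizes) {
        const std::string slice = header.substr(0, n);
        (void)slice;  // a detector would inspect this local copy; no remote call
    }
    return s.calls();
}
```

Under these assumptions the naive variant costs two remote calls per detector, while the shared-header variant costs two calls in total regardless of how many detectors run.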
Comment 2 Grigory 2013-07-31 10:35:54 UTC
I have done some additional research: the main problem is that data is read from the input stream byte by byte. If I pass the file by URL, the performance is great. InputStream::readBytes is called with size parameter num=1, and each such byte is then transmitted across processes, which is very slow. I added debug prints to my stream implementation, and the result is:
Successfully connected to LibreOffice
Changed location to: 0
Readed: 30
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 1024
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 4096
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 4096
Changed location to: 0
Changed location to: 0
Readed: 26
Changed location to: 0
Changed location to: 0
Readed: 7
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 512
Changed location to: 0
Changed location to: 0
Readed: 1
Changed location to: 0
Changed location to: 0
Readed: 4
Changed location to: 0
Changed location to: 0
Changed location to: 0
Changed location to: 0
Readed: 4096
Changed location to: 0
Changed location to: 1
Readed: 1
Readed: 1
Readed: 1
Changed location to: 0
Readed: 1
Changed location to: 0
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
Readed: 1
...... 
byte-by-byte read

So it seems that there is no buffering in the communication. This problem makes InputStreams useless: it is not just two times slower, as in <https://issues.apache.org/ooo/show_bug.cgi?id=116038#c14> "rewrite binary URP bridge."
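
For illustration only, the kind of buffering that is missing could look like the sketch below. It is a hedged example rather than a proposed patch: ReadAheadBuffer and its fetch callback are hypothetical names, with fetch(offset, count) standing for one expensive cross-process read. The wrapper serves 1-byte reads and seeks from a locally cached block and only goes remote when the position leaves that block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical read-ahead wrapper. `fetch(offset, count)` stands for one
// expensive remote read; it is not a LibreOffice API.
class ReadAheadBuffer {
public:
    using Fetch = std::function<std::string(std::size_t offset, std::size_t count)>;

    ReadAheadBuffer(Fetch fetch, std::size_t blockSize)
        : fetch_(std::move(fetch)), blockSize_(blockSize) {
        assert(blockSize_ > 0);
    }

    // Seeking only moves the local position; it never goes remote.
    void seek(std::size_t pos) { pos_ = pos; }

    // Returns up to n bytes starting at the current position.
    std::string readBytes(std::size_t n) {
        std::string out;
        while (out.size() < n) {
            const bool inBlock = loaded_ && pos_ >= bufStart_ &&
                                 pos_ < bufStart_ + blockSize_;
            if (!inBlock) {
                // Miss: fetch the whole aligned block that contains pos_.
                bufStart_ = pos_ - pos_ % blockSize_;
                buf_ = fetch_(bufStart_, blockSize_);
                loaded_ = true;
                ++fetches_;
            }
            const std::size_t offset = pos_ - bufStart_;
            if (offset >= buf_.size()) break;  // short block: end of data
            const std::size_t take = std::min(n - out.size(), buf_.size() - offset);
            out.append(buf_, offset, take);
            pos_ += take;
        }
        return out;
    }

    std::size_t fetches() const { return fetches_; }

private:
    Fetch fetch_;
    std::size_t blockSize_;
    std::string buf_;
    std::size_t bufStart_ = 0;
    std::size_t pos_ = 0;
    std::size_t fetches_ = 0;
    bool loaded_ = false;
};
```

With a 4096-byte block, reading a 50000-byte input one byte at a time through such a wrapper needs 13 fetches instead of 50000 single-byte round trips. Where the buffer would have to live for the scenario in this report (on the LibreOffice side of the bridge) is left open here.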
Comment 3 Robinson Tryon (qubit) 2015-01-15 19:41:24 UTC
(In reply to Grigory from comment #2)
> I have some additional research: the main problem is reading data from the
> inputstream byte-by-byte. If I pass the file by URL the performance is great.

Hi Grigory,
How's the performance of our latest builds?

Status -> NEEDINFO
Comment 4 Grigory 2015-01-19 19:44:49 UTC
(In reply to Robinson Tryon (qubit) from comment #3)
> (In reply to Grigory from comment #2)
> > I have some additional research: the main problem is reading data from the
> > inputstream byte-by-byte. If I pass the file by URL the performance is great.
> 
> Hi Grigory,
> How's the performance of our latest builds?
> 
> Status -> NEEDINFO

Sorry, I don't have a computer with the necessary environment to check the error. But the code that I attached to the ticket should reproduce the problem if it still exists.
Comment 5 Robinson Tryon (qubit) 2015-01-19 22:03:27 UTC
(In reply to Grigory from comment #4)
> Sorry, I don't have a computer with necessary environment to check the
> error. But the code that I attached to the ticket should reproduce the
> problem if it still exists

Stephan: Is testing the code straightforward, or is this something a dev will need to handle?
Comment 6 Robinson Tryon (qubit) 2015-03-19 15:26:24 UTC
(In reply to Robinson Tryon (qubit) from comment #5)
> (In reply to Grigory from comment #4)
> > Sorry, I don't have a computer with necessary environment to check the
> > error. But the code that I attached to the ticket should reproduce the
> > problem if it still exists
> 
> Stephan: Is testing the code straightforward, or is this something a dev
> will need to handle?

Stephan: Nudge ;-)
Comment 7 Matthew Francis 2015-04-26 10:50:00 UTC
Confirmed using the macro below (adapted from AndrewMacro.pdf). With /tmp/aaaa containing 500,000 x "a", this takes a couple of seconds on a fast machine, which seems excessive.

Sub DetectDocType()
  ' A single "URL" property is enough for the media descriptor
  Dim oMediaDescr(0) As New com.sun.star.beans.PropertyValue
  Dim s$ : s$ = "com.sun.star.document.TypeDetection"
  Dim oTypeManager

  oMediaDescr(0).Name = "URL"
  oMediaDescr(0).Value = "file:///tmp/aaaa"

  ' The second argument (True) allows deep, content-based detection
  oTypeManager = createUnoService(s$)
  MsgBox oTypeManager.queryTypeByDescriptor(oMediaDescr(), True)
End Sub


Setting -> NEW
Comment 8 Robinson Tryon (qubit) 2015-12-10 10:13:27 UTC Comment hidden (obsolete)
Comment 9 Xisco Faulí 2016-09-19 15:29:51 UTC Comment hidden (obsolete)
Comment 10 Xisco Faulí 2017-09-29 08:51:44 UTC Comment hidden (obsolete)
Comment 11 Commit Notification 2019-06-03 10:16:00 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/1439a09a13516f72baa735e5af332b0647d0cff7%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part1

It will be available in 6.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Commit Notification 2019-06-03 10:49:59 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/c0d372d7c0d9284aad8b0d5142dff7c34c062fa9%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part2

It will be available in 6.4.0.

Comment 13 Commit Notification 2019-06-03 13:24:50 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/0d04315c17a6df9f971237d45d9e5e8af765dd17%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part4

It will be available in 6.4.0.

Comment 14 Commit Notification 2019-06-03 13:24:59 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/7bf2515fc48ed0d4c436aef298fa9c35e573352b%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part3

It will be available in 6.4.0.

Comment 15 Noel Grandin 2019-06-03 13:26:28 UTC
I could not get the C++ sample program to work for me, but the performance of querying types is substantially improved.

Feel free to re-open this bug if the sample program still indicates a problem.
Comment 16 Commit Notification 2019-06-04 07:36:16 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-6-3":

https://git.libreoffice.org/core/+/6fdd1dc34f497fce28f85807126e56432a3cb7d2%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part1

It will be available in 6.3.0.1.

Comment 17 Commit Notification 2019-06-04 07:36:26 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-6-3":

https://git.libreoffice.org/core/+/77219c88ac0a44b3ed5dada67d0d9ca52fa3adec%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part2

It will be available in 6.3.0.1.

Comment 18 Commit Notification 2019-06-04 07:36:34 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-6-3":

https://git.libreoffice.org/core/+/6e4513e2b4e32926354d745ecf08fdf440392a75%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part3

It will be available in 6.3.0.1.

Comment 19 Commit Notification 2019-06-04 07:36:43 UTC
Noel Grandin committed a patch related to this issue.
It has been pushed to "libreoffice-6-3":

https://git.libreoffice.org/core/+/49a17e9ba500451929a2c4cb63c30e50e989886c%5E%21

tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part4

It will be available in 6.3.0.1.
