Created attachment 83303: C++ source reproducing the problem

I am using Debian and upgraded to LibreOffice from wheezy-backports (version 1:4.0.3-2~bpo70+1 in Debian notation). I can successfully connect to LibreOffice through the API, but when I try to get the type of any document from an InputStream using XTypeDetection::queryTypeByDescriptor, it appears to hang. For very short inputs (about 1000 bytes) the function returns the type (generic_Text or encoded in my tests) correctly and quite fast, but even for a 50000-byte file the wait to get the type is more than a minute. During this whole time I can see one thread of my application consuming 50% of the CPU and one thread of the LibreOffice process consuming 50% of the CPU. I tried to attach gdb to the process, but the only thing I could determine is that execution is inside cppu_threadpool::JobQueue::enter and does not leave it until the end. I used OpenOffice.org from Debian squeeze before and my code worked fine, so I suppose this is a bug. I attach a simple test application to reproduce the problem.
LibreOffice's document type detection code is notoriously slow: it tries lots of filters one by one until it finds a good match, seeking and reading the same data over and over again from the given input stream. So, if that stream is only made available across URP, this can easily cause lots of delay, especially if the stream's data is in a "bogus" format for which no matching filter can be found (so the search has to go through all of them). That said, if things were considerably faster with an old OpenOffice.org version (which exactly was that?), it might be a performance regression in the rewritten binary URP bridge mentioned at <https://issues.apache.org/ooo/show_bug.cgi?id=116038#c14> ("rewrite binary URP bridge").
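The cost model described above can be sketched in a few lines of C++. This is a simplified illustration under stated assumptions, not LibreOffice's detection code: `RemoteStream`, `Filter`, and `detectType` are hypothetical names, every stream call is assumed to cost one URP round trip, and each filter is assumed to check only a fixed magic prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a client-side XInputStream/XSeekable: every
// seek() or readBytes() call is counted as one URP round trip.
class RemoteStream {
public:
    explicit RemoteStream(std::string data) : data_(std::move(data)) {}

    void seek(std::size_t pos) {
        ++roundTrips_;
        pos_ = pos < data_.size() ? pos : data_.size();
    }

    // Reads up to n bytes into out and returns how many were read.
    std::size_t readBytes(std::string& out, std::size_t n) {
        ++roundTrips_;
        const std::size_t avail = data_.size() - pos_;
        const std::size_t count = n < avail ? n : avail;
        out.assign(data_, pos_, count);
        pos_ += count;
        return count;
    }

    std::uint64_t roundTrips() const { return roundTrips_; }

private:
    std::string data_;
    std::size_t pos_ = 0;
    std::uint64_t roundTrips_ = 0;
};

// Hypothetical filter description: a type name and the magic prefix it expects.
struct Filter {
    std::string name;
    std::string magic;
};

// Tries each filter in turn; each one rewinds and re-reads the head of the
// stream, so the number of round trips grows with the number of filters tried.
// Returns an empty string when no filter matches.
std::string detectType(RemoteStream& stream, const std::vector<Filter>& filters) {
    for (const Filter& f : filters) {
        stream.seek(0);
        std::string head;
        stream.readBytes(head, f.magic.size());
        if (head == f.magic) return f.name;
    }
    return "";
}
```

With 100 hypothetical filters and 50,000 bytes of "bogus" data that match none of them, `detectType` returns an empty string after 200 round trips (one seek and one read per filter), whereas data matching the fourth filter stops after 8.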
I have some additional research: the main problem is that data is read from the InputStream byte by byte. If I pass the file by URL, the performance is great. InputStream::readBytes is called with the size parameter num=1, and each single byte is then transmitted across processes, which is very slow. I added a debug print to my stream implementation, and the result is:

    Successfully connected to LibreOffice
    Changed location to: 0
    Readed: 30
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Readed: 1024
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Readed: 4096
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Readed: 4096
    Changed location to: 0
    Changed location to: 0
    Readed: 26
    Changed location to: 0
    Changed location to: 0
    Readed: 7
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Readed: 512
    Changed location to: 0
    Changed location to: 0
    Readed: 1
    Changed location to: 0
    Changed location to: 0
    Readed: 4
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Changed location to: 0
    Readed: 4096
    Changed location to: 0
    Changed location to: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Changed location to: 0
    Readed: 1
    Changed location to: 0
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    Readed: 1
    ...... byte-by-byte read

So it seems that there is no buffering in the communication. This problem makes InputStreams useless: it is not just two times slower, as in <https://issues.apache.org/ooo/show_bug.cgi?id=116038#c14> ("rewrite binary URP bridge").
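The missing buffering can be illustrated with a read-through cache on the calling side of the bridge. In this sketch, `RemoteStream`, `BufferedReader`, and the 64 KiB chunk size are hypothetical choices rather than UNO or LibreOffice APIs, and each `readAt()` call is assumed to cost one round trip. Rewinds and small reads are then served from local memory, which is what the logged access pattern (repeated seeks to 0 followed by 1-byte reads) would benefit from. This is not necessarily how the committed tdf#67538 patches address the problem.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical remote stream: each readAt() call is one bridge round trip.
class RemoteStream {
public:
    explicit RemoteStream(std::string data) : data_(std::move(data)) {}

    // Returns up to n bytes starting at pos (empty at or past the end).
    std::string readAt(std::size_t pos, std::size_t n) {
        ++roundTrips_;
        if (pos >= data_.size()) return {};
        return data_.substr(pos, n);
    }

    std::uint64_t roundTrips() const { return roundTrips_; }

private:
    std::string data_;
    std::uint64_t roundTrips_ = 0;
};

// Hypothetical local read-through cache in front of RemoteStream.
class BufferedReader {
public:
    explicit BufferedReader(RemoteStream& remote, std::size_t chunk = 64 * 1024)
        : remote_(remote), chunk_(chunk) {}

    // Seeking only moves the local position; no round trip.
    void seek(std::size_t pos) { pos_ = pos; }

    // Reads up to n bytes at the current position, fetching whole chunks on a miss.
    std::string readBytes(std::size_t n) {
        std::string out;
        while (out.size() < n) {
            if (pos_ < bufStart_ || pos_ >= bufStart_ + buf_.size()) {
                buf_ = remote_.readAt(pos_, chunk_);  // one round trip per chunk
                bufStart_ = pos_;
                if (buf_.empty()) break;  // end of stream
            }
            const std::size_t offset = pos_ - bufStart_;
            const std::size_t take = std::min(n - out.size(), buf_.size() - offset);
            out.append(buf_, offset, take);
            pos_ += take;
        }
        return out;
    }

private:
    RemoteStream& remote_;
    std::size_t chunk_;
    std::string buf_;
    std::size_t bufStart_ = 0;
    std::size_t pos_ = 0;
};
```

Reading a 50,000-byte stream one byte at a time costs 50,000 round trips when calling `RemoteStream` directly, but only one through `BufferedReader`, because the whole stream fits into a single chunk; a later `seek(0)` and re-read costs nothing extra. A production version would also need to bound memory use, which this sketch does not do.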
(In reply to Grigory from comment #2)
> I have some additional research: the main problem is reading data from the
> inputstream byte-by-byte. If I pass the file by URL the performance is great.

Hi Grigory,
How's the performance of our latest builds?

Status -> NEEDINFO
(In reply to Robinson Tryon (qubit) from comment #3)
> (In reply to Grigory from comment #2)
> > I have some additional research: the main problem is reading data from the
> > inputstream byte-by-byte. If I pass the file by URL the performance is great.
>
> Hi Grigory,
> How's the performance of our latest builds?
>
> Status -> NEEDINFO

Sorry, I don't have a computer with the necessary environment to check the error. But the code that I attached to the ticket should reproduce the problem if it still exists.
(In reply to Grigory from comment #4)
> Sorry, I don't have a computer with the necessary environment to check the
> error. But the code that I attached to the ticket should reproduce the
> problem if it still exists.

Stephan: Is testing the code straightforward, or is this something a dev will need to handle?
(In reply to Robinson Tryon (qubit) from comment #5)
> (In reply to Grigory from comment #4)
> > Sorry, I don't have a computer with the necessary environment to check the
> > error. But the code that I attached to the ticket should reproduce the
> > problem if it still exists.
>
> Stephan: Is testing the code straightforward, or is this something a dev
> will need to handle?

Stephan: Nudge ;-)
Confirmed using the macro below (adapted from AndrewMacro.pdf), where /tmp/aaaa consists of 500,000 "a" characters. This takes a couple of seconds on a fast machine, which seems excessive.

    Sub DetectDocType()
        Dim oMediaDescr(30) As New com.sun.star.beans.PropertyValue
        Dim s$ : s$ = "com.sun.star.document.TypeDetection"
        Dim oTypeManager
        oMediaDescr(0).Name = "URL"
        oMediaDescr(0).Value = "file:///tmp/aaaa"
        oTypeManager = createUnoService(s$)
        MsgBox oTypeManager.queryTypeByDescriptor(oMediaDescr(), True)
    End Sub

Setting -> NEW
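The 500,000-byte /tmp/aaaa test file used by the DetectDocType macro can be generated with a small C++ helper. `writeTestFile` is a hypothetical name; any tool that writes 500,000 "a" characters to that path works equally well.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Writes `count` copies of 'a' to `path` in 4 KiB pieces.
// Returns true only if the file was written and closed without error.
bool writeTestFile(const std::string& path, std::size_t count) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const std::string chunk(4096, 'a');
    std::size_t remaining = count;
    while (remaining > 0) {
        const std::size_t n = remaining < chunk.size() ? remaining : chunk.size();
        out.write(chunk.data(), static_cast<std::streamsize>(n));
        remaining -= n;
    }
    out.close();  // flush so that write errors are reported below
    return !out.fail();
}
```

Calling `writeTestFile("/tmp/aaaa", 500000)` before running the macro recreates the input described in the confirmation.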
Migrating Whiteboard tags to Keywords: (needAdvice)
'needsConfirmationAdvise' is only used for unconfirmed bugs. Removing it from this bug. [NinjaEdit]
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present on a currently supported version of LibreOffice (5.4.1 or 5.3.6): https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and operating system.

Please DO NOT:
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help, you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3): http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug.
3. Leave a comment with your results.
4a. If the bug was present with 3.3, set the version to "inherited from OOo".
4b. If the bug was not present in 3.3, add "regression" to the keywords.

Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug-20170929
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/1439a09a13516f72baa735e5af332b0647d0cff7%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part1 It will be available in 6.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/c0d372d7c0d9284aad8b0d5142dff7c34c062fa9%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part2 It will be available in 6.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/0d04315c17a6df9f971237d45d9e5e8af765dd17%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part4 It will be available in 6.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Noel Grandin committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/7bf2515fc48ed0d4c436aef298fa9c35e573352b%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part3 It will be available in 6.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
I could not get the C++ sample program to work, but the performance of querying types is substantially improved. Feel free to re-open this bug if the sample program still indicates a problem.
Noel Grandin committed a patch related to this issue. It has been pushed to "libreoffice-6-3": https://git.libreoffice.org/core/+/6fdd1dc34f497fce28f85807126e56432a3cb7d2%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part1 It will be available in 6.3.0.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Noel Grandin committed a patch related to this issue. It has been pushed to "libreoffice-6-3": https://git.libreoffice.org/core/+/77219c88ac0a44b3ed5dada67d0d9ca52fa3adec%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part2 It will be available in 6.3.0.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Noel Grandin committed a patch related to this issue. It has been pushed to "libreoffice-6-3": https://git.libreoffice.org/core/+/6e4513e2b4e32926354d745ecf08fdf440392a75%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part3 It will be available in 6.3.0.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Noel Grandin committed a patch related to this issue. It has been pushed to "libreoffice-6-3": https://git.libreoffice.org/core/+/49a17e9ba500451929a2c4cb63c30e50e989886c%5E%21 tdf#67538 XTypeDetection::queryTypeByDescriptor poor performance, part4 It will be available in 6.3.0.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.