Bug 54376

Summary: reads files one byte at a time
Product: LibreOffice Reporter: Lionel Elie Mamane <lionel>
Component: LibreOfficeAssignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED INVALID    
Severity: normal CC: bugs, jbfaure, markus.mohrhard, michael.meeks, sberg.fun, timar74
Priority: medium    
Version: 3.6.1.1 rc   
Hardware: Other   
OS: All   
Whiteboard:
Crash report or crash signature: Regression By:

Description Lionel Elie Mamane 2012-09-02 07:08:32 UTC
I was trying to see why LibreOffice was taking a long time starting. An strace shows an awful lot of read() system calls for just one byte. That makes reading the file *much* slower: use buffered I/O. Just use libc's fopen/fread instead of the OS's syscalls open/read, and even no need to change program logic if it wants to treat the characters one by one. (If it calls fread on several characters at once, it will be even faster...)

Here's one specific example I caught. That's reading the file 
swext/mediawiki/help/wikiformats.xhp

Someone knows which function/file/part of LibreOffice reads that file?


read(30, "n", 1)                        = 1
read(30, " ", 1)                        = 1
read(30, "W", 1)                        = 1
read(30, "i", 1)                        = 1
read(30, "k", 1)                        = 1
read(30, "i", 1)                        = 1
read(30, " ", 1)                        = 1
read(30, "a", 1)                        = 1
read(30, "r", 1)                        = 1
read(30, "t", 1)                        = 1
read(30, "i", 1)                        = 1
read(30, "c", 1)                        = 1
read(30, "l", 1)                        = 1
read(30, "e", 1)                        = 1
read(30, "s", 1)                        = 1
read(30, ".", 1)                        = 1
read(30, " ", 1)                        = 1
read(30, "N", 1)                        = 1
read(30, "e", 1)                        = 1
read(30, "v", 1)                        = 1
read(30, "e", 1)                        = 1
read(30, "r", 1)                        = 1
read(30, "t", 1)                        = 1
read(30, "h", 1)                        = 1
read(30, "e", 1)                        = 1
read(30, "l", 1)                        = 1
read(30, "e", 1)                        = 1
read(30, "s", 1)                        = 1
read(30, "s", 1)                        = 1
read(30, ",", 1)                        = 1
read(30, " ", 1)                        = 1
read(30, "t", 1)                        = 1
read(30, "e", 1)                        = 1
read(30, "x", 1)                        = 1
read(30, "t", 1)                        = 1
read(30, " ", 1)                        = 1
read(30, "a", 1)                        = 1
read(30, "l", 1)                        = 1
read(30, "i", 1)                        = 1
read(30, "g", 1)                        = 1
read(30, "n", 1)                        = 1
read(30, "m", 1)                        = 1
read(30, "e", 1)                        = 1
read(30, "n", 1)                        = 1
read(30, "t", 1)                        = 1
read(30, " ", 1)                        = 1
read(30, "i", 1)                        = 1
Comment 1 Roman Eisele 2012-11-17 19:48:45 UTC
@ Michael Meeks:
Could you please take a look at this report by Lionel Elie Mamane? The current behaviour seems really suboptimal -- could this be a bottleneck which allows heavy optimizations of startup time? If yes, you will best know which developer could handle this, or if this should be marked as EasyHack, etc.

Thanks!
Comment 2 Michael Meeks 2012-11-19 10:11:11 UTC
Ah - I suspect you're using GNOME 3.6 and/or have Accessibility enabled.

This kills our startup time by about a factor of five; A callgrind trace of what is going on is here:

http://users.freedesktop.org/~michael/callgrind-startup-master.txt.gz

If you gzip -d that and poke at it in kcachegrind you'll see that much of the time is spent fooling around with help with XPaths trying to extract nonsense from them.

Of course - it the real cost is in some stupid syscall-ness ... perhaps it is easy to band-aid ;-) - if so we should.
Comment 3 Michael Meeks 2012-11-19 10:12:34 UTC
IIRC Markus was looking at this, wrt. re-working the help packaging somehow.
Comment 4 Lionel Elie Mamane 2012-11-19 10:43:41 UTC
(In reply to comment #2)
> Ah - I suspect you're using GNOME 3.6 and/or have Accessibility enabled.

Not using GNOME. Jumped off that train with the pain of GNOME 3.x; using xfce now. Briefly using GNOME3, maybe I filed the bug back then, but I don't think so.

Not sure about accessibility, I'm sure I never touched those options:

Tools/Options/LibreOffice/Accessibility:

 UNchecked: Use text selection cursor in read-only text documents
 CHECKED:   allow animated graphics
 CHECKED:   allow animated text
 CHECKED:   help tips disappear after            4 seconds

 CHECKED:   automatically detect high contrast mode of operating system
 UNchecked: use automatic font color for screen display
 CHECKED:   use system colors for page previews
Comment 5 Michael Meeks 2012-11-19 11:16:51 UTC
Ok - then perhaps it is something stupider ;-) on the other hand - I'm not aware of any other reason why we'd be reading help files on startup :-)

Poke at dconf-editor - is the key:

org/gnome/desktop/interface/toolkit_accessibility

checked ? :-)

Failing that - if it is -really- slow, just run in gdb; hit ctrl-c and get some backtraces to see what is going on; and/or do some 'finish' stuff up the traces to see where the slowness is.
Comment 6 Lionel Elie Mamane 2012-11-19 11:29:48 UTC
(In reply to comment #5)

> Poke at dconf-editor - is the key:

> org/gnome/desktop/interface/toolkit_accessibility

> checked ? :-)

No, it is not.

> Failing that - if it is -really- slow, just run in gdb; hit ctrl-c and get
> some backtraces to see what is going on; and/or do some 'finish' stuff up
> the traces to see where the slowness is.

I'll try to find some time later.
Comment 7 Markus Mohrhard 2012-11-19 11:33:56 UTC
We also read the extended tooltips in some other cases when we should not.

This should become all solved as soon as we can move them into our source tree out of the help tree. I already have a script that moves all the extended tooltips into a src file which should dramatically increase the read performance with it. There are sadly still a few problems with mapping uno command ids to the integers and changing the help id entry in our src files to store an id instead of string.
Comment 8 retired 2013-03-22 10:05:31 UTC
@Marcus: Any news on this?

It's NEEDINFO but that has never been provided. Not sure if it's even valid for LO 4. Could someone of the dev-team have a peak?
Comment 9 Markus Mohrhard 2013-03-22 10:07:54 UTC
(In reply to comment #8)
> @Marcus: Any news on this?
> 
> It's NEEDINFO but that has never been provided. Not sure if it's even valid
> for LO 4. Could someone of the dev-team have a peak?

Everybody involved in this discussion is a developer. My plan is to finish the work on the tooltips for 4.1 but right now I'm stuck in bug fixing.
Comment 10 Stephan Bergmann 2013-03-22 10:26:27 UTC
(In reply to comment #5)
> Ok - then perhaps it is something stupider ;-) on the other hand - I'm not
> aware of any other reason why we'd be reading help files on startup :-)

If this reading of help files was observed on the first start after an "upgrade" (i.e., change of buildid, i.e., git rev), it could have been due to what is meanwhile fixed with <http://cgit.freedesktop.org/libreoffice/core/commit/?id=b23bb8e0de3dbdc2c66c3dedf70dfd318868f76c> "fdo#53009: Compile extension help in gbuild."
Comment 11 Stephan Bergmann 2013-03-22 10:29:15 UTC
(BTW, another source of single-byte read syscalls is the JVM, see the third item in bug 902004 comment 5.)
Comment 12 Stephan Bergmann 2013-03-22 10:30:42 UTC
(In reply to comment #11)
> bug 902004 comment 5

That's <https://bugzilla.redhat.com/show_bug.cgi?id=902004#c5> "very bad performance with E-Porto Add-In für OpenOffice Writer installed."
Comment 13 Lionel Elie Mamane 2013-03-22 15:35:49 UTC
Could not catch my libreoffice-4-0 dev tree doing any significant amount of those "read(,,1)" anymore. I do see a lot of small write(), though. They could be JVM stuff since this pid/tid does stuff like:


25399 open("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/libzip.so", O_RDONLY) = 3
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/resources.jar", {st_mode=S_IFREG|0644, st_size=1053728, ..
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/rt.jar", {st_mode=S_IFREG|0644, st_size=60110444, ...}) = 
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/sunrsasign.jar", 0x7f2c0df59ad0) = -1 ENOENT (No such file
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/jsse.jar", {st_mode=S_IFREG|0644, st_size=525260, ...}) = 
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/jce.jar", {st_mode=S_IFREG|0644, st_size=85261, ...}) = 0
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/charsets.jar", {st_mode=S_IFREG|0644, st_size=3534509, ...
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/netx.jar", 0x7f2c0df59ad0) = -1 ENOENT (No such file or di
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/plugin.jar", 0x7f2c0df59ad0) = -1 ENOENT (No such file or 
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/rhino.jar", {st_mode=S_IFREG|0644, st_size=1127950, ...}) 
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/modules/jdk.boot.jar", 0x7f2c0df59ad0) = -1 ENOENT (No suc
25399 stat("/usr/lib/jvm/java-6-openjdk-amd64/jre/classes", 0x7f2c0df59ad0) = -1 ENOENT (No such file or directo
25399 open("/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/meta-index", O_RDONLY) = 3


25399 write(1, "97", 2 <unfinished ...>
25399 <... write resumed> )             = 2
25399 write(1, " ", 1)                  = 1
25399 write(1, "109", 3 <unfinished ...>
25399 <... write resumed> )             = 3
25399 write(1, " ", 1 <unfinished ...>
25399 <... write resumed> )             = 1
25399 write(1, "101", 3)                = 3
25399 write(1, " ", 1 <unfinished ...>
25399 <... write resumed> )             = 1
25399 write(1, "61", 2 <unfinished ...>
25399 <... write resumed> )             = 2
25399 write(1, " ", 1 <unfinished ...>
25399 <... write resumed> )             = 1
25399 write(1, "79", 2 <unfinished ...>
25399 <... write resumed> )             = 2
25399 write(1, " ", 1 <unfinished ...>
25399 <... write resumed> )             = 1
25399 write(1, "112", 3 <unfinished ...>
25399 <... write resumed> )             = 3
25399 write(1, " ", 1 <unfinished ...>
25399 <... write resumed> )             = 1
25399 write(1, "101", 3 <unfinished ...>
25399 <... write resumed> )             = 3
25399 write(1, " ", 1)                  = 1
25399 write(1, "110", 3)                = 3
25399 write(1, " ", 1)                  = 1
25399 write(1, "74", 2)                 = 2
25399 write(1, " ", 1)                  = 1
25399 write(1, "68", 2)                 = 2
25399 write(1, " ", 1)                  = 1
25399 write(1, "75", 2)                 = 2
25399 write(1, " ", 1)                  = 1
25399 write(1, "32", 2)                 = 2
25399 write(1, " ", 1)                  = 1
25399 write(1, "82", 2)                 = 2
25399 write(1, " ", 1)                  = 1
25399 write(1, "117", 3)                = 3
25399 write(1, " ", 1)                  = 1
25399 write(1, "110", 3)                = 3
Comment 14 QA Administrators 2013-11-04 22:18:09 UTC
Dear Bug Submitter,

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team
Comment 15 Lionel Elie Mamane 2013-11-05 07:21:11 UTC
Could not reproduce anymore, I don't think anybody is going to work on this anymore (because no precise reproductibility) -> closing