Bug 81541 - VIEWING: Primary Key Search Slow
Summary: VIEWING: Primary Key Search Slow
Status: RESOLVED DUPLICATE of bug 44954
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
4.1.4.2 release
Hardware: All Linux (All)
: medium minor
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
Depends on: 44954 57872
Blocks:
  Show dependency treegraph
 
Reported: 2014-07-19 15:51 UTC by Marino Capriotti
Modified: 2014-07-22 15:05 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments
8000+ records database (992.13 KB, application/vnd.oasis.opendocument.database)
2014-07-20 18:26 UTC, Marino Capriotti
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marino Capriotti 2014-07-19 15:51:34 UTC
Problem description:

I've got a table with 8000+ records, search is unbearably slow despite indexing unless I create a query that excludes the primary key. However I cannot use this query in my form because leaving the primary key out prevents me from creating new records.

I surmise Java may be responsible of this slowdown: the version in use is 1.7.0_55.

Can someone please suggest a workaround?
Operating System: Ubuntu
Version: 4.2.4.2 release
Comment 1 Joel Madero 2014-07-19 20:36:26 UTC
We'll need a sample file in order to do anything. Please also provide exact steps to reproduce. Once you provide a sample please mark as UNCONFIRMED. 

I know that it might be hard because of confidential information, my recommendation is to strip out the data by doing a replace for all characters to scramble the words
Comment 2 Marino Capriotti 2014-07-20 18:26:07 UTC
Created attachment 103152 [details]
8000+ records database

Hello Joel

Privacy is definitely not an issue here.

Steps to reproduce:

1. Go to Forms>Inventario
2. Select the first record on the table-like form on the left
3. Click on the Find button
4. Select options Text/Single Field>Codice/Beginning of Field
5. Perform the search of "z"

Hope it's enough!
Comment 3 Joel Madero 2014-07-20 18:58:59 UTC
Perfect! One additional question - how can I create a query in the form that leaves out the primary key? I want to compare times for the query.

I did the steps you described and it did seem pretty slow:
Ubuntu 14.04 x64
LibreOffice 4.3.0.2 rc
System: 2.4ghz Duo Core 2, 4 GB RAM

Time to do the find - 45.25 seconds.

I'd like to compare this to the result without primary key included but not sure how to do this (thanks for working with me, I don't use base much at all).
Comment 4 Marino Capriotti 2014-07-20 20:18:56 UTC
H(In reply to comment #3)
> Perfect! One additional question - how can I create a query in the form that
> leaves out the primary key? I want to compare times for the query.
> 
> I did the steps you described and it did seem pretty slow:
> Ubuntu 14.04 x64
> LibreOffice 4.3.0.2 rc
> System: 2.4ghz Duo Core 2, 4 GB RAM
> 
> Time to do the find - 45.25 seconds.
> 
> I'd like to compare this to the result without primary key included but not
> sure how to do this (thanks for working with me, I don't use base much at
> all).

No need for you to create a new query. Just use the query called "Ricerca inventario", since it doesn't include the "ID". The form as I set it up directly accesses the table, then sorts it according to field "Codice".

Thanks for helping me out!
Comment 5 Marino Capriotti 2014-07-20 20:21:07 UTC
Last but not least: if you include the "ID" to the query, then all performance boost is inexorably lost, try it for yourself!
Comment 6 Joel Madero 2014-07-21 01:03:06 UTC
Thanks - I've confirmed the problem at least on:
Ubuntu 14.04 x64
LibreOffice 4.1.4.2 - updating version (version is the oldest version that we can confirm the problem, we use comments to say "the problem still exists on x.x.x.x)


New - confirmed
Minor - while it's quite annoying, it does not prevent high quality work, but it can slow you down a lot
Medium - slow enough to move it from low to medium, keep in mind that base is our least used component


CC'ing Lionel who might have further opinions here.
Comment 7 Lionel Elie Mamane 2014-07-21 05:13:17 UTC
In my tests the speed difference does not seem to be linked to the primary key field per se, but more to the number of fields in the query, primary key or not. E.g. search in the query
 SELECT "ID", "Codice" FROM "Inventario" ORDER BY "Codice" ASC
is much faster.

IMO the cause is that LibreOffice downloads the whole row to move through the query/table, even if only for search. Because it goes through a "caching" layer. And search is done by downloading the data and doing the search ourselves.

One "easy" way to improve this somewhat, but not fundamentally, would be to make the caching layer more lazy, or have the search bypass the cache, or something like that.

However, when we do have the primary key in the query, it is silly to do the search ourselves, while (as smoglondon hints) the database will do it much faster, especially when the searched field is indexed. Compare the speed at which these queries execute, compared to the speed the LibreOffice search takes:
 SELECT * FROM "Inventario" WHERE "Codice" LIKE 'z%'
 SELECT COUNT(*) FROM "Inventario" WHERE "Codice" < 'z'
That's bug 44954

*** This bug has been marked as a duplicate of bug 44954 ***
Comment 8 Marino Capriotti 2014-07-21 12:37:50 UTC
(In reply to comment #7)
> In my tests the speed difference does not seem to be linked to the primary
> key field per se, but more to the number of fields in the query, primary key
> or not. E.g. search in the query
>  SELECT "ID", "Codice" FROM "Inventario" ORDER BY "Codice" ASC
> is much faster.
> 
> IMO the cause is that LibreOffice downloads the whole row to move through
> the query/table, even if only for search. Because it goes through a
> "caching" layer. And search is done by downloading the data and doing the
> search ourselves.
> 
> One "easy" way to improve this somewhat, but not fundamentally, would be to
> make the caching layer more lazy, or have the search bypass the cache, or
> something like that.
> 
> However, when we do have the primary key in the query, it is silly to do the
> search ourselves, while (as smoglondon hints) the database will do it much
> faster, especially when the searched field is indexed. Compare the speed at
> which these queries execute, compared to the speed the LibreOffice search
> takes:
>  SELECT * FROM "Inventario" WHERE "Codice" LIKE 'z%'
>  SELECT COUNT(*) FROM "Inventario" WHERE "Codice" < 'z'
> That's bug 44954
> 
> *** This bug has been marked as a duplicate of bug 44954 ***

Hello Lionel Elie Mamane

Concerning what’s stored in memory, I knew about that and that’s why I devised a macro that presses the Last Record and then First Record button before loading the form: this way all 8000+ records are stored in memory from the get-go. If you substitute direct-access with the “Ricerca inventario” query in the main form, you’ll notice that searches are instantaneous. Please remember though that, as I said, inserting “ID” into the aforementioned query nullifies all performance boost.

Last but not least, why have you set this bug to RESOLVED? You didn't provide me with any workaround, at the very least. The slowness makes my database all but unusable.
Comment 9 Marino Capriotti 2014-07-21 12:44:37 UTC
I repeat it: the "ID" although apparently indexed, must be up to something because (at least on my PC) it irretrievably slows down searches.
Comment 10 Lionel Elie Mamane 2014-07-21 14:25:22 UTC
(In reply to comment #8)

> If you substitute direct-access with the “Ricerca inventario”
> query in the main form, you’ll notice that searches are instantaneous.

No, I don't see that. Search is about the same speed, not much difference.

> Concerning what’s stored in memory, I knew about that and that’s why I
> devised a macro that presses the Last Record and then First Record button
> before loading the form: this way all 8000+ records are stored in memory
> from the get-go.

Ah, I understand now. You don't immediately do a search, you first move to the end, and only then do a search.

Case 1: In the case of no Primary Key, moving to the end is slow, but all *subsequent* searches are fast.

Case 2: In the case of Primary Key present, moving to the end is fast, but all searches are slow (about as slow as moving to the end in case 1).


In the two cases, LibreOffice uses different strategies:

Case 1: LibreOffice caches all information in memory. Retrieving it is slow (because of bug 57872), but then all is in LibreOffice memory so everything goes relatively fast, at least for relatively small datasets like yours.

Case 2: LibreOffice caches the primary key of all rows in memory, and the full content of about twice the amount of rows that the screen displays. Any movement out of the cache (as is done for a search) is "slow" (but not as much as a full table/query walk!) because of bug 57872. Resolving bug 44954 would make search fast, as it would not require moving throught the whole query/table.

What you are descriging it the combined effects of bug 57872 and bug 44954.

> Last but not least, why have you set this bug to RESOLVED?

I set it to "DUPLICATE", which means that what you describe is actually the *same* as bug 44954. I'm sorry that our bug tracking system confusingly calls that "RESOLVED", since that is effectively not a resolution of the problem. It just says "this bug entry, that is bug 81541 is nothing new, everything is in bug 44954. So to see the status of the actual problem encountered, go see bug 44954."

> You didn't provide me with any workaround, (...).

You will not feel this issue in an as painful way (on that size of dataset) in a non-JDBC database (but any other driver: native, ODBC, ADO, ...). Embedded HSQLDB is a JDBC database. Whether you can move to a non-JDBC database depends on your use case.
Comment 11 Marino Capriotti 2014-07-21 14:46:48 UTC
(In reply to comment #10)
> (In reply to comment #8)
> 
> > If you substitute direct-access with the “Ricerca inventario”
> > query in the main form, you’ll notice that searches are instantaneous.
> 
> No, I don't see that. Search is about the same speed, not much difference.
> 
> > Concerning what’s stored in memory, I knew about that and that’s why I
> > devised a macro that presses the Last Record and then First Record button
> > before loading the form: this way all 8000+ records are stored in memory
> > from the get-go.
> 
> Ah, I understand now. You don't immediately do a search, you first move to
> the end, and only then do a search.
> 
> Case 1: In the case of no Primary Key, moving to the end is slow, but all
> *subsequent* searches are fast.
> 
> Case 2: In the case of Primary Key present, moving to the end is fast, but
> all searches are slow (about as slow as moving to the end in case 1).
> 
> 
> In the two cases, LibreOffice uses different strategies:
> 
> Case 1: LibreOffice caches all information in memory. Retrieving it is slow
> (because of bug 57872), but then all is in LibreOffice memory so everything
> goes relatively fast, at least for relatively small datasets like yours.
> 
> Case 2: LibreOffice caches the primary key of all rows in memory, and the
> full content of about twice the amount of rows that the screen displays. Any
> movement out of the cache (as is done for a search) is "slow" (but not as
> much as a full table/query walk!) because of bug 57872. Resolving bug 44954
> would make search fast, as it would not require moving throught the whole
> query/table.
> 
> What you are descriging it the combined effects of bug 57872 and bug 44954.
> 
> > Last but not least, why have you set this bug to RESOLVED?
> 
> I set it to "DUPLICATE", which means that what you describe is actually the
> *same* as bug 44954. I'm sorry that our bug tracking system confusingly
> calls that "RESOLVED", since that is effectively not a resolution of the
> problem. It just says "this bug entry, that is bug 81541 is nothing new,
> everything is in bug 44954. So to see the status of the actual problem
> encountered, go see bug 44954."
> 
> > You didn't provide me with any workaround, (...).
> 
> You will not feel this issue in an as painful way (on that size of dataset)
> in a non-JDBC database (but any other driver: native, ODBC, ADO, ...).
> Embedded HSQLDB is a JDBC database. Whether you can move to a non-JDBC
> database depends on your use case.

Thanks for setting things straight. Looks like I got myself into hot water, you only leave the window of migration open.  Could you please reccomend a migration strategy? I also heard about Firebird but as far as I've been told, it's not available for Ubuntu yet.

Regards
Comment 12 Joel Madero 2014-07-21 14:48:21 UTC
@Lionel - I don't think I experienced it quite like you said when I duplicated the bug:

1. Searching the one without the ID it was indeed slow first, and then fast

2. Searching with ID - it was never fast, not the first, nor the last time I tested. The difference is between instantaneous for the first case (no ID) and ~45 seconds with the key (so a magnitude of speed several hundred times faster for essentially the same data).

Just wanted to add my experience. Thanks Lionel for commenting
Comment 13 Lionel Elie Mamane 2014-07-21 16:54:48 UTC
(In reply to comment #12)
> @Lionel - I don't think I experienced it quite like you said when I
> duplicated the bug:
> 
> 1. Searching the one without the ID it was indeed slow first, and then fast

Yes, you experience the same than I do.

> 2. Searching with ID - it was never fast, not the first, nor the last time I
> tested.

Yes, you experience the same than I do.

Essentially, retrieving the data is what is slow (because JDBC / bug 57872). Without primary key, LibreOffice retrieves the data only once. With primary key, LibreOffice trusts it can get the data again and thus caches only a few dozens of rows. Because LibreOffice fetches the whole data to search (instead of having the search done by the database, that's bug 44954), search in a JDBC database (with primary key) is horribly slow.

"Fixing" either of these bugs would make this particular scenario a non-issue.
Comment 14 Marino Capriotti 2014-07-21 16:57:29 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > @Lionel - I don't think I experienced it quite like you said when I
> > duplicated the bug:
> > 
> > 1. Searching the one without the ID it was indeed slow first, and then fast
> 
> Yes, you experience the same than I do.
> 
> > 2. Searching with ID - it was never fast, not the first, nor the last time I
> > tested.
> 
> Yes, you experience the same than I do.
> 
> Essentially, retrieving the data is what is slow (because JDBC / bug 57872).
> Without primary key, LibreOffice retrieves the data only once. With primary
> key, LibreOffice trusts it can get the data again and thus caches only a few
> dozens of rows. Because LibreOffice fetches the whole data to search
> (instead of having the search done by the database, that's bug 44954),
> search in a JDBC database (with primary key) is horribly slow.
> 
> "Fixing" either of these bugs would make this particular scenario a
> non-issue.

Hi Lionel

"Fixing" doesn't seem to be much of a priority since both bugs are quite old harking back as they do to 3.x versions.

Any workaround before I'm forced to give up?

Thanks!
Comment 15 Lionel Elie Mamane 2014-07-21 17:02:55 UTC
> Could you please recommend a migration strategy?
> I also heard about Firebird but as far as I've been
> told, it's not available for Ubuntu yet.

I don't consider Firebird as embedded DB in LibreOffice as mature. OTOH, the version of HSQLDB in LibreOffice is not that good either, so <shrug> maybe some users would prefer Firebird anyway.

If it does not conflict with your use case, I think your best bet is an *external* database. Any decent one. Can be Firebird (as an _external_ database, e.g. through ODBC driver or some so), can by MySQL, can be PostgreSQL, ... Avoid sqlite. But since it is external, it means handling an external server (can be on the same computer) and the .odb as separate file.
Comment 16 Marino Capriotti 2014-07-21 19:15:11 UTC
Hi Lionel

Apart from migration, no workaround/quickfix?
Comment 17 Lionel Elie Mamane 2014-07-22 07:18:08 UTC
(In reply to comment #16)
> Apart from migration, no workaround/quickfix?

None worth their salt comes to mind. Do the search in the query and use that to move the current record in the form? Reimplement the search in a big "manually cached" "whole table"? I'm not sure either of those would be that much less work than just fixing bug 44954 (or possibly bug 57872 *if* one of the most crazy ideas in there actually works, which is a big "if").
Comment 18 Marino Capriotti 2014-07-22 15:05:37 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Apart from migration, no workaround/quickfix?
> 
> None worth their salt comes to mind. Do the search in the query and use that
> to move the current record in the form? Reimplement the search in a big
> "manually cached" "whole table"? I'm not sure either of those would be that
> much less work than just fixing bug 44954 (or possibly bug 57872 *if* one of
> the most crazy ideas in there actually works, which is a big "if").

OK, so, how do I migrate to ODBC/whatever?

Thanks