Bug 38811 - default to SQLite not HSQLDB in Base
Summary: default to SQLite not HSQLDB in Base
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: Other All
: medium enhancement
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, skillCpp, skillSql
Depends on: 51781
Blocks:
  Show dependency treegraph
 
Reported: 2011-06-30 06:57 UTC by Björn Michaelsen
Modified: 2015-12-15 23:25 UTC (History)
8 users (show)

See Also:
Crash report or crash signature:


Attachments
Possible starting point (250.31 KB, patch)
2011-11-22 06:49 UTC, Lionel Elie Mamane
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Björn Michaelsen 2011-06-30 06:57:45 UTC
Background: currently when creating a local database we default to this Java database engine. That is somewhat unfortunate. It would be better to default to a SQLite provider - unfortunately we don't have one. So - it needs writing (see connectivity/source/drivers/*) - and then the default changing. See [http://www.sqlite.org/c3ref/intro.html The SQLite C/C++ Interface].

Skills: building, C++ hacking
Comment 1 gleppert 2011-07-09 10:24:41 UTC
Why changing the default database for LibreOffice? Is there any reason (besides being based on Java) for SQLite over HSQLDB version 2? Comparing the features, I don't see any reason. HSQLDB even seems to be faster in certain situations.
Comment 2 Nguyen Vu Hung 2011-08-16 21:51:11 UTC
+1 for SQLite - a popular DB engine
Comment 3 Don't use this account, use tml@iki.fi 2011-08-17 02:35:24 UTC
No need to "vote". We want code, not votes.
Comment 4 Andreas Säger 2011-09-30 09:22:42 UTC
This "easy hack" is pointless. Replacing one very good product with another very good product will not solve a single Base problem.
Implementing http://www.sqlite.org/c3ref/intro.html might be insufficient.
Comment 5 Anonymous Helper 2011-11-16 08:21:15 UTC
One argument for SQLite over HSQLDB is that it's much, much easier to integrate with other software. It's the NetBSD of DB engines (it runs everywhere).

I've worked with a dozen or so DB engines over the years, integrating data sources and building ETL systems, and this bug report was the first time I've run across HSQLDB (likely because I don't use Java). 

After a few Google searches, I wouldn't consider HSQLDB's integration with other languages very nice. They all seem to require a JVM process and a JDBC connector rather than having a native implementation available. Ruby libs (outside of JRuby) spin up a JVM process to call the java libraries. Perl doesn't have a native DBI back end, it uses DBD::JDBC and requires a JDBC server process. Python appears to use a JVM+JDBC connection. Same for C/C++. 

SQLite typically has a small, native C implementation or binding available for each language, and you're far more likely to run into a desktop application that can read SQLite than HSQLDB. And it looks like java already has pretty good pure java SQLite support through things like the SQLiteJDBC connector (http://www.zentus.com/sqlitejdbc/).
Comment 6 Mohsin Mansur 2011-11-22 00:05:56 UTC
I am interested in writing the driver code. However I have some questions (that a fellow Base driver developer might be able to help me resolve) with regards to what actually needs to be done, and how it needs to be done. Keep in mind that I am completely new to LibreOffice C++ UNO coding, and there doesn't seem to be any guide for Base driver writing or using UNO interfaces.

* What classes/services/interfaces at a minimum have to be implemented? I browsed through the drivers tree and it appears that not all drivers implement all services (except for some basic ones such as tables and columns). How does one decide what to implement and what not to implement?

* Are the code skeletons for the drivers usually auto-generated or written by hand? There is some documentation for generating code skeletons from UNOIDL files on the LibreOffice SDK API page but I don't know if that's relevant to this case or not (once again, driver developers can let me know what they did).

* Are there any existing unit tests for testing Base functionality with new drivers? If not, how were the existing drivers tested?

* (a simple one) How will I set the default driver to SQLite in the source code?

Thanks in advance.
--Mohsin

(I'll be sending this message to the mailing list as well)
Comment 7 Lionel Elie Mamane 2011-11-22 06:49:00 UTC
Created attachment 53769 [details]
Possible starting point
Comment 8 Lionel Elie Mamane 2011-11-22 06:50:45 UTC
There is also an "alpha quality" driver at http://dba.openoffice.org/drivers/sqlite/index.html

I'm not sure which of "possible starting point" attachment or version 0.1.0 from webpage is more recent.
Comment 9 Lionel Elie Mamane 2011-11-22 07:09:36 UTC
(In reply to comment #6)
> There doesn't seem to be any guide for Base driver writing or using UNO interfaces.

The situation is not as bad as you say; there is the OpenOffice.org Developer's Guide at http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/OpenOffice.org_Developers_Guide, and in particular these sections:

 * Writing Database Drivers: http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Database/Writing_Database_Drivers
 * About UNO:
     http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/ProUNO/Professional_UNO
     http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/WritingUNO/Writing_UNO_Components
     http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/AdvUNO/Advanced_UNO

It seems you did not find the skeleton provided? It is in the SDK documentation, and it has a file named "How_to_write_my_own_driver.txt" with a more hands-on approach than the Developer's Guide, which gives more a hihg-level overview and reference. In a git checkout of the source tree, it is in odk/examples/DevelopersGuide/Database/DriverSkeleton

Also gleaned:

 * http://help.libreoffice.org/Common/Integrating_new_UNO_components

> * What classes/services/interfaces at a minimum have to be implemented?

See http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Database/SDBC_Driver for a functional minimum. After, you can add SDBCX services which provide added features.

> * Are the code skeletons for the drivers usually auto-generated or written by
> hand?

The Database Driver Skeleton in the SDK seems to be written by hand, as it is "as is" in the git sources.

> * Are there any existing unit tests for testing Base functionality with new
> drivers? If not, how were the existing drivers tested?

I don't know.

> * (a simple one) How will I set the default driver to SQLite in the source
> code?

Chase for "embedded" (and especially EMBEDDED_HSQLDB) in connectivity and dbaccess directories of the source tree. That's the points where it currently does something "special" for embedded HSQLDB, and where you'll have to do something similar for embedded SQLite DB.
Comment 10 Lionel Elie Mamane 2011-11-22 07:38:12 UTC
Let me introduce myself. I've started taking care of the PostgreSQL driver for LibreOffice. That was also my first sizeable contribution to LibreOffice. As such, I think:

 - I can be useful for you in the "bootstrap" phase sharing the things I learned while doing the PostgreSQL driver, although I would not call myself an expert yet.

 - It would be useful for both us if we stayed in touch and shared experiences / what we learn with each other.

I was rather lucky in that there was a "basically working / beta quality" driver for PostgreSQL for OpenOffice.org already, that was actively used in Debian&Ubuntu, and I basically handled bitrot (updating it to new OO.org / Libreoffice version) and added a few "easy" features and optimisations I liked or wanted.

For SQLite, the existing thing is described as "alpha quality", so you'll need to look at it and make a decision whether it is better to start from it or start from scratch "cleanly".

As to PostgreSQL-specific knowledge (like "how to list a table's columns and their datatype"), I've found it useful to just look at how the ODBC and JDBC driver do things, and how the "psql" command-line interface program does things. For that, you could use http://www.ch-werner.de/sqliteodbc/ and/or http://www.ch-werner.de/javasqlite/ and/or the source code of the "sqlite3" command-line program. But maybe you are already a SQLite expert and don't need that.

So, that's it for now, don't hesitate to email me directly if you need something.

One last thing: once you have decided to do this, put the bug's status to "ASSIGNED" and put yourself into "assigned to". If you change your mind in future and move to other things, put the bug back in "NEW" state and "reset assignee to default".

Welcome to the community and happy hacking!
Comment 11 Mohsin Mansur 2011-11-22 17:27:03 UTC
(In reply to comment #9)
> (In reply to comment #6)
> > There doesn't seem to be any guide for Base driver writing or using UNO interfaces.
> 
> The situation is not as bad as you say; there is the OpenOffice.org Developer's

Thanks for that! I think I will focus on trying to improve the "alpha" driver and discuss possible ways of resolving the issues its author has mentioned with him.

--Mohsin
Comment 12 Michael Meeks 2011-11-23 04:17:04 UTC
> Thanks for that! I think I will focus on trying to improve the "alpha" driver
> and discuss possible ways of resolving the issues its author has mentioned
> with him.

Great news; licensing wise we can copy Lionel's approach with postgresql, and try to incrementally re-write the code to MPL/LGPLv3+ over time.

Anyhow - excited to see someone working on this, it's a key strategic piece for LibreOffice ! do ask out on IRC or the mailing list with any questions. Thanks.
Comment 13 enrico.weigelt 2012-03-17 08:08:31 UTC
(In reply to comment #1)
> Why changing the default database for LibreOffice? Is there any reason (besides
> being based on Java) for SQLite over HSQLDB version 2? Comparing the features,
> I don't see any reason. HSQLDB even seems to be faster in certain situations.

Assuming HSQLDB is run in JVM (instead of compiled into machine code),
sqlite is ways faster.

BUT: is this really a good way to do ?
Wouldn't moving to some generic database wrapper library (eg. uODBC)
and dropping the own wrappers completely, the better way ?
Comment 14 Lionel Elie Mamane 2012-03-19 00:04:42 UTC
(In reply to comment #13)

> BUT: is this really a good way to do ?
> Wouldn't moving to some generic database wrapper library (eg. uODBC)
> and dropping the own wrappers completely, the better way ?

For the special case of the (default) embedded database (meaning the data is in the .odb file, not "elsewhere"), using UnixODBC has real disadvantages:

 - Dependency on an extra "external" system -> less "bootstrap friendly" for the user.

 - Not only UnixODBC has to be installed, but also the right ODBC driver

 - UnixODBC & driver not only need to be installed, but also configured (odbcinst.ini) and the DSN configured (odbc.ini).

This all combines to: one cannot just mail an ODB file to another user and he/she can use it with just double-clicking on it.

Additionally:

 - I'm actually not aware of a way to embed the database within the .odb file through ODBC; maybe that would be possible? Would we need to create a DSN (in odbc.ini) for *each* odb file, and update it each time the file is moved / copied? That would be rather inconvenient for the user.


But if you have an idea of how to make that work, I am / we are listening.
Comment 15 dacm 2012-04-09 16:38:16 UTC
I'm not sure what's really behind this push for SQLite as the default engine in Base, but I can assure you that this move is viewed as a significant step backwards in the Base support-community. The issues mentioned here are completely irrelevant to Base users, and likely won't ease the development of LibO Base. The average Base user is not informed enough to assess all options before beginning a Base project, so the default must be flexible enough to meet all of their expectations (ala MS Access). HSQLDB 2.x provides that flexibility. SQLite does not.

Base is enhanced by the SQL standards-compliance found with HSQLDB as this speeds the learning process. HSQLDB 2.x provides an excellent SQL features-set, adequate for virtually all Base projects. And we find that many users actually expect multi-user scalability as available with HSQLDB. These are all lost with SQLite which bucks SQL-standards, lacks many SQL features, isn't scalable, and has fallen woefully behind HSQLDB 2.x in terms of features and development pace. These inadequacies will only widen and cause immediate support-headaches as users are forced into tedious data-migration simply to achieve what HSQLDB 2.x offers out-of-the-box. 

Performance is not the primary issue as HSQLDB compares very closely with SQLite in terms of speed. Besides, H2 (cousin of HSQLDB) has proven faster than SQLite in read performance, so I suspect that Java's JIT compiler is actually quite efficient. 
http://h2database.com/html/tutorial.html#android

All Base functions are affected by the power of the database engine including Queries, Forms and Reports. For example, Base Forms are highly dependent on the SQL feature-set supplied by the underlying database engine. The more functional the engine, the more functional Forms can become. We already have users that upgrade from the default HSQLDB 1.8 engine to HSQLDB 2.x for various functions such as DATE MATH, GROUP_CONCAT, and user-defined CHECK constraints. This is a relatively simple task given the upgrade automation provided by HSQLDB 2.x for legacy HSQLDB 1.8 databases. But as most users start with the default, they'll be much more quickly disappointed with SQLite, which starts with less function than the current default and then requires tedious data-migration to adopt a more comprehensive engine like HSQLDB 2.x!

Base users currently have a default database engine that scales from single-user in the seamless 'embedded' mode, to multi-user in 'server mode' -- without engaging in data migration. This is a key feature among users making the move from MS Access. Why would we want to give up this feature to adopt SQLite? 

Is this really about ease of development for LibreOffice programmers? I doubt it. The necessary hooks for JDBC compatibility are already available in LibreOffice. And the significant work on HSQLDB 2.x integration is already complete. I suspect that the "unfortunate" role of Java in this discussion has more to do with sour-grapes/politics (Oracle/Sun) than on the purported reach of LibreOffice Base into all relevant platforms. If you're actually targeting a platform that doesn't support Java, then it's probably irrelevant to the vast majority of users. Why dumb-down a product for the exclusive benefit of a handful of users? And I suspect that Java has been implemented much more uniformly across supported platforms than the native code written by SQLite developers, as mentioned by a previous poster in this comment-thread. 

BTW, H2 would be another excellent choice as the Base default. The emphasis on HSQLDB 2.x here simply recognizes the momentum and the work already done on HSQLDB 2.x integration. Both of these Java engines have the footprint, features, scalability and speed necessary to maximize Base through the default engine.

In summary, SQLite as the default engine will effectively reduce the function and flexibility of Base projects with particular impact on Queries, Forms and Reports while eliminating the expected (migration-free) path into multi-user environments such as MS Access provides, all due to the lack of features and function of SQLite relative to HSQLDB 2.x. It just doesn't make any sense.
Comment 16 Björn Michaelsen 2012-04-10 01:49:53 UTC
@dacm: Please keep the metadiscussion of the bug (it would belong on the mailing list or the ESC). As Lionel pointed out the reasons for this step quite well, there are important reasons for this, none of which you could dispel. If you want to improve the H2 bridge of LibreOffice base (or you are interested in funding such an effort) that will be considered. However, even that that would be independent of any work going into SQLite (which this EasyHack is about) and thus be off topic here.
Comment 17 dacm 2012-04-11 01:01:32 UTC
(In reply to comment #16)
@Björn 
I certainly apologize for the off-topic nature of my comments above, and particularly the "sour grapes" comment, which I retract as insignificant to the overall situation. Please feel free to delete any of my inputs (here and there) as necessary. That entire diatribe was crafted specifically to pour cold-water on would-be developers as they consider their role in coding a perceived regression of LO Base. But after a full day of reading the mailing lists, I do understand the desperate situation for LO Base. I will not attempt to dispel the reasoning I encountered because the desperation of the situation leaves most reality-checks untenable. Something as glaring as the dying nature of desktop-bound code (C++) relative to rich internet-capable apps (html5, and the predecessors including Java, Flash and Silverlight) is simply not appropriate for LO discussion today. That's quite understandable.

For those developers that may be interested in this "easy hack" challenge, it is an unfortunate consequence of a rather limited talent pool, that we now find ourselves begging for relief in the form of a Base regression. We certainly prefer to call it an "enhancement" because it will (I'm convinced) pave the way for necessary bug fixes and even improvements to LO Base over the long haul. Because quite frankly, LO Base has been unable to attract and/or retain coders with the necessary skills to tackle C++ and Java combined with an acute knowledge of databases. That's understandable considering that Base is in the (buggy) shape its in today due to the inability of the original Base (Sun) developers to code, document and properly debug this trio. And perhaps particularly so with respect to an in-process (JVM; embedded mode) Java database engine. Reasonable attempts to rectify the situation have even exacerbated the situation (bugs).

So given the realities, it is with grave apprehension as a user and support-volunteer that I grant my own blessings for SQLite, as the default database engine in Base. But only because it has the potential to save Base as a component of LO. I will continue to steer users away from "embedded databases" in production environments in general. And I still believe this dumbed-down, programmer's engine is unworthy of a modern, end-user, front-end like Base -- such that LO would be better-off disabling the 'embedded database' option with Base than to embrace such functional regression. But I must admit, the average user will never even use Base without a default (bundled) engine, and perhaps even an all-in-one .odb database option for smaller mail-merge type projects. Eventually, I think it would be advantageous to offer a portable 'split' file configuration in the name of stability, because single-file stability is an issue even with the vast resources of MS Access 2010 -- which is replete with official language warning about the stability of a single-file database -- so much so that it sports a feature to 'split' the components into separate front-end and back-end files "for proper stability." 

Despite the unbelievable ignorance I've encountered today in the LO mailing lists among devs, designers, documenters and steering-committee leadership with respect to the presumed adequacy of SQLite relative to HSQLDB 2.x, I do agree that SQLite is a necessary evil in the future of LO Base. So I respectfully join Björn and the LO team in the plea for code as necessary to integrate SQLite as a default Base engine. The future of LO Base "unfortunately" rests in your hands.
Comment 18 Florian Reisinger 2012-05-18 09:47:43 UTC
Deleted "Easyhack" from summary.
Comment 19 Lionel Elie Mamane 2012-06-07 20:43:42 UTC
In the discussion in bug 50575, I discovered that SQLite3 has a rather weak notion of datatype, and uses dynamic typing instead of static typing which underlies the SQL data model. Also, it does not enforce that a column of a certain datatype actually contain values of that datatype.

So e.g. an integer column can actually contain any string.

In the particular example of date or timestamp columns, it does not consider "2011-04-05" and "2011-4-5" as the same date, because timestamp or date is actually a *synonym* (alias) for varchar (string).

So dacm was right, *baaad* idea.
Comment 20 Michael Meeks 2012-06-08 03:13:46 UTC
> In the discussion in bug 50575, I discovered that SQLite3 has a rather weak
> notion of datatype, and uses dynamic typing instead of static typing which
> underlies the SQL data model.

Interesting of course :-) - I guess we're not talking though about importing arbitrary sqlite databases, but having a well confined database inside our own file formats, that we can apply our own constraints to (right?).

In such a situation surely it is possible to layer constraints / strong typing on top of the database in our code ? [ wouldn't we do that by accident / default anyway ]. But I guess it's more work if we have to have extra type annotation information around the place.
Comment 21 Lionel Elie Mamane 2012-06-12 07:25:17 UTC
(In reply to comment #20)

>> In the discussion in bug 50575, I discovered that SQLite3 has a rather weak
>> notion of datatype, and uses dynamic typing instead of static typing which
>> underlies the SQL data model.

> Interesting of course :-) - I guess we're not talking though about importing
> arbitrary sqlite databases, but having a well confined database inside our own
> file formats, that we can apply our own constraints to (right?).

Good luck explaining that to the users; that's however the least of the problems.

> In such a situation surely it is possible to layer constraints / strong typing
> on top of the database in our code ? [ wouldn't we do that by accident /
> default anyway ]. But I guess it's more work if we have to have extra type
> annotation information around the place.

We could do that, but that doesn't give us the right operator semantics on these "subtypes".

In SQL (and in most programming languages), the operators have a different meaning depending on the datatype of the value, and sqlite3 has too few types; only NULL, integer, real (floating-point) text or BLOB. Missing is e.g. any date/time related stuff.

E.g. is "40 > 5" true or false? If "40" and "5" are strings, then it is false; if they are integers, then it is true. sqlite3 properly supports the distinction between integers and strings, so these examples are handled correctly.

But sqlite has no notion of date or time values per se. Only as strings or integers or floats. Hence the problems as in bug 50575, where the query "SELECT * FROM table WHERE column='constant_value'" does not find the row because that equality is interpreted as string equality (where 0 <> 0.0 ) instead of timestamp equality (where 0 = 0.0). Also consider "WHERE column < '18:12'". As a timestamp, '18:12' is '2000-01-01 18:12:00':

sqlite> SELECT datetime('18:12');
2000-01-01 18:12:00

But if a row in the column contains '1900-05-09', that test will evaluate to false and the row not returned, while 'it should' evaluate to true.

We *could* theoretically, when using sqlite, basically wrap each time, date, etc value into a "julianday()" call to force date/time handling, as when the user asks for this query:

SELECT * FROM table WHERE column < '18:12'

we see that column is date/time/timestamp and instead issue to SQLite3:

SELECT * from table WHERE julianday(column)  < julianday('18:12')

1) That's a much deeper analysis of the query than we do now, and heavy special-casing. Beh.
2) If you think it is feasible, what about this query:

    SELECT col2, MAX(column) as maxcolumn FROM table GROUP BY col2 WHERE strftime("%w", min(column, '18:12')) = '1'

    This is supposed to select, for each distinct value of col2, the maximal value of column that is a Monday among those that are earlier than '2000-01-01 18:12:00'.

  You *can* get this correctly using sqlite, with something like:


    SELECT col2, datetime(MAX(julianday(column))) as maxcolumn FROM table GROUP BY col2 WHERE strftime("%w", min(julianday(column), julianday('18:12'))) = '1'

   Note the conversion back to datetime string after we are done computing.

  Still think one can do that? Now consider:

    SELECT col2, MAX(column) as maxcolumn FROM table GROUP BY col2 WHERE strftime("%Y-%m-%d", min(column, '18:12')) = '00:00'

   Now, to get this correct:

    SELECT col2, datetime(MAX(julianday(column))) as maxcolumn FROM table GROUP BY col2 WHERE strftime("%Y-%m-%d", min(julianday(column), julianday('18:12'))) = date('00:00')

   or

    SELECT col2, datetime(MAX(julianday(column))) as maxcolumn FROM table GROUP BY col2 WHERE julianday(strftime("%Y-%m-%d", min(julianday(column), julianday('18:12')))) = julianday('00:00')


   Note that depending on the format given by strftime, we need to decide whether this is a time/date or not. Also consider


    SELECT col2, datetime(MAX(julianday(column))) as maxcolumn FROM table GROUP BY col2 WHERE strftime('just a constant', min(julianday(column), julianday('18:12'))) = 'just a constant'
Comment 22 Lionel Elie Mamane 2012-06-14 08:37:31 UTC
So, starting to look at alternatives to sqlite3, for now our best shot seems to be Firebird.

Firebird: MPL-style licensing, first cursory glance OK featureset

The non-solutions:
MySQL/MariaDB/... : can be embedded, but GPL
CSQL: GPL
MonetDB: introduced embedded mode in 2004, but dropped it as of 2011 (source says it was not a true embedded mode anyway, only a single client server)
Comment 23 Julien Nabet 2012-06-14 15:26:57 UTC
(In reply to comment #22)
...
> The non-solutions:
> MySQL/MariaDB/... : can be embedded, but GPL
>...
About MariaDB: I found this link http://kb.askmonty.org/en/licensing-faq/ and above all this link : http://monty-says.blogspot.fr/2010/12/in-search-of-bsdlgplapache-licensed.html
I don't know what to think about it (could we be in the "FLOSS exception" ?) since I'm quite lost by reading all that.
But perhaps you had already read all this.
Comment 24 Lionel Elie Mamane 2012-06-17 03:30:12 UTC
(In reply to comment #23)
> (In reply to comment #22)

>> The non-solutions:
>> MySQL/MariaDB/... : can be embedded, but GPL

> About MariaDB: I found this link http://kb.askmonty.org/en/licensing-faq/ and
> above all this link :
> http://monty-says.blogspot.fr/2010/12/in-search-of-bsdlgplapache-licensed.html
> I don't know what to think about it (could we be in the "FLOSS exception" ?)

The FLOSS exception is for the client library; in the current context, the discussion was about embedding the server.
Comment 25 Michael Meeks 2012-07-06 03:18:34 UTC
Lionel - should we open a new & cleaner bug with title:

"create FireBird SQL connector"

or something ? :-) and make that an easy hack ? this one is rather a long & daunting read for an easy-hacker.

I guess we could close this bug too if sqlite is not the right solution :-)
Comment 26 Lionel Elie Mamane 2012-07-06 04:11:02 UTC
Closing since we decided that SQLite is not the right/best solution.
Comment 27 Robinson Tryon (qubit) 2015-12-15 23:25:08 UTC
Migrating Whiteboard tags to Keywords: (EasyHack DifficultyInteresting SkillCpp SkillSQL)
[NinjaEdit]