Bug 131238 - SQLite/ODBC: LibreOffice fails Unicode compliance with multi-byte characters, such as emoji, for example 😂
Summary: SQLite/ODBC: LibreOffice fails Unicode compliance with multi-byte characters,...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
7.0.3.1 release
Hardware: x86-64 (AMD64) Windows (All)
: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:25.2.0
Keywords:
Depends on:
Blocks: Database-Connectivity
  Show dependency treegraph
 
Reported: 2020-03-09 13:58 UTC by Max Fritzler
Modified: 2024-07-27 12:11 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
detailed steps to reproduce (1.20 MB, application/pdf)
2020-03-09 14:11 UTC, Max Fritzler
Details
sample SQLITE Db containing a few records for the test code (32.00 KB, application/octet-stream)
2020-03-09 14:13 UTC, Max Fritzler
Details
Sample BASE db with code to demonstrate the bug. (55.90 KB, application/vnd.sun.xml.base)
2020-03-09 14:14 UTC, Max Fritzler
Details
SQL trace log from function that produces the error (327.07 KB, text/plain)
2020-03-09 14:15 UTC, Max Fritzler
Details
SQL trace log from direct update, which works correctly (511.51 KB, text/plain)
2020-03-09 14:16 UTC, Max Fritzler
Details
The correct TestEmoji for use with testing. Please discard the other one. (16.29 KB, application/vnd.sun.xml.base)
2020-03-09 22:40 UTC, Max Fritzler
Details
TestDB for embedded Firebird (7.39 KB, application/vnd.sun.xml.base)
2020-03-10 15:05 UTC, Max Fritzler
Details
test database in HSQL format (7.75 KB, application/vnd.sun.xml.base)
2020-03-10 15:07 UTC, Max Fritzler
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Max Fritzler 2020-03-09 13:58:45 UTC
Description:
Note well.  Direct entry of an emoji into a table in LibreOffice via the Winkey period dialog, or by simply pasting it in, works. 
But, trying to enter the data via the BASIC SQL Statement command fails.  Example code below.
Similarly, reading a record from a SQLite table that contains an emoji into a recordset, and then writing from the recordset to another table fails.
Using an SQL substring statement to copy a portion of the record from one table to another works.  
I therefore there is something in the LibreOffice functions to write to a database that is not Unicode compliant.


Steps to Reproduce:
1.See attached document with sample code and screen shots and results and tracelogs
2.
3.

Actual Results:
It maps the emoji to either question marks, or to multiple characters with incorrect shapes.  It appears to be interpreting a multi-byte emoji character as two single-byte characters

Expected Results:
Emoji entered into the database


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Version: 6.3.4.2 (x64)
Build ID: 60da17e045e08f1793c57c00ba83cdfce946d0aa
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
Comment 1 Max Fritzler 2020-03-09 14:11:43 UTC
Created attachment 158510 [details]
detailed steps to reproduce

Several pages of screenshots showing how to reproduce the problem, includes sample test code listing.
Comment 2 Max Fritzler 2020-03-09 14:13:16 UTC
Created attachment 158511 [details]
sample SQLITE Db containing a few records for the test code
Comment 3 Max Fritzler 2020-03-09 14:14:54 UTC
Created attachment 158512 [details]
Sample BASE db with code to demonstrate the bug.
Comment 4 Max Fritzler 2020-03-09 14:15:44 UTC
Created attachment 158513 [details]
SQL trace log from function that produces the error
Comment 5 Max Fritzler 2020-03-09 14:16:22 UTC
Created attachment 158514 [details]
SQL trace log from direct update, which works correctly
Comment 6 Max Fritzler 2020-03-09 22:40:26 UTC
Created attachment 158528 [details]
The correct TestEmoji for use with testing.  Please discard the other one.
Comment 7 Max Fritzler 2020-03-09 22:42:45 UTC
Comment on attachment 158512 [details]
Sample BASE db with code to demonstrate the bug.

Sorry, uploaded the wrong version.  I've uploaded a correct version in another attahment with the irrelevent junk stripped out of this one.
Comment 8 Robert Großkopf 2020-03-10 06:40:18 UTC
I have tested only this:
Pasted the emoji to an internal HSQLDB.
No problem occured.
Could also refresh the table - no problem.

There are so many things reported here, which could be a part of the bug.

Could you please test your BASIC macro with the internal HSQLDB?
Comment 9 Max Fritzler 2020-03-10 15:05:39 UTC
Created attachment 158559 [details]
TestDB for embedded Firebird

I was requested to check the test macro in HSQL format.  While i was at it, I tested it in embedded Firebird format.  This is the test db in embedded Firebird format.  The macro works as expected there, which shows the problem is apparently isolated to cases using the SQLite back end.
Comment 10 Max Fritzler 2020-03-10 15:07:04 UTC
Created attachment 158560 [details]
test database in HSQL format

I was requested to check the test macro in HSQL format.  This is the test db in embedded HSQL format.  While i was at it, I tested it in embedded Firebird format, and uploaed that in a separate attachment.  The macro works as expected with embedded HSQL or embedded Firebird, which shows the problem is apparently isolated to cases using the SQLite back end.
Comment 11 Max Fritzler 2020-03-10 15:10:31 UTC
I was requested to test the test macro in HSQL format.  I uploaded two test databases, one in HSQL and one in Firebird.  The macro works as expected in both those formats, but fails for SQLite.  Evidently, the processing of an SQL string as shown below is done differently based on the attached database.  Whis is not really a surprise.  Here's the statement that succeeds in embedded databases, but fails with SQLite
Dim oStatement As Object
oStatement = ThisComponent.CurrentController.ActiveConnection.createStatement()
      sql = "INSERT INTO TestEmoji (author, comment) VALUES ('test author', '❤❤SWIFTIE❤❤')"
result = oStatement.execute(sql)
Comment 12 Robert Großkopf 2020-03-10 16:02:37 UTC
While reading your description, pasting of data into a table works, but "When I did Data / Refresh in LibreOffice I saw the same thing: " (2 ?? instead of the emoji). So it seems it has nothing to do with the Basic-macro.

Then I tested the same with a MariaDB, connected with the direct driver, connected with ODBC and connected with JDBC to Base. All these drivers will give the same result. The data will be saved and could be seen well after refreshing the table.

I will have a look at SQLITE now.
Comment 13 Robert Großkopf 2020-03-10 16:28:43 UTC
Have tested the same now in SQLITE. 
Connected to a SQLITE-database trough ODBC.
Opened a table and added an emoji.
Saved the row.
Refreshed the row.
Emoji has been saved.

The same result I got when executing the Basic-Macro.

So I couldn't see any buggy behavior here.
Tested under OpenSUSE 15.1 64bit rpm Linux,
LO 6.4.2.1, also LO 6.3.5.2
SQLITE 3.28.0
sqliteodbc 0.9995
Comment 14 Max Fritzler 2020-03-10 18:23:29 UTC
What JDBC driver are you using to connect to BASE?  I'll give that a try.

I saw correct behavior using JDBC driver from  Taro L. Saito, availble here https://github.com/xerial/sqlite-jdbc.  However, this driver produced immediate cursor problems the moment I opened a table in the UI, a problem noticed by Ratslinger on ask libreoffice.org.  So I had to abandon the JDBC driver.
Comment 15 Robert Großkopf 2020-03-10 18:36:50 UTC
(In reply to Max Fritzler from comment #14)
> What JDBC driver are you using to connect to BASE?  I'll give that a try.
> 
> I saw correct behavior using JDBC driver from  Taro L. Saito, availble here
> https://github.com/xerial/sqlite-jdbc.  However, this driver produced
> immediate cursor problems the moment I opened a table in the UI, a problem
> noticed by Ratslinger on ask libreoffice.org.  So I had to abandon the JDBC
> driver.

I tested MariaDB with this connections. 

SQLITE-connection with JDBC doesn't work (SQLite only supports TYPE_FORWARD_ONLY Cursors).

Seems to me we are looking for a special Windows-bug here.

I will change the titel to set this to SQLite and ODBC
Comment 16 Max Fritzler 2020-03-12 16:46:56 UTC
I just confirmed this on a second Windows machine, with a fresh install of LibreOffice 6.4, the current released version.
Comment 17 Xisco Faulí 2020-11-18 15:22:08 UTC
Hello Max Fritzler,
Could you please try to reproduce it with the latest version of LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the latest version.
Comment 18 Max Fritzler 2020-12-06 14:41:45 UTC
As requested by Xisco Faulí, I upgraded to 7.0.3.1 and reran the test program.  The bug is still present.  The output of the test is still as shown in the above comments.

Please note.  Pretty much anyone can run this test.  Just download the attachments and run the TestEmoji macro.  The macro tells you which table to look in for the output, and the rows in the table tell you what to expect.
Comment 19 Mike Kaganski 2024-07-22 11:50:01 UTC
I took a look at the sample you provided. The 'comments' database reports UTF-8 encoding, seen using this SQL directly:

 PRAGMA encoding;

And if the connection is configured to use UTF-8 - as you indicated in attachment 158510 [details] - then LibreOffice will use that encoding in its calls. It definitely happens, e.g., in OStatement_Base::execute [1].

The question is: is this correct? I fail to find a definite statement on what encoding the SQLCHAR arguments are expected to use. I found a random answer on StackOverflow [2], which seems to indicate that yes, the encoding should be "system", not the one from the database. But still, that's not an authoritative answer. If it's true, then indeed we have a bug. If not, then we can avoid excessive complexity.

By the way: it's an impressive job that you did here!

[1] https://opengrok.libreoffice.org/xref/core/connectivity/source/drivers/odbc/OStatement.cxx?r=4d97239b#311
[2] https://stackoverflow.com/a/6587447/1397376
Comment 20 Commit Notification 2024-07-27 12:11:36 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5ed1415d1cf03a3d671ebd11582dfaa90f1168bd

tdf#68676, tdf#131238: implement and use Unicode ODBC functions on Windows

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.