Bug 123150 - FIREBIRD: Query with 'LIKE' fails with error "malformed string", when index is set for the field of the condition of the query
Summary: FIREBIRD: Query with 'LIKE' fails with error "malformed string", when index i...
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
6.1.4.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.4.0 target:6.3.0.2 target:6.2.6
Keywords:
Depends on:
Blocks: Database-Firebird-Default
  Show dependency treegraph
 
Reported: 2019-02-03 21:05 UTC by Gerhard Schaber
Modified: 2019-07-17 10:28 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Test file (280.57 KB, application/vnd.sun.xml.base)
2019-02-03 21:05 UTC, Gerhard Schaber
Details
Minimal test file (2.95 KB, application/vnd.sun.xml.base)
2019-03-28 08:05 UTC, Gerhard Schaber
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gerhard Schaber 2019-02-03 21:05:07 UTC
Created attachment 148877 [details]
Test file

A simple SQL select fails depending on the search string. I am using the following query with the attached testquery.odb file:
select * from "MitgliederVerband" where "Vorname" LIKE 'Ötkül%'

This is the error I get:
1: firebird_sdbc error:
*Malformed string
caused by
'SELECT * FROM "MitgliederVerband" WHERE "Vorname" LIKE 'Ötkül%''


The same happens when I run the query via Basic macro.

Note that I have applied collation settings to the DB, so that I can run case insensitive queries. That query has been working for quite some time. The name Ötkül is the first name I came across that it does not seem to handle very well. I can store that name in the DB, but I cannot search for it, not even if the DB is empty.
Comment 1 Gerhard Schaber 2019-02-03 21:16:27 UTC
The SDB table was originally created this way:

CREATE TABLE "MitgliederVerband"("ID" INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH 0) NOT NULL PRIMARY KEY,"Vorname" VARCHAR(250) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,"Nachname" VARCHAR(250) CHARACTER SET UTF8 NOT NULL COLLATE UNICODE_CI_AI,"Gebdat" DATE,"Strasse" VARCHAR(250) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"Plz" INTEGER,"Ort" VARCHAR(250) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"Telefon" VARCHAR(50) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"Mail" VARCHAR(150) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"Tarifgruppe" INTEGER,"Eintritt" DATE,"Austritt" DATE,"Verein" INTEGER,"MitgliedsID" VARCHAR(10) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"LandID" INTEGER,"Grad" INTEGER,"Geschlecht" SMALLINT,"Titel" VARCHAR(50) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"Nation" INTEGER,"Geburtsort" VARCHAR(50) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"Beruf" VARCHAR(50) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"PrfgDatum" DATE,"DANReg" VARCHAR(100) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,"PrfgDaten" VARCHAR(250) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI)

When I create a new DB file, and then run the statement above, the SQL query works. I do not want to move everything to a new DB. The original is very complex and it is time consuming to move the DB and macros, etc. At least I would like to know what is wrong with my current one, so that I do not run into the same issue again.
Comment 2 Gerhard Schaber 2019-02-04 06:44:15 UTC
Tested with 6.2.1.0 pre-release and 6.1.4.
Comment 3 Gerhard Schaber 2019-02-04 06:50:30 UTC
It must be something related to the Ö at the beginning of the search string. It also fails with search string 'Ö%':
SELECT * FROM "MitgliederVerband" WHERE "Vorname" LIKE 'Ö%'
Comment 4 Gerhard Schaber 2019-02-04 08:57:33 UTC
It only seems to happen, if there is an index on that column:
CREATE INDEX "mverbvorname" ON "MitgliederVerband"("Vorname")

In that case I can reproduce it with a blank new DB. Just create the table with the statement above, then the index, then run the query.
Comment 5 Gerhard Schaber 2019-02-04 11:10:37 UTC
SELECT * FROM "MitgliederVerband" WHERE "Vorname" LIKE 'Ö%' COLLATE UNICODE_CI_AI
does not work, either, because the default character set seems to be NONE.
Comment 6 Gerhard Schaber 2019-02-04 11:31:23 UTC
Well, a workaround would be to use UPPER (only works due to the case insensitive collation):
SELECT * FROM "MitgliederVerband" WHERE "Vorname" LIKE UPPER('Ö%')
Comment 7 Robert Großkopf 2019-02-04 15:11:55 UTC
Tested with LO 6.1.4.2. on OpenSUSE 15, 64bit rpm Linux. Could confirm the buggy behaviour.
Note: 
1. The database is an internal Firebird. So I changed the title.
2. Seems to be a GUI-Bug. When sending the query through direct SQL it will work.
Comment 8 Gerhard Schaber 2019-02-04 16:00:33 UTC
This happens when manually executing the SQL query, as well as running it as Basic macro.
Comment 9 Robert Großkopf 2019-02-04 18:45:59 UTC
(In reply to Gerhard Schaber from comment #8)
> This happens when manually executing the SQL query, as well as running it as
> Basic macro.

You are right. I have switches from GUI to SQL and back and tested the query
SELECT * FROM "MitgliederVerband" WHERE "Vorname" LIKE 'Ötkül*'
- * instead of % - and this gives different results.

Then I deleted the index for the field "Vorname" and the original query workes - as you described.

When creating a new index the bug appears again.
Comment 10 Alex Thurgood 2019-02-05 07:53:47 UTC
@Tamas : thought you might be interested in this one - it appears that our parser mechanism doesn't handle UTF strings in LIKE comparisons either when the column is indexed.
Comment 11 Gerhard Schaber 2019-02-05 11:51:29 UTC
BTW, I just realized that using UPPER is no workaround. It makes an O out of the Ö.
Comment 12 Gerhard Schaber 2019-02-05 11:53:11 UTC
Nah, sorry, that is because of the UNICODE_CI_AI.
Comment 13 Gerhard Schaber 2019-02-12 08:19:56 UTC
Is there at least a workaround for that (that does not require UPPER everywhere where LIKE is used)?
Comment 14 Gerhard Schaber 2019-03-28 08:05:14 UTC
Created attachment 150341 [details]
Minimal test file

Here a simpler example (likeumlaut.odb). That is a most basic table with no custom collation or anything. Just one record with an Ö. Then run the SQL command manually via Tools > SQL, with "Show output of select statements" enabled:
select * from "table1" where "LastName" LIKE 'Ö%'

Something that basic should not fail.
Comment 15 Lionel Elie Mamane 2019-03-28 08:45:46 UTC
This smells like a Firebird bug. Do you get the same situation when using Firebird directly, not through LibreOffice?
Comment 16 Gerhard Schaber 2019-03-28 11:20:53 UTC
You might be right. I tried with FB 3.0.4.

Fails with malformed string:
select * from "table1" where "LastName" LIKE 'Ö%';

Works:
select * from "table1" where "LastName" LIKE UPPER(_iso8859_1 'Ö%');
Comment 17 Gerhard Schaber 2019-03-28 11:25:49 UTC
This one seems to fix the issue when executing the SQL statement in LO:
select * from "table1" where "LastName" LIKE _utf8 'Ö%';

Apparently one has to specify the character set. That was definitely not necessary with HSQLDB.
Comment 18 Gerhard Schaber 2019-03-28 14:21:01 UTC
Also, it is not possible to filter for a name starting with 'Ö' in a table using LIKE.
Comment 19 Gerhard Schaber 2019-04-03 10:51:14 UTC
I guess, even in the case this is really an issue in Firebird and there is no fix on that side (I cannot assess this from user perspective), LO itself should have a fix to make it work for the end user.
Comment 20 Gerhard Schaber 2019-07-03 07:35:48 UTC
If that is not fixed in Firebird, can LibreOffice do the conversion of the literal for a LIKE clause to the configured character set?
Comment 21 Tamas Bunth 2019-07-13 15:18:36 UTC
We need to specify the character set of the connection between the server and the sdbc driver too.

This can be done using the isc_dpb_lc_ctype DPB parameter option as documented in the Interbase API reference[1].

I have a patch on gerrit for that:
https://gerrit.libreoffice.org/#/c/75557/

The issue is present only when an index is specified to that column. It seems without specifying the isc_dpb_lc_ctype parameter the use of non-ascii characters in a literal is unspecified behavior.

[1] https://firebirdsql.org/en/reference-manuals/
Comment 22 Commit Notification 2019-07-13 19:22:23 UTC
Tamas Bunth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/c19c206cf42ac178906a855ae3cd198e0fcf8d14%5E%21

tdf#123150: Firebird: set UTF-8 client connection

It will be available in 6.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 23 Tamas Bunth 2019-07-13 19:23:53 UTC
Gerhard, could you please verify, that the issue is fixed now?
Comment 24 Gerhard Schaber 2019-07-13 19:28:09 UTC
Great finding. Thanks. I will give it a try.
Comment 25 Commit Notification 2019-07-15 10:17:23 UTC
Tamas Bunth committed a patch related to this issue.
It has been pushed to "libreoffice-6-3":

https://git.libreoffice.org/core/+/4897c78c4b0ce88b65d509066990bdf5ae444d9a%5E%21

tdf#123150: Firebird: set UTF-8 client connection

It will be available in 6.3.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 26 Gerhard Schaber 2019-07-15 11:02:31 UTC
Thanks for putting this into 6.3. There has not been a master build with the changes, so I have not been able to test it, yet.
Comment 27 Xisco Faulí 2019-07-15 11:18:29 UTC
(In reply to Gerhard Schaber from comment #16)
> You might be right. I tried with FB 3.0.4.
> 
> Fails with malformed string:
> select * from "table1" where "LastName" LIKE 'Ö%';
> 
> Works:
> select * from "table1" where "LastName" LIKE UPPER(_iso8859_1 'Ö%');

I do confirm this issue is fixed in

Version: 6.4.0.0.alpha0+
Build ID: ed2107f0488887528d8e49135d3270705f389ed5
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

when selecting 'Show output"
Comment 28 Gerhard Schaber 2019-07-17 06:02:59 UTC
Works now. Tested on Windows with manually executed statements and filters, both with a new and an existing database.

Version: 6.4.0.0.alpha0+ (x86)
Build ID: c738be4de6886a0c96b7d10df7e78c8b2964c135
CPU threads: 8; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: de-AT (de_AT); UI-Language: en-US
Calc: threaded
Comment 29 Xisco Faulí 2019-07-17 07:50:06 UTC
Setting to VERIFIED based on comment 27 and comment 28
Comment 30 Commit Notification 2019-07-17 10:28:19 UTC
Tamas Bunth committed a patch related to this issue.
It has been pushed to "libreoffice-6-2":

https://git.libreoffice.org/core/+/3173e00f428ea8cf79f36efb38d15028aca01d4a%5E%21

tdf#123150: Firebird: set UTF-8 client connection

It will be available in 6.2.6.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.