Bug Hunting Session
Bug 105711 - FIREBIRD: Couldn't create a varchar-field with less than 20 characters in GUI
Summary: FIREBIRD: Couldn't create a varchar-field with less than 20 characters in GUI
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
5.4.0.0.alpha0+
Hardware: x86-64 (AMD64) All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
: 108082 (view as bug list)
Depends on:
Blocks: Database-Firebird-Default
  Show dependency treegraph
 
Reported: 2017-02-02 18:09 UTC by Robert Großkopf
Modified: 2018-11-07 17:11 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Open the database, open the table for editing, add a varchar-field with 5-string-length and try. (3.95 KB, application/vnd.oasis.opendocument.database)
2017-02-02 18:09 UTC, Robert Großkopf
Details
New testfile, created with 5.4.0.0.beta1 (3.46 KB, application/vnd.oasis.opendocument.database)
2017-05-26 16:44 UTC, Robert Großkopf
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Großkopf 2017-02-02 18:09:26 UTC
Created attachment 130864 [details]
Open the database, open the table for editing, add a varchar-field with 5-string-length and try.

Open the attached database.
Open the table for changing the table.
Add a field and change the length to 5 characters.
Save the table.
Write something in the new field.
You could add text up to 20 characters without problem.
Reopen the table for editing.
Stringlength of the field has been changed to 20 characters.

Tested with
Version: 5.4.0.0.alpha0+
Build ID: 21f632d327a08fbe4c57b937745e5dee9006cfe2
CPU Threads: 4; OS Version: Linux 4.1; UI Render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-01-31_23:09:29
Locale: de-DE (de_DE.UTF-8); Calc: group

(Charset will be set correct in this version!)
Comment 1 Alex Thurgood 2017-02-06 08:48:29 UTC
Confirming with 

Version: 5.4.0.0.alpha0+
Build ID: 6c4310cc176c02624f5fa94fb50fec6050ce904b
CPU Threads: 2; OS Version: Mac OS X 10.12.2; UI Render: default; 
Locale: fr-FR (fr_FR.UTF-8); Calc: group
Comment 2 Alex Thurgood 2017-02-06 08:50:04 UTC
You can actually add way more than 20 characters in table data entry mode, but the string gets truncated to 20 characters when you save the record.
Comment 3 Alex Thurgood 2017-05-26 06:47:18 UTC
*** Bug 108082 has been marked as a duplicate of this bug. ***
Comment 4 Robert Großkopf 2017-05-26 16:28:22 UTC
I couldn't confirm this any more with
Version: 5.4.0.0.beta1
Build ID: 8672113ead4e403c55e31b1d9a3d1e0f3b299577
CPU threads: 4; OS: Linux 4.1; UI render: default; VCL: kde4; 
Locale: de-DE (de_DE.UTF-8); Calc: group

Could be the duplicate bug 108082 is a special bug of the used version or of Windows with this version?
Comment 5 Robert Großkopf 2017-05-26 16:36:01 UTC
.. but with the attached database it shows the length of 5 string and there could be added 20 characters to the field.

With another firebird-3-database I couldn't reproduce this. Length there is 5 and max input is also 5.
Comment 6 Robert Großkopf 2017-05-26 16:44:11 UTC
Created attachment 133625 [details]
New testfile, created with 5.4.0.0.beta1

Couldn't confirm the bug with the new testfile, but could input 20 characters in old attachement for 5 characters.
Comment 7 Tamas Bunth 2017-05-27 11:53:44 UTC
The driver uses UTF-8 encoding for text-based columns (char, varchar, clob).
see: https://bugs.documentfoundation.org/show_bug.cgi?id=105142

A character now occupies up to 4 bytes, so when the 'length' option in the GUI is set to 5 it will allocate 20 bytes.

You can put more than 5 characters, because most of the characters don't need all the 4 bytes. Eg. you can put 20 characters if you use only the first 128 characters of ASCII.

SO what is the correct behavior of the 'length' GUI option set to 5?

- The user can put 20 bytes to the field. (it is the current version)

- The user can put only 5 characters even if there is more space allocated.

I think the correct one would be the second: the user can put only 5 characters (because the first version is really annoying when using the fixed char column, it is more straightforward and the functionality is the same as with HSQLDB).
Comment 8 Robert Großkopf 2017-05-27 17:44:56 UTC
(In reply to Tamas Bunth from comment #7)
> 
> I think the correct one would be the second: the user can put only 5
> characters (because the first version is really annoying when using the
> fixed char column, it is more straightforward and the functionality is the
> same as with HSQLDB).

You are right. And this will work right in the second attachment, created with 5.4.0.0beta1. Download an try to put
'123456'
into the field "Text_5_String".
An error appears. Won't save this input, Will only save '12345'.

But this won't work with the first attachment, created with 5.4.0.0alpha0. Download and try to put
'12345678901234567890'
into the field "ShortName".
You could save this data, but the field is set to a length of 5 characters, as it is set in the second attachment for "Text_5_String".
You could create new tables in this database with the same behavior: 5 characters, but 20 could be saved.

With new created databases it seems this bug won't appear - don't know why ...
Comment 9 Lionel Elie Mamane 2017-05-28 15:58:59 UTC
(In reply to Tamas Bunth from comment #7)
> The driver uses UTF-8 encoding for text-based columns (char, varchar, clob).

> A character now occupies up to 4 bytes, so when the 'length' option in the
> GUI is set to 5 it will allocate 20 bytes.

> You can put more than 5 characters, because most of the characters don't
> need all the 4 bytes.

> SO what is the correct behavior of the 'length' GUI option set to 5?

The expected behaviour is 5 characters ("codepoint" in Unicode terminology), independently of how many bytes that is.

I understand Firebird deviates from the SQL standard in that respect, and takes a length in bytes. We could try to implement the "characters" thing within the Firebird SDBC driver (as your patch on gerrit tries to do, but I wouldn't bother. If we try to do that, there will always be cases that we won't catch. We will correctly implement that when the field is updated through the SDBC API, but then the user will do:

SELECT LENGTH(field) as len FROM table;
or
UPDATE TABLE SET field='+' || field || '+';

and get unexpected results (results inconsistent with the GUI).


I'd just let it be, and let the Firebird "misfeature" bubble up. A length of 5 in the GUI should be a VARCHAR(5) in Firebird. If firebird understands that as 5 bytes, then so be it.


Maybe a middle ground would be to use UTF16 instead of UTF8 and let the length be UTF16 code units (that is length 5 is 10 bytes), if that's possible with Firebird. This is still a bug, but confusing character and UTF16 code units is so common that it usually not perceived as a bug. In my opinion, if this gives as a result that LENGTH(field) is in bytes and not in UTF16 code units, it is inconsistent and not desirable.
Comment 10 Tamas Bunth 2017-05-28 18:09:44 UTC
(In reply to Lionel Elie Mamane from comment #9)
> (In reply to Tamas Bunth from comment #7)
> > The driver uses UTF-8 encoding for text-based columns (char, varchar, clob).
> 
> > A character now occupies up to 4 bytes, so when the 'length' option in the
> > GUI is set to 5 it will allocate 20 bytes.
> 
> > You can put more than 5 characters, because most of the characters don't
> > need all the 4 bytes.
> 
> > SO what is the correct behavior of the 'length' GUI option set to 5?
> 
> The expected behaviour is 5 characters ("codepoint" in Unicode terminology),
> independently of how many bytes that is.
> 
> I understand Firebird deviates from the SQL standard in that respect, and
> takes a length in bytes. We could try to implement the "characters" thing
> within the Firebird SDBC driver (as your patch on gerrit tries to do, but I
> wouldn't bother. If we try to do that, there will always be cases that we
> won't catch. We will correctly implement that when the field is updated
> through the SDBC API, but then the user will do:
> 
> SELECT LENGTH(field) as len FROM table;
> or
> UPDATE TABLE SET field='+' || field || '+';
> 
> and get unexpected results (results inconsistent with the GUI).

Hmm.. Right.

> I'd just let it be, and let the Firebird "misfeature" bubble up. A length of
> 5 in the GUI should be a VARCHAR(5) in Firebird. If firebird understands
> that as 5 bytes, then so be it.

A length of 5 in the GUI is a VARCHAR(5) in Firebird, which allocates 20 bytes (as long as the character set is UTF-8).

The problem is rather the following:
Firebird truncates the input string at 20 bytes no matter how many characters it contains.

I think it's a bug in Firebird (but I couldn't find it in the bug tracker).

> Maybe a middle ground would be to use UTF16 instead of UTF8 and let the
> length be UTF16 code units (that is length 5 is 10 bytes), if that's
> possible with Firebird. This is still a bug, but confusing character and
> UTF16 code units is so common that it usually not perceived as a bug. In my
> opinion, if this gives as a result that LENGTH(field) is in bytes and not in
> UTF16 code units, it is inconsistent and not desirable.

There are two kind of length functions (actually there are three of them):
char_length and octet_length[1]

and they are working as expected.

[1] https://firebirdsql.org/refdocs/langrefupd20-char-length.html
Comment 11 Lionel Elie Mamane 2017-05-28 20:25:41 UTC
(In reply to Tamas Bunth from comment #10)
> (In reply to Lionel Elie Mamane from comment #9)

>> I'd just let it be, and let the Firebird "misfeature" bubble up. A length of
>> 5 in the GUI should be a VARCHAR(5) in Firebird. If firebird understands
>> that as 5 bytes, then so be it.

> A length of 5 in the GUI is a VARCHAR(5) in Firebird, which allocates 20
> bytes (as long as the character set is UTF-8).

> The problem is rather the following:
> Firebird truncates the input string at 20 bytes no matter how many
> characters it contains.

> I think it's a bug in Firebird (but I couldn't find it in the bug tracker).

> There are two kind of length functions (actually there are three of them):
> char_length and octet_length[1]

> and they are working as expected.

So Firebird already has a working notion & implementation of "character". So definitely, yes, I'd say that the right place to fix that is Firebird. Just file it as a bug.
Comment 12 QA Administrators 2018-11-03 03:49:58 UTC Comment hidden (obsolete)
Comment 13 Robert Großkopf 2018-11-05 20:00:09 UTC
Have tested it again with LO 6.1.3.2, 64bit rpm Linux. I could put 20 characters in a field if I use  the first 128 characters of ASCII ('12345678901234567890'). With special German characters it is set to 10 characters. Have tested this with two databases: a migrated database and an external database, linked to a Firebird file. 

Have downloaded the second example. There must be something different. I could add only 5 characters. If I add more than 5 charcters the message "Value too long" appears. With this example it works right, with databases created in LO 6.1.3.2 it works wrong: more than 5 characters, no message when writing down more than 20 characters - only the length will be cut.