Bug 105711 - FIREBIRD: Couldn't create a varchar-field with less than 20 characters in GUI
Summary: FIREBIRD: Couldn't create a varchar-field with less than 20 characters in GUI
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
5.4.0.0.alpha0+
Hardware: x86-64 (AMD64) All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Database-Firebird-Default
  Show dependency treegraph
 
Reported: 2017-02-02 18:09 UTC by Robert Großkopf
Modified: 2019-11-11 15:05 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Open the database, open the table for editing, add a varchar-field with 5-string-length and try. (3.95 KB, application/vnd.oasis.opendocument.database)
2017-02-02 18:09 UTC, Robert Großkopf
Details
New testfile, created with 5.4.0.0.beta1 (3.46 KB, application/vnd.oasis.opendocument.database)
2017-05-26 16:44 UTC, Robert Großkopf
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Großkopf 2017-02-02 18:09:26 UTC
Created attachment 130864 [details]
Open the database, open the table for editing, add a varchar-field with 5-string-length and try.

Open the attached database.
Open the table for changing the table.
Add a field and change the length to 5 characters.
Save the table.
Write something in the new field.
You could add text up to 20 characters without problem.
Reopen the table for editing.
Stringlength of the field has been changed to 20 characters.

Tested with
Version: 5.4.0.0.alpha0+
Build ID: 21f632d327a08fbe4c57b937745e5dee9006cfe2
CPU Threads: 4; OS Version: Linux 4.1; UI Render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-01-31_23:09:29
Locale: de-DE (de_DE.UTF-8); Calc: group

(Charset will be set correct in this version!)
Comment 1 Alex Thurgood 2017-02-06 08:48:29 UTC
Confirming with 

Version: 5.4.0.0.alpha0+
Build ID: 6c4310cc176c02624f5fa94fb50fec6050ce904b
CPU Threads: 2; OS Version: Mac OS X 10.12.2; UI Render: default; 
Locale: fr-FR (fr_FR.UTF-8); Calc: group
Comment 2 Alex Thurgood 2017-02-06 08:50:04 UTC
You can actually add way more than 20 characters in table data entry mode, but the string gets truncated to 20 characters when you save the record.
Comment 3 Alex Thurgood 2017-05-26 06:47:18 UTC
*** Bug 108082 has been marked as a duplicate of this bug. ***
Comment 4 Robert Großkopf 2017-05-26 16:28:22 UTC
I couldn't confirm this any more with
Version: 5.4.0.0.beta1
Build ID: 8672113ead4e403c55e31b1d9a3d1e0f3b299577
CPU threads: 4; OS: Linux 4.1; UI render: default; VCL: kde4; 
Locale: de-DE (de_DE.UTF-8); Calc: group

Could be the duplicate bug 108082 is a special bug of the used version or of Windows with this version?
Comment 5 Robert Großkopf 2017-05-26 16:36:01 UTC
.. but with the attached database it shows the length of 5 string and there could be added 20 characters to the field.

With another firebird-3-database I couldn't reproduce this. Length there is 5 and max input is also 5.
Comment 6 Robert Großkopf 2017-05-26 16:44:11 UTC
Created attachment 133625 [details]
New testfile, created with 5.4.0.0.beta1

Couldn't confirm the bug with the new testfile, but could input 20 characters in old attachement for 5 characters.
Comment 7 Tamas Bunth 2017-05-27 11:53:44 UTC
The driver uses UTF-8 encoding for text-based columns (char, varchar, clob).
see: https://bugs.documentfoundation.org/show_bug.cgi?id=105142

A character now occupies up to 4 bytes, so when the 'length' option in the GUI is set to 5 it will allocate 20 bytes.

You can put more than 5 characters, because most of the characters don't need all the 4 bytes. Eg. you can put 20 characters if you use only the first 128 characters of ASCII.

SO what is the correct behavior of the 'length' GUI option set to 5?

- The user can put 20 bytes to the field. (it is the current version)

- The user can put only 5 characters even if there is more space allocated.

I think the correct one would be the second: the user can put only 5 characters (because the first version is really annoying when using the fixed char column, it is more straightforward and the functionality is the same as with HSQLDB).
Comment 8 Robert Großkopf 2017-05-27 17:44:56 UTC
(In reply to Tamas Bunth from comment #7)
> 
> I think the correct one would be the second: the user can put only 5
> characters (because the first version is really annoying when using the
> fixed char column, it is more straightforward and the functionality is the
> same as with HSQLDB).

You are right. And this will work right in the second attachment, created with 5.4.0.0beta1. Download an try to put
'123456'
into the field "Text_5_String".
An error appears. Won't save this input, Will only save '12345'.

But this won't work with the first attachment, created with 5.4.0.0alpha0. Download and try to put
'12345678901234567890'
into the field "ShortName".
You could save this data, but the field is set to a length of 5 characters, as it is set in the second attachment for "Text_5_String".
You could create new tables in this database with the same behavior: 5 characters, but 20 could be saved.

With new created databases it seems this bug won't appear - don't know why ...
Comment 9 Lionel Elie Mamane 2017-05-28 15:58:59 UTC
(In reply to Tamas Bunth from comment #7)
> The driver uses UTF-8 encoding for text-based columns (char, varchar, clob).

> A character now occupies up to 4 bytes, so when the 'length' option in the
> GUI is set to 5 it will allocate 20 bytes.

> You can put more than 5 characters, because most of the characters don't
> need all the 4 bytes.

> SO what is the correct behavior of the 'length' GUI option set to 5?

The expected behaviour is 5 characters ("codepoint" in Unicode terminology), independently of how many bytes that is.

I understand Firebird deviates from the SQL standard in that respect, and takes a length in bytes. We could try to implement the "characters" thing within the Firebird SDBC driver (as your patch on gerrit tries to do, but I wouldn't bother. If we try to do that, there will always be cases that we won't catch. We will correctly implement that when the field is updated through the SDBC API, but then the user will do:

SELECT LENGTH(field) as len FROM table;
or
UPDATE TABLE SET field='+' || field || '+';

and get unexpected results (results inconsistent with the GUI).


I'd just let it be, and let the Firebird "misfeature" bubble up. A length of 5 in the GUI should be a VARCHAR(5) in Firebird. If firebird understands that as 5 bytes, then so be it.


Maybe a middle ground would be to use UTF16 instead of UTF8 and let the length be UTF16 code units (that is length 5 is 10 bytes), if that's possible with Firebird. This is still a bug, but confusing character and UTF16 code units is so common that it usually not perceived as a bug. In my opinion, if this gives as a result that LENGTH(field) is in bytes and not in UTF16 code units, it is inconsistent and not desirable.
Comment 10 Tamas Bunth 2017-05-28 18:09:44 UTC
(In reply to Lionel Elie Mamane from comment #9)
> (In reply to Tamas Bunth from comment #7)
> > The driver uses UTF-8 encoding for text-based columns (char, varchar, clob).
> 
> > A character now occupies up to 4 bytes, so when the 'length' option in the
> > GUI is set to 5 it will allocate 20 bytes.
> 
> > You can put more than 5 characters, because most of the characters don't
> > need all the 4 bytes.
> 
> > SO what is the correct behavior of the 'length' GUI option set to 5?
> 
> The expected behaviour is 5 characters ("codepoint" in Unicode terminology),
> independently of how many bytes that is.
> 
> I understand Firebird deviates from the SQL standard in that respect, and
> takes a length in bytes. We could try to implement the "characters" thing
> within the Firebird SDBC driver (as your patch on gerrit tries to do, but I
> wouldn't bother. If we try to do that, there will always be cases that we
> won't catch. We will correctly implement that when the field is updated
> through the SDBC API, but then the user will do:
> 
> SELECT LENGTH(field) as len FROM table;
> or
> UPDATE TABLE SET field='+' || field || '+';
> 
> and get unexpected results (results inconsistent with the GUI).

Hmm.. Right.

> I'd just let it be, and let the Firebird "misfeature" bubble up. A length of
> 5 in the GUI should be a VARCHAR(5) in Firebird. If firebird understands
> that as 5 bytes, then so be it.

A length of 5 in the GUI is a VARCHAR(5) in Firebird, which allocates 20 bytes (as long as the character set is UTF-8).

The problem is rather the following:
Firebird truncates the input string at 20 bytes no matter how many characters it contains.

I think it's a bug in Firebird (but I couldn't find it in the bug tracker).

> Maybe a middle ground would be to use UTF16 instead of UTF8 and let the
> length be UTF16 code units (that is length 5 is 10 bytes), if that's
> possible with Firebird. This is still a bug, but confusing character and
> UTF16 code units is so common that it usually not perceived as a bug. In my
> opinion, if this gives as a result that LENGTH(field) is in bytes and not in
> UTF16 code units, it is inconsistent and not desirable.

There are two kind of length functions (actually there are three of them):
char_length and octet_length[1]

and they are working as expected.

[1] https://firebirdsql.org/refdocs/langrefupd20-char-length.html
Comment 11 Lionel Elie Mamane 2017-05-28 20:25:41 UTC
(In reply to Tamas Bunth from comment #10)
> (In reply to Lionel Elie Mamane from comment #9)

>> I'd just let it be, and let the Firebird "misfeature" bubble up. A length of
>> 5 in the GUI should be a VARCHAR(5) in Firebird. If firebird understands
>> that as 5 bytes, then so be it.

> A length of 5 in the GUI is a VARCHAR(5) in Firebird, which allocates 20
> bytes (as long as the character set is UTF-8).

> The problem is rather the following:
> Firebird truncates the input string at 20 bytes no matter how many
> characters it contains.

> I think it's a bug in Firebird (but I couldn't find it in the bug tracker).

> There are two kind of length functions (actually there are three of them):
> char_length and octet_length[1]

> and they are working as expected.

So Firebird already has a working notion & implementation of "character". So definitely, yes, I'd say that the right place to fix that is Firebird. Just file it as a bug.
Comment 12 QA Administrators 2018-11-03 03:49:58 UTC Comment hidden (obsolete)
Comment 13 Robert Großkopf 2018-11-05 20:00:09 UTC
Have tested it again with LO 6.1.3.2, 64bit rpm Linux. I could put 20 characters in a field if I use  the first 128 characters of ASCII ('12345678901234567890'). With special German characters it is set to 10 characters. Have tested this with two databases: a migrated database and an external database, linked to a Firebird file. 

Have downloaded the second example. There must be something different. I could add only 5 characters. If I add more than 5 charcters the message "Value too long" appears. With this example it works right, with databases created in LO 6.1.3.2 it works wrong: more than 5 characters, no message when writing down more than 20 characters - only the length will be cut.
Comment 14 QA Administrators 2019-11-08 03:38:36 UTC Comment hidden (obsolete)
Comment 15 Robert Großkopf 2019-11-08 11:55:56 UTC
Have tested again. Could create a varchar-field for only one character with LO 6.3.3.2 on OpenSUSE 15 64bit rpm Linux.

I will set this one to WORKSFORME.
Comment 16 avsharapov 2019-11-08 12:35:26 UTC
Have tested. Windows 7 with LO 6.3.3.2.
Create new table with two fields.
test1 VARCHAR length 1
test2 VARCHAR length 5
In grid edit I can enter 4 characters into test1 field (for example 0123 or two Russian symbols or four English symbols)
and 20 characters into test2 field
Comment 17 Lionel Elie Mamane 2019-11-08 12:57:42 UTC
IIRC, this was done to handle the "worst case scenario" of a Unicode character needing 4 bytes in storage, and since in Firebird the length of a varchar field is bytes and not characters... to be sure to put 1, resp. 5, characters we must allocate a VARCHAR(4), respectively a VARCHAR(20).
Comment 18 Robert Großkopf 2019-11-08 14:53:47 UTC
(In reply to Lionel Elie Mamane from comment #17)
> IIRC, this was done to handle the "worst case scenario" of a Unicode
> character needing 4 bytes in storage, and since in Firebird the length of a
> varchar field is bytes and not characters... to be sure to put 1, resp. 5,
> characters we must allocate a VARCHAR(4), respectively a VARCHAR(20).

It's funny. I tested this with the second attachement. The database had been created with LO 5.4. You couldn't add more than '12345' to a 5-characters-field. A message appears, that there couldn't be added more than 5 characters. Try it.

After reading comment 16 I tested it with a database I created new. And soon the described behaviour appears. It isn't the same bug which appears first, but the behaviour of the database created with LO 5.4 is better than the behaviour of the database created now.

Has the internal database been changed since LO 5.4.0.0.beta1?
Comment 19 Alex Thurgood 2019-11-11 11:37:32 UTC
(In reply to Robert Großkopf from comment #18)

> Has the internal database been changed since LO 5.4.0.0.beta1?

The switch from FB2.5 to FB3 ?
Comment 20 Robert Großkopf 2019-11-11 15:05:11 UTC
(In reply to Alex Thurgood from comment #19)
> (In reply to Robert Großkopf from comment #18)
> 
> > Has the internal database been changed since LO 5.4.0.0.beta1?
> 
> The switch from FB2.5 to FB3 ?

No, this has been changed with the first version of LO 5.3. Could be there are updates of FB3 from LO 5.4 up to now.