When migrating an existing database document from HSQLDB to Firebird, language-specific characters in field names are not transferred correctly. For example, the German "ß" is changed to "\u00df", its Unicode escape, in the _name_ of the field. This is not a problem of Firebird, and not even of Firebird within LibreOffice: you can create fields whose names use language-specific characters, and they are saved correctly and reproduced when the document is reopened; I've tested it. So this is a deficiency in the migration program. I will attach test documents: a database document with one table whose field names (except the first, which is simply an id) each consist of one commonly used Latin character and one language-specific character: "ß", the three German umlauts, and three French (and partly Italian ...) characters, "a" with each of the three accents. I will also attach the result after the migration to Firebird. These are the language-specific characters easily available on my keyboard. For all of them, the Unicode escape is used for the name in Firebird, which is certainly a problem, if not a catastrophe, for every program using that table, because no reference to these fields works any more. This is a grave problem, because non-European languages have many such non-Latin characters, and they must be migrated correctly.
Created attachment 141495: test doc with HSQLDB
Created attachment 141496: test doc after migration to Firebird
I can confirm this buggy behavior: field names of a table aren't migrated correctly if special characters have been used. Tested with Version: 6.1.0.0.alpha0+ Build ID: cc10b063235dcb25ad16f697ea0b1ff91a10bacb CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: kde4; TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2018-04-18_13:21:28
Once more, just to note: this works as expected with the existing code used for a simple drag-and-drop of the table from an HSQLDB .odb to a Firebird .odb. Is there some reason this is all being reimplemented, apparently from scratch?
Tamas Bunth committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=ded4dcbbce875efeffba7e894a6dea1f584e8e9b tdf#117115 dbahsql: respect unicode in columns It will be available in 6.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
(In reply to Commit Notification from comment #5)
> Tamas Bunth committed a patch related to this issue.
> It has been pushed to "master":
>
> http://cgit.freedesktop.org/libreoffice/core/commit/?id=ded4dcbbce875efeffba7e894a6dea1f584e8e9b
>
> tdf#117115 dbahsql: respect unicode in columns

For one, that only fixes the names of columns, not the names of tables. For another, a name "f\u2345bar" is erroneously converted to "f⍅bar".
...and a name using non-BMP characters like "💩" (U+1F4A9 PILE OF POO) is converted to something like "?". The "in-transit" representation appears to use an encoding of individual UTF-16 code units, "\ud83d\udca9", and the newly added lcl_ConvertToUTF8 tries to convert each of them back to UTF-8 individually with

  const OString sNewChar = OString(&cDec, 1, RTL_TEXTENCODING_UTF8);

which doesn't work.
(In reply to Stephan Bergmann from comment #6)
> For another, a name "f\u2345bar" is erroneously converted to "f⍅bar".

I've found this: http://graphemica.com/%E2%8D%85

According to this web page, \u2345 is "leftwards vane", so it seems to me the conversion is right. What would be the expected result?
(In reply to Tamas Bunth from comment #8)
> (In reply to Stephan Bergmann from comment #6)
> > For another, a name "f\u2345bar" is erroneously converted to "f⍅bar".
>
> I've found this:
> http://graphemica.com/%E2%8D%85
>
> According to this web page, \u2345 is "leftwards vane", so it seems to me
> the conversion is right. What would be the expected result?

An entity that was named "f\u2345bar" in the original database should, I'd assume, keep that name in the converted database too. (It is apparently encoded as something like "f\u005Cu2345bar" by the time it reaches lcl_ConvertToUTF8, and is then erroneously converted to "f⍅bar" there.)
ded4dcbbce875efeffba7e894a6dea1f584e8e9b is in master (6-2)
Tamas Bunth committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=647a9fec404ebce898a44de63fcf1b1d6f5036e6 tdf#117115 dbahsql: respect escaped '\' It will be available in 6.2.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Checked on Ubuntu 18.04 with build: Version: 6.2.0.0.alpha0+ Build ID: aae64e0f9cd1582c0dc31992aa22b849d2527c80 CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk2; TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2018-06-23_02:31:34 Locale: en-US (en_US.UTF-8); Calc: group threaded. Works as expected.
Verified on Windows 10: Version: 6.2.0.0.alpha0+ (x64) Build ID: d8733e2c59f120acf9feddff04964becc3358621 CPU threads: 4; OS: Windows 10.0; UI render: GL; TinderBox: Win-x86_64@62-TDF, Branch:master, Time: 2018-06-26_11:09:03 Locale: de-DE (de_DE); Calc: CL
(In reply to Stephan Bergmann from comment #6)
> (In reply to Commit Notification from comment #5)
> > Tamas Bunth committed a patch related to this issue.
> > It has been pushed to "master":
> >
> > http://cgit.freedesktop.org/libreoffice/core/commit/?id=ded4dcbbce875efeffba7e894a6dea1f584e8e9b
> >
> > tdf#117115 dbahsql: respect unicode in columns
>
> For one, that only fixes names of columns, not names of tables.

I assume the issue with table names has not yet been addressed (cf. bug 121469)?
(In reply to Stephan Bergmann from comment #7)
> ...and a name using non-BMP chars like "💩" (U+1F4A9 PILE OF POO) is
> converted to something like "?" (the "in-transit" representation appears to
> be using an encoding of individual UTF-16 code units, "\ud83d\udca9", and
> the newly added lcl_ConvertToUTF8 tries to convert them back to UTF-8
> individually with
>
>   const OString sNewChar = OString(&cDec, 1, RTL_TEXTENCODING_UTF8);
>
> which doesn't work).

Addressed now with <https://gerrit.libreoffice.org/67245> "Fix conversion of non-BMP chars".

(In reply to Stephan Bergmann from comment #14)
> I assume the issue with table names has not yet been addressed (cf. bug
> 121469)?

Apparently addressed with bug 121469 comment 11.