Bug 123055 - CSV database encoded with UTF-8 BOM: First header line value keeps quotes
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 6.1.4.2 release
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Andreas Heinisch
URL:
Whiteboard: target:7.3.0
Keywords:
Depends on:
Blocks: CSV-Import
Reported: 2019-01-30 12:02 UTC by Dominik Hölzl
Modified: 2021-09-10 17:53 UTC

See Also:
Crash report or crash signature:


Attachments
CSV Datasource file and screenshot (12.86 KB, application/x-zip-compressed)
2019-01-30 12:02 UTC, Dominik Hölzl
file encoded utf8-bom (24.75 KB, image/png)
2019-01-30 17:42 UTC, Oliver Brinzing
insert_fields_db_encoded_utf-8 (41.90 KB, image/png)
2019-01-30 17:44 UTC, Oliver Brinzing

Description Dominik Hölzl 2019-01-30 12:02:24 UTC
Created attachment 148755
CSV Datasource file and screenshot

Steps to reproduce:

* Create new OpenDocument Text document
* Open the document with LibreOffice Writer
* View data sources (View -> Data Sources)
* Clean up data sources as necessary
  (Context Menu -> Registered databases ... -> Delete -> Yes -> OK)
* Attach data source
  * Open Fields dialog (Insert -> Field -> More Fields ...)
  * Select Mail merge fields (Database -> Mail merge fields)
  * Add database (Add database file -> Browse...)
  * Select the attached CSV file (Database.csv)
  * Adjust row format:
    Field separator: ;
    Text separator: "
    Decimal separator: ,
    Thousands separator: .
    Ensure "Text contains headers" is selected
    Character set: Unicode (UTF-8)
  * Click OK
  * Expand the added database "Datasource"
  * Expand the table "Datasource"
 
The first field, EMail, is shown as "EMail" (with quotes), whereas FieldNameAndAddress, Surname, and all following fields correctly appear without quotes.
The "bad" field "EMail" seems to work when used as a mail merge field, but it is now incompatible with documents created with an older version of LibreOffice, where the quotes were not present (I have not checked in which version the behavior changed).

The problem does not occur if all quotes in the header line in the CSV file are removed.
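
A minimal Python sketch of one way a leading BOM could produce exactly this symptom. It is an assumption for illustration, not taken from the LibreOffice source: `naive_split_header` is a hypothetical parser that strips quotes only when a field both starts and ends with the quote character, and the header line is reconstructed from the field names mentioned above (the real file may contain more columns).

```python
BOM = "\ufeff"  # U+FEFF, the character the UTF-8 BOM bytes EF BB BF decode to


def naive_split_header(line, sep=";", quote='"'):
    """Hypothetical header parser: removes the quotes of a field only if
    the field starts AND ends with the quote character."""
    fields = []
    for raw in line.split(sep):
        if len(raw) >= 2 and raw.startswith(quote) and raw.endswith(quote):
            raw = raw[1:-1]
        fields.append(raw)
    return fields


# Assumed header line, built from the field names in the report.
header_without_bom = '"EMail";"FieldNameAndAddress";"Surname"'
# The same line when the BOM is decoded as text instead of being skipped.
header_with_bom = BOM + header_without_bom

without_bom = naive_split_header(header_without_bom)
with_bom = naive_split_header(header_with_bom)

# Only the first field is affected: it now starts with U+FEFF rather than
# with the quote character, so its quotes are left in place.
print(repr(with_bom[0]))
print(with_bom[1:])
print(without_bom)
```

In this sketch only the first header field keeps its quotes, and removing all quotes from the header line makes the difference disappear, which matches both observations above.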
Comment 1 Oliver Brinzing 2019-01-30 17:42:07 UTC
Created attachment 148768
file encoded utf8-bom

I noticed that the attached file is encoded as UTF-8 with BOM.

Could you please check whether it works if you use UTF-8 without BOM instead?
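
For readers who want to check this on their own data source, a small Python sketch that tests whether a file starts with the UTF-8 BOM (bytes EF BB BF). The helper names `starts_with_utf8_bom` and `file_has_utf8_bom` are made up for this example; `codecs.BOM_UTF8` is part of the Python standard library.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string begins with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path) -> bool:
    """Read only the first three bytes of the file and test them."""
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(len(codecs.BOM_UTF8)))
```

Usage would look like `file_has_utf8_bom("Database.csv")`, where the file name is the one given in the steps to reproduce.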
Comment 2 Oliver Brinzing 2019-01-30 17:44:27 UTC
Created attachment 148770
insert_fields_db_encoded_utf-8

For me, it seems to work with UTF-8 (without BOM) and

Version: 6.1.5.1 (x64)
Build ID: f18954c1ba9116b85c32b6bdbc0188d3e0fd24c7
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: de-DE (de_DE); Calc: group threaded
Comment 3 Dominik Hölzl 2019-01-31 08:35:26 UTC
Hello!

Thank you for the quick response.

Removing the BOM in the datasource file fixes the problem.
But shouldn't it also work with BOM?

Regards,
Dominik
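
Until a fixed version is installed, the workaround described in this comment can be scripted. The following Python sketch removes a leading UTF-8 BOM and leaves all other bytes untouched; `strip_utf8_bom` and `strip_bom_in_file` are hypothetical helper names, and the approach assumes the file really is UTF-8.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(src, dst):
    """Copy src to dst, dropping a leading UTF-8 BOM on the way."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(strip_utf8_bom(data))
```

Writing to a separate destination file keeps the original data source intact in case other tools depend on the BOM.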
Comment 4 Oliver Brinzing 2019-01-31 17:29:34 UTC
(In reply to Dominik Hölzl from comment #3)

> Removing the BOM in the datasource file fixes the problem.
> But shouldn't it also work with BOM?

The File/New/Database wizard has no option to select an encoding,
and changing the encoding later via Tables/CSVDataBase/context menu/
Database/Properties... from "System" to "Unicode (UTF-8)" does not work
(EMail is still shown as "EMail", with quotes).

according to:
Bug 115056 - FILESAVE: Calc doesn't write CSV as UTF-8
Bug  82254 - FILESAVE: UTF-8 BOM removed from CSV when saving file

The BOM gets lost during save, but opening your BOM *.csv with Calc
is fine when the encoding "Unicode (UTF-8)" is selected.

IMHO this can be seen as a bug.
Comment 5 QA Administrators 2021-05-05 03:45:47 UTC Comment hidden (obsolete, spam)
Comment 6 Commit Notification 2021-09-10 14:23:39 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aa69f2a176329795fad957ac639329307c146e58

tdf#123055 - Start to read unicode text in order to avoid the BOM

It will be available in 7.3.0.
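
The commit message suggests that the fix skips the BOM when the text is read. The actual patch is in the LibreOffice core and is not reproduced here; the following is only a Python analogue of the same idea, using the standard `utf-8-sig` codec, which drops a leading BOM if one is present. The function name `read_header` and the sample data are assumptions for illustration.

```python
import codecs
import csv
import io


def read_header(raw: bytes, delimiter=";", quotechar='"'):
    """Decode CSV bytes BOM-tolerantly and return the parsed header row."""
    text = raw.decode("utf-8-sig")  # strips a leading U+FEFF, if any
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
    )
    return next(reader)


# Assumed sample data, using the separators from the steps to reproduce.
body = b'"EMail";"Surname"\r\n"a@example.org";"Doe"\r\n'
header_plain = read_header(body)
header_bom = read_header(codecs.BOM_UTF8 + body)
print(header_plain, header_bom)
```

With this approach, files with and without a BOM yield the same header fields.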

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.