Bug 55049 - FILEOPEN: CSV import - \" is not seen as an escaped quote, and hence interpreted as string delimiters
Summary: FILEOPEN: CSV import - \" is not seen as an escaped quote, and hence interpr...
Status: CLOSED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.4.7.2 release
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
: 118007 (view as bug list)
Depends on:
Blocks: CSV-Import
  Show dependency treegraph
 
Reported: 2012-09-18 11:55 UTC by Sander Bol
Modified: 2022-10-02 15:21 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
An example file containing a reduced test case. (18 bytes, text/csv)
2012-09-18 11:55 UTC, Sander Bol
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sander Bol 2012-09-18 11:55:02 UTC
Created attachment 67324 [details]
An example file containing a reduced test case.

Problem description: 
Loading a CSV file that contains the field delimiter (ie: semicolon) within a text field (ie: double quotes) should ignore the field delimiter. It is placed within the text-delimiters to indicate it is part of the text string, and should therefor not be treated as a field delimiter.

We found this bug when trying to load a CSV-export of a database table containing data stored in PHP's serialized data format.

Steps to reproduce:
1. Create a CSV file with the following contents:
"test";"\";\"";EOF
2. Load the file into Calc.

Current behavior:
File imported as follows (with pipes used to illustrate column separators).
test|\|\""|EOF

Expected behavior:
File imported as follows (with pipes used to illustrate column separators).
test|";"|EOF

Platform (if different from the browser): 
              
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1
Comment 1 Sören 2012-12-29 21:46:53 UTC
After testing it in
Version 4.0.0.0.beta2 (Build ID: 4104d660979c57e1160b5135634f732918460a0)
on Ubuntu 12.04 (x86)
I think the problem has a different reason, namely that \" is not seen as an escaped quote, and hence interpreted as string delimiters. A CSV-file containing
"test";";t";EOF
gives 
test|;t|EOF
as expected, so the semicolon inside the quotes is indeed interpreted as part of the string, and not as an separator.

As I'm not aware of how CSV-import should handle escaped special characters, I'm unsure if this is a bug.
Comment 2 bfoman (inactive) 2013-03-22 10:32:05 UTC
Changed Summary to reflect real issue in the report.
Comment 3 gmolleda 2013-04-05 05:57:19 UTC
Confirmed error in LibreOffice Calc 4.0.1.2

Using menu Insert - Sheet From File...

If you create with text editor the file with text:
1<TAB>"text with <NEWLINE ENTER>
a newline character in text before close"<TAB>3

Save text, I have a newline into text field, the text field is with "

Insert this file in calc and you will see that the <new line> into "..." change the row and break the text field.

The error:
   A                                           B               C
1  1                                           text with
2  a newline character in text before close"   3

when the correct behaviour must be:
   A             B                                                 C
1  1             text with                                         3
                 a newline character in text before close
2
Comment 4 niknah 2013-04-24 00:30:16 UTC
In most other spreadsheets double quotes in .csv files are saved as ""

http://en.wikipedia.org/wiki/Comma-separated_values  (basic rules section)
http://tools.ietf.org/html/rfc4180 (section 2.7)

At the moment I can't see anyway to open up .csv files with values in double quotes.  The only way I can think of is to convert them into unicode double quotes.
Comment 5 Alex Thurgood 2015-01-03 17:40:47 UTC Comment hidden (no-value)
Comment 6 QA Administrators 2016-01-17 20:03:57 UTC Comment hidden (obsolete)
Comment 7 QA Administrators 2017-03-06 14:19:24 UTC Comment hidden (obsolete)
Comment 8 Buovjaga 2018-06-22 12:20:06 UTC
*** Bug 118007 has been marked as a duplicate of this bug. ***
Comment 9 QA Administrators 2019-06-23 02:51:46 UTC Comment hidden (obsolete)
Comment 10 Josep Lladonosa 2019-10-15 07:13:23 UTC
(In reply to Sander Bol from comment #0)
> Created attachment 67324 [details]
> An example file containing a reduced test case.
> 
> Problem description: 
> Loading a CSV file that contains the field delimiter (ie: semicolon) within
> a text field (ie: double quotes) should ignore the field delimiter. It is
> placed within the text-delimiters to indicate it is part of the text string,
> and should therefore not be treated as a field delimiter.
> 

I confirm that this bug is still present in LO 6.3.2.2.

Perhaps the whole module should be reviewed in order to take into consideration

https://www.rfc-editor.org/info/rfc4180

This would solve some of the comments here.
Comment 11 Eike Rathke 2021-08-29 21:27:34 UTC
This is not a bug.
The file content

"test";"\";\"";EOF

with field separator ';' and quote character (text delimiter) '"' is *correctly* imported as

test|\|\""|EOF

The first \" is *not* an escaped double quote, CSV does not escape quotes by backslash but by doubling them, it is a \ backslash enclosed in double quotes.

The third field with \"" is malformed because it contains a (meant to be escaped by doubling it?) double quote but is not quoted with double quotes. Calc tries to import it nevertheless and keeps all data encountered.

Whatever produced that content is a broken generator.
Comment 12 Danilo 2021-11-03 14:14:04 UTC
I'm having the same problem when I tried to parse .csv data for a script for my company. It took me a entire day to figure out what was happening, then I import the same workbook from a windows computer and everything worked as expected.
Comment 13 prasedenica 2022-05-18 04:49:32 UTC Comment hidden (spam)
Comment 14 Eike Rathke 2022-10-02 15:21:10 UTC
Again, in CSV \" is *not* an escapement for an embedded quote character.
Read https://www.rfc-editor.org/rfc/rfc4180