Bug 32703

Summary: CSV import could ignore leading spaces if the field content without them is quoted.
Product: LibreOffice
Reporter: Ken Ward <ken.ward>
Component: Calc
Assignee: Eike Rathke <erack>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: medium    
Version: 3.3.0 RC1   
Hardware: All   
OS: All   
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 38637, 39868    
Attachments: The import screen, showing the quoted string field being separated into three fields by embedded commas.

Description Ken Ward 2010-12-28 09:18:27 UTC
Created attachment 41489 [details]
The import screen, showing the quoted string field being separated into three fields by embedded commas.

Found in:

LibreOffice 3.3.0 
OOO330m9 (Build:1)
libreoffice-build 3.2.99.2

CSV import into calc:

Options: 
See attachment libre_csv_bug1.jpg

Header line was:
"NAME", "ID", "VARIANT_NAME", "VARIANT_ID", "PARENT_ID", "INHERITS_FROM_ID", "TYPE", "DESCRIPTION", "MOD_DATE", "MOD_TIME", "REVISION", ...

Input line was:
"parm1", "82", "SFT2", "2", "58", "NA", "SPEC", "overridden in Part 4, LevelB2, SFT2", "20101228", "09:58:23", "3", ...
Comment 1 Kohei Yoshida 2011-01-21 07:04:37 UTC
Ken, can you still reproduce this in RC4?
Comment 2 Ken Ward 2011-01-21 07:17:42 UTC
I will download it and try.

Thanks!

-Ken

> https://bugs.freedesktop.org/show_bug.cgi?id=32703
>
> Kohei Yoshida<kyoshida@novell.com>  changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>    Status Whiteboard|                            |inforprovider:reporter
>             Keywords|                            |NEEDINFO
>            Component|Libreoffice                 |Spreadsheet
>
> --- Comment #1 from Kohei Yoshida<kyoshida@novell.com>  2011-01-21 07:04:37 PST ---
> Ken, can you still reproduce this in RC4?
>
Comment 3 Ken Ward 2011-01-21 07:44:31 UTC
Yes, the problem still exists in RC4.

Best regards,

-Ken Ward


Comment 4 Bennet Huber 2011-05-17 19:40:58 UTC
I've noticed this problem too: the CSV reader doesn't seem to respect quotes very well.  It doesn't handle newlines in quoted strings either (it incorrectly interprets them as new rows).  This is a slightly trickier problem, because I'm pretty sure newlines aren't a legal character within a cell value, so they should probably be either skipped or replaced with some user-definable string.

Also, there is an RFC for the csv format that might be helpful:
http://tools.ietf.org/html/rfc4180
Comment 5 Eike Rathke 2011-08-16 16:53:47 UTC
First of all, a generator that puts leading blanks in front of quoted field content violates the CSV specification mentioned previously. However, some generators of that kind seem to exist, and being lax about this when importing CSV may be desirable.

Implemented in master http://cgit.freedesktop.org/libreoffice/core/commit/?id=acd31343d1a346f045a8145894c7e4451910cbf8

@ Bennet Huber: importing field content with newlines within a quoted field does work; it failed only when leading blanks were present (or when embedded quotes aren't properly escaped, but that is a different story).
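
The strict versus lax handling of leading blanks discussed in this comment can be illustrated outside LibreOffice. The following is only a sketch using Python's standard csv module, not the code from the commit above; the sample line is an abbreviated subset of the reporter's input line, and the variable names are chosen for illustration. With default settings, a blank before the opening quote makes the parser treat the field as unquoted, so the embedded commas split it. With skipinitialspace=True the blank is skipped and the quotes are honoured.

```python
import csv
import io

# Abbreviated subset of the reporter's input line (blank after each comma).
line = '"parm1", "82", "SFT2", "overridden in Part 4, LevelB2, SFT2", "20101228"\n'

# Strict parsing: the leading blank makes each later field "unquoted",
# so the quotes become literal characters and embedded commas split the field.
strict = next(csv.reader(io.StringIO(line)))

# Lax parsing: blanks directly after the delimiter are ignored,
# so the quoted field is recognised and kept in one piece.
lax = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(len(strict), strict)
print(len(lax), lax)
```

In this sketch the strict parse yields seven fields (the description field is cut into three, as in the attached screenshot), while the lax parse yields the five intended fields.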
Comment 6 Mircea 2012-04-13 16:12:54 UTC
The bug is still present in LibreOffice version 3.5.2.2. I think it was incorrectly marked as "fixed" -- from the description of the commit, the fix was for leading spaces in CSV files, not for the comma taking precedence over quotation marks.

Example of CSV file that leads to this bug:

="main_effect",="0.120",="0.090",="0.130",="0.112"
="",="(0.093)",="(0.095)",="(0.143)",="(0.138)"
="",="[-0.062,0.302]",="[-0.096,0.276]",="[-0.151,0.410]",="[-0.158,0.382]"

The third row should have exactly the same number of cells as the other rows, yet each cell containing an interval is split into two.
Comment 7 Eike Rathke 2012-04-14 15:18:38 UTC
1. Your example is different from what this bug originally was about: it
   does not contain spaces between the comma separator and a following
   quote.
2. Your example is not valid CSV data, see
   http://tools.ietf.org/html/rfc4180
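
To make point 2 concrete: under RFC 4180 a quoted field must begin with the double quote, and an unquoted field may contain neither double quotes nor commas, so a field of the form ="..." fits neither case. The sketch below uses Python's standard csv module as a stand-in for a parser that only recognises quotes at the start of a field; it is an illustration under that assumption, not LibreOffice's importer, and the variable names are hypothetical. It parses the first and third rows of the example from comment 6.

```python
import csv
import io

# First and third rows of the example from comment 6.
data = (
    '="main_effect",="0.120",="0.090",="0.130",="0.112"\n'
    '="",="[-0.062,0.302]",="[-0.096,0.276]",="[-0.151,0.410]",="[-0.158,0.382]"\n'
)

# Each field starts with '=', not with a quote, so the quotes are read as
# ordinary characters and every comma acts as a field separator.
rows = list(csv.reader(io.StringIO(data)))
field_counts = [len(row) for row in rows]

print(field_counts)
print(rows[1][:3])
```

Here the first row parses into five fields and the row with intervals into nine, which matches the splitting described in comment 6.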