Bug 144842 - LibreOffice Calc FILEOPEN Import csv 'Format quoted field as text' does not work
Summary: LibreOffice Calc FILEOPEN Import csv 'Format quoted field as text' does not work
Status: CLOSED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.1.6.2 release
Hardware: All Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-30 15:31 UTC by Mike
Modified: 2021-09-30 18:35 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike 2021-09-30 15:31:22 UTC
Description:
Importing a space-separated CSV which includes quoted text in fields results in extra columns being created rather than merging them. e.g.:

group="SSLVPN Users" dst_host="N/A" reason="Lost the connection" duration=15 sentbyte=0 rcvdbyte=0 msg="SSL tunnel shutdown"



Steps to Reproduce:
1. Attempt to import space-separated CSV with quoted text in fields
2. Under Separator Options select Separated by: Space and Merge delimiters and enter a double-quote (") in the String Delimiter field.
3. Under Other Options, select Format quoted field as text and click Ok

Actual Results:
Fields are separated into multiple columns:

logdesc="SSL	VPN	tunnel	up"	action="tunnel-up"
logdesc="SSL	VPN	tunnel	up"	action="tunnel-up"
logdesc="SSL	VPN	tunnel	down"	action="tunnel-down"


Expected Results:
Fields should be merged:

logdesc="SSL VPN tunnel up"	action="tunnel-up"
logdesc="SSL VPN tunnel up"	action="tunnel-up"
logdesc="SSL VPN tunnel down"	action="tunnel-down"



Reproducible: Always


User Profile Reset: No



Additional Info:
Possibly related to bug #53325
Comment 1 Mike Kaganski 2021-09-30 16:18:25 UTC
Quoted field in CSV *starts* with a quote mark (normally "), and ends with this mark. What you show is something that includes such a character in the middle, and is *not* a quoted field.

Proper syntax of a quoted field:

123	"123	123"	123

If this sample is parsed with *tab* used as field separator, then it has three fields, first 123, second "123	123" (starting with the quote, and going until another quote followed by field separator), and third 123.

Another proper syntax:

"logdesc=""SSL VPN tunnel up"""	"action=""tunnel-up"""

Here the first field would be "logdesc=""SSL VPN tunnel up""", where the middle quotes are so-called "2DQUOTE" in the RFC, that would come as single literal quotes in the result.

Incorrect syntax:

logdesc="SSL VPN tunnel up"	action="tunnel-up"

(which would still be read OK if you use tabs as separators, but would *rightfully* fail your expectations if you use any space as a separator).

Ref.: RFC 4180.

Closing NOTABUG.

[1] https://datatracker.ietf.org/doc/html/rfc4180
Comment 2 Eike Rathke 2021-09-30 16:20:58 UTC
That data doesn't look space-separated but fixed-width.
But even if it was space-separated it would not fulfil the criteria for quoted fields because for that the *entire* field content has to be "enquoted like this", which
logdesc="SSL VPN tunnel up"
or any other of that data is not.

Import as fixed-width, placing a column separator between column 40/41 and if after import you want to get rid of the extra spaces in column A then use a formula in C1 =TRIM(A1) and fill that down until end of data. You may then copy those results into clipboard and Paste-Special it onto the data in column A, only Strings but not Formulas.

Not a bug.
Comment 3 Mike 2021-09-30 17:04:28 UTC
Thanks for the explanations. I understand the problem is that part of the field is quoted. I think I was incorrect in calling out the "Format quoted field as text" as the root of this.

Wouldn't the Merge delimiters option work though?:
"Combines consecutive delimiters and removes blank data fields"

BTW the tabs above came from me copying/pasting from Calc. And it is not fixed-width either. Scrubbed raw data follows:

date=2021-09-30 time=08:02:31 logid="..." type="event" subtype="vpn" level="information" vd="root" eventtime=... logdesc="SSL VPN tunnel up" action="tunnel-up" tunneltype="ssl-tunnel" ... dst_host="N/A" reason="tunnel established" msg="SSL tunnel established"
date=2021-09-30 time=08:02:31 logid="..." type="event" subtype="vpn" level="information" vd="root" eventtime=... logdesc="SSL VPN tunnel up" action="tunnel-up" tunneltype="ssl-tunnel" ... dst_host="N/A" reason="login successfully" msg="SSL tunnel established"
date=2021-09-30 time=08:02:31 logid="..." type="event" subtype="vpn" level="information" vd="root" eventtime=... logdesc="SSL VPN tunnel up" action="tunnel-up" tunneltype="ssl-tunnel" ... dst_host="N/A" reason="User requested termination of service" duration=100 sentbyte=0 rcvdbyte=0 msg="SSL tunnel shutdown"
date=2021-09-30 time=08:02:31 logid="..." type="event" subtype="vpn" level="information" vd="root" eventtime=... logdesc="SSL VPN tunnel up" action="tunnel-up" tunneltype="ssl-tunnel" ... dst_host="N/A" reason="User requested termination of service" duration=100 sentbyte=13183 rcvdbyte=75383 msg="SSL tunnel shutdown"
Comment 4 Mike Kaganski 2021-09-30 18:35:24 UTC
(In reply to Mike from comment #3)

This data can't be parsed as CSV, because it is not a valid csv in any case.

But you could try to turn it into one, pre-processing the data to remove everything matching the regex \b\w+= - then the rest would be the data without the field names (*if* the regex doesn't match anything inside the data itself).

And then you could just add a heading row with the removed column names manually.