Bug 113536 - CSV import dialogue confusing in respect to text delimiters
Summary: CSV import dialogue confusing in respect to text delimiters
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium enhancement
Assignee: Heiko Tietze
URL:
Whiteboard: target:6.0.0
Keywords:
Depends on:
Blocks: CSV-Import
  Show dependency treegraph
 
Reported: 2017-10-30 20:56 UTC by Steve Edmonds
Modified: 2017-11-27 14:47 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:


Attachments
CSV file that imports incorrectly (5.00 KB, text/csv)
2017-10-30 20:58 UTC, Steve Edmonds
Details
Simpler test file (206 bytes, text/csv)
2017-10-30 21:44 UTC, Steve Edmonds
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Steve Edmonds 2017-10-30 20:56:52 UTC
Description:
Importing a CSV in Calc with quoted fields adapts and shows correctly in the pre-import dialogue when "Quoted field as text" is checked but when the file opens in Calc the setting has been ignored.

Steps to Reproduce:
1.In calc open sample file
2.Set delimiter as comma, Check "Quoted field as text"
3.Note in dialogue columns adapt and there are only 3 columns
4. Click OK for Calc to load file.

Actual Results:  
Text is spread across 5 columns, the same as if "Quoted field as text" is not checked.

Expected Results:
Quoted text to be contained in the Notes column as in the preview in the import dialogue


Reproducible: Always


User Profile Reset: No



Additional Info:
If at step 3 the dialogue does not adapt with checking "Quoted field as text", open the file anyway. Close the file and open again where the dialogue should already have "Quoted field as text" checked and the columns should show correctly.


User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Comment 1 Steve Edmonds 2017-10-30 20:58:15 UTC
Created attachment 137384 [details]
CSV file that imports incorrectly
Comment 2 Steve Edmonds 2017-10-30 21:13:00 UTC
The crux of this bug is that the import dialogue column preview is correct and the final import is not.

The possible cause is the handling on the actual import of quotes within the quoted strings.

Example string
"The intended title for your Quality Manual (ex: ""Quality Manual,"" ""Management System Manual,"" etc.)"
Comment 3 Steve Edmonds 2017-10-30 21:44:55 UTC
Created attachment 137387 [details]
Simpler test file
Comment 4 Steve Edmonds 2017-10-31 02:54:06 UTC
On further investigation it seems on actual import the columns are splitting on delimiters within quoted text, regardless of the setting of "Quoted field as text".
Comment 5 Steve Edmonds 2017-10-31 03:38:44 UTC
This seems to work OK on 5.4.2.2 Mac
Comment 6 Steve Edmonds 2017-11-01 07:40:59 UTC
Now I try the same file on another machine, same OS, LO 5.3.5.2, and it works fine. Windows 1252/Latin 1, English
Comment 7 Steve Edmonds 2017-11-01 07:43:52 UTC
Now I think I have an import working.
If I put " in text delimiter it imports OK in LO 5.3.5.2 opensuse
Comment 8 Buovjaga 2017-11-10 14:12:11 UTC
Ok, let's close.
Comment 9 Steve Edmonds 2017-11-10 22:21:33 UTC
I think an enhancement would resolve the confusion that led me to think this was a bug.

The import dialogue under "Separator Options" states "Text delimiter".

I don't think this is accurate, or at lease clear. 
The text on an imported row is delimited by the Tab, Comma, Semicolon, Space, Other into columns in the sheet.

The Text delimiter character is in fact an encapsulation character. What is encapsulated between these characters (Text delimiter) is a block of text that is not to be broken up if it contains a delimiter. It is still imported as format "Number-General".

I would suggest an enhancement of re-naming this "Text encapsulation" or "String definition" or similar clarifying term.

This would also remove confusion with "Merge delimiters", which doesn't merge "Text delimiters" but I assume merges Separators.
Comment 10 Buovjaga 2017-11-11 14:29:48 UTC
Ok, let's bring this to the council of UX elders
Comment 11 V Stuart Foote 2017-11-11 17:54:01 UTC
Yes, it is a bit off. 

Though handled in source as separators as are the field/column selections, the "Text delimiter:" label should probably be changed to "String delimiter:". And in the Other option section the "Quoted field as text" checkbox should probably be relabeled as "Import delimited Strings as text".

Changing these translated .ui strings is trivial, but don't think it is worth it to rework the source to change the existing "texttextdelimiter" and "quotedfieldsastext" values.

=-ref-=
https://opengrok.libreoffice.org/xref/core/sc/uiconfig/scalc/ui/textimportcsv.ui#428

https://opengrok.libreoffice.org/xref/core/sc/source/ui/dbgui/scuiasciiopt.cxx
Comment 12 Steve Edmonds 2017-11-12 21:32:29 UTC
I think changing the labels (.ui strings?) would be a great help.
Possibly try some consistency. I also suspect the wording used is interpreted differently in different regions.

The save as CSV dialogue uses "Field Delimiter" and "Text Delimiter"
The check box further down states "Quote all text cells"

If I looked at this language I would suggest changing "Text Delimiter" to "Text Quote" or "Quote character" so it is clear it is the character for ""Quote all text cells".

Now looking at the import dialogue 
We talk about separators. Should the export be re-visited and "Field Delimiter" renamed "Field Separator", and on the import "Merge delimiters" > "Merge separators". 

Looking now at "Text Delimiter" on the import.
If I have a string in the file [cat "feline" "1234"] as a field between separators, what does the user think the setting "Text Delimiter" does when set to '"'. 
A suggestion here is "Text field quotes" or "Text field quotemarks", but where I am everyone knows quotes implies " or ' at the beginning and end of a string and this might not be general in world terms.

If I have it right, "Quoted field as text" is more accurately "Format quoted field as text" as opposed to Number/General
Comment 13 Heiko Tietze 2017-11-15 11:42:45 UTC
So this question is more or less up to the l10n team whether or not they are willing to translate those minor changes. IMHO it's only a small advantage. 

To summarize the previous comments:

* Import CSV dialog

1 "Text delimiter:" 
A "String delimiter:" 
B "Text field quotes:" 
C "Text field quotemarks:"

2 "Quoted field as text" 
A "Import delimited Strings as text" 
B "Format quoted field as text"

* Export CSV dialog

3 "Field Delimiter:"

4 "Text Delimiter:"
A "Text Quote"
B "Quote character"
C "Text field quotes" (for consistency with 1B)

5 "Quote all text cells"

If we go ahead with a new wording, my take is 1B, 2B, 4C.
Comment 14 V Stuart Foote 2017-11-15 17:59:05 UTC
(In reply to Heiko Tietze from comment #13)
> ...
> If we go ahead with a new wording, my take is 1B, 2B, 4C.

Giving us

"Text delimiter" -> "Text field quotes:"
"Quoted field as text" -> "Format quoted field as text"

"Text delimiter:" -> "Text field quotes:"

Personally still prefer the string(s) delimiter syntax, but no great objection to this .ui change assuming l10n can translate them. So +1

an Easyhack?
Comment 15 Steve Edmonds 2017-11-15 20:05:30 UTC
As I mentioned different locales may have different terminologies so l10n may have good input.
I know it took me quite some time to figure out how the selections/entries in the dialogue worked to the point that I went to a text editor for the CSV as I couldn't get import working and started this thinking it was a bug in the import.

If the import dialogue had that hover text under the cursor like the tool bars then that could be used for clarification without changing labels.
Comment 16 Steve Edmonds 2017-11-15 21:25:45 UTC
My thoughts are not to use terms that can be interpreted in different ways, the actual term is not so important. The import specifies separation and encapsulation of strings forming fields, I have never thought of these being the same thing, i.e. delimiting strings. It would be interesting to know how this is handled in other languages (l10 dept.)

A quick google on "delimiter", just a couple below. The first has always been my understanding but obviously plenty of people agree with the second.

A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams

In computer programming, a delimiter is a character that identifies the beginning or the end of a character string
Comment 17 sophie 2017-11-16 10:21:33 UTC
(In reply to V Stuart Foote from comment #14)
> (In reply to Heiko Tietze from comment #13)
> > ...
> > If we go ahead with a new wording, my take is 1B, 2B, 4C.
> 
> Giving us
> 
> "Text delimiter" -> "Text field quotes:"
> "Quoted field as text" -> "Format quoted field as text"
> 
> "Text delimiter:" -> "Text field quotes:"
> 
> Personally still prefer the string(s) delimiter syntax, but no great
> objection to this .ui change assuming l10n can translate them. So +1
> 
> an Easyhack?

I also prefer string(s) delimiter syntax. Note that 1C could become a very long string in other languages. But I have no objection either to Heiko proposals. Sophie
Comment 18 Steve Edmonds 2017-11-16 20:02:06 UTC
(In reply to sophie from comment #17)

> 
> I also prefer string(s) delimiter syntax. Note that 1C could become a very
> long string in other languages. But I have no objection either to Heiko
> proposals. Sophie

Does "Text encapsulation" sound too old school, not user friendly or not easily understood. For me it is the least unambiguous, but the widest understanding is desirable, especially if the interface is not in the users first language.
Comment 19 Steve Edmonds 2017-11-17 19:50:09 UTC
Two more thoughts for 1. "Text delimiter:", possibly more universally understood English than encapsulation.

"Text delimiter:" -> "Text enclosed by:"
"Text delimiter:" -> "Text contained in:"
Comment 20 Heiko Tietze 2017-11-17 22:26:04 UTC
19 comments with different ideas, that's what devs love to read ;-)

Will change it to 

"Text delimiter" -> "String delimiter:"
"Quoted field as text" -> "Format quoted field as text"

"Text delimiter:" -> "String delimiter:"

...to refrain l10n team from unnecessary work.
Comment 21 Commit Notification 2017-11-19 10:31:59 UTC
heiko tietze committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d68bdb562ac48e340d915610de618547895eece2

tdf#113536 Better wording for CSV separator

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Eike Rathke 2017-11-20 09:54:49 UTC
(In reply to Heiko Tietze from comment #20)
> 19 comments with different ideas, that's what devs love to read ;-)
Want another one?

> "Text delimiter" -> "String delimiter:"
> "Quoted field as text" -> "Format quoted field as text"
> 
> "Text delimiter:" -> "String delimiter:"

String delimiter is the same confusing thing as Text delimiter and wrong. It is not a delimiter at all. It is the quote character that encloses field content, it is not restricted to textual field content.
Comment 23 V Stuart Foote 2017-11-20 15:35:56 UTC
(In reply to Eike Rathke from comment #22)

> String delimiter is the same confusing thing as Text delimiter and wrong. It
> is not a delimiter at all. It is the quote character that encloses field
> content, it is not restricted to textual field content.

Yes, of course it is not restricted to just textual field content (fields defined by the CSV field separator selected).

But the "quote" character selected denotes enclosure--i.e. it brackets, surrounds, defines and even delimits--a string (of numbers, punctuation, text) that are then to be handled as immutable alphanumeric textual field and not otherwise converted by import actions into the sheet.

This exercise in English semantics will never be resolved, IMHO Heiko's patch is acceptable and ready for l10n.
Comment 24 Eike Rathke 2017-11-21 20:51:31 UTC
(In reply to V Stuart Foote from comment #23)
> But the "quote" character selected denotes enclosure--i.e. it brackets,
> surrounds, defines and even delimits--a string (of numbers, punctuation,
> text) that are then to be handled as immutable alphanumeric textual field
> and not otherwise converted by import actions into the sheet.
And that exactly does not hold. An enquoted field does not mean the content would be an immutable text (for this the behaviour the "Quoted field as text" option exists) not to be converted, it can also mean the field is enquoted because it contains the field separator in an otherwise numeric value e.g. as decimal separator, or the application that wrote the file simply enclosed all fields in quotes, regardless whether necessary or not.

Anyway, this is moot now, but some user confusion ("but but this is enclosed in quotes so it must be text") stems from this miswording talking of "text delimiter". IMHO "Quote character" would be suitable.