Bug 132414 - Allow multi-character delimiters in CSV import
Summary: Allow multi-character delimiters in CSV import
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 6.4.3.2 release
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Dialog
Reported: 2020-04-25 19:06 UTC by Mikhail Novosyolov
Modified: 2024-03-25 18:22 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Example CSV (50.89 KB, text/csv)
2020-04-25 19:06 UTC, Mikhail Novosyolov
Screenshot illustrating the bug (52.36 KB, image/png)
2020-04-25 19:07 UTC, Mikhail Novosyolov

Description Mikhail Novosyolov 2020-04-25 19:06:18 UTC
Description:
Example CSV:

3305460;;import/gcc;;i686;;error: in `/builddir/build/BUILD/gcc-8.3.0/BUILD': ;;xxx

Here two ";" characters together form the separator. However, when the CSV is opened in Calc, no matter how many ";" I type into the "Separator" field, LibreOffice treats a single ";" as the separator and creates empty columns.
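A minimal sketch of why the empty columns appear, assuming the importer splits on every single ";". It uses plain Python string splitting on the example line above and is not LibreOffice's actual import code.

```python
# Example line from this report, where ";;" is meant as ONE separator.
line = "3305460;;import/gcc;;i686;;error: in `/builddir/build/BUILD/gcc-8.3.0/BUILD': ;;xxx"

# Splitting on a single ";" treats each ";;" as two separators
# with an empty field in between.
fields = line.split(";")
empty_columns = [i for i, field in enumerate(fields) if field == ""]

print(len(fields))     # 9 fields instead of the intended 5
print(empty_columns)   # [1, 3, 5, 7]
```

Every second column is empty, which matches the layout described in the report.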

Steps to Reproduce:
.

Actual Results:
.

Expected Results:
.


Reproducible: Always


User Profile Reset: No



Additional Info:
.
Comment 1 Mikhail Novosyolov 2020-04-25 19:06:40 UTC
Created attachment 159931
Example CSV
Comment 2 Mikhail Novosyolov 2020-04-25 19:07:02 UTC
Created attachment 159932
Screenshot illustrating the bug
Comment 3 Ming Hua 2020-04-25 20:09:49 UTC
For me, choosing "semicolon" and "merge delimiters" got rid of the empty columns and imported the example CSV file as desired.  Can you try this as well?
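A rough model of what "merge delimiters" does, assuming it collapses any run of consecutive delimiter characters into one separator. The helper name `split_merged` is hypothetical and the regular expression only imitates the option; it is not taken from LibreOffice.

```python
import re

def split_merged(line, delimiters=";"):
    """Split on runs of one or more delimiter characters (hypothetical helper)."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return re.split(pattern, line)

line = "3305460;;import/gcc;;i686;;error: in `/builddir/build/BUILD/gcc-8.3.0/BUILD': ;;xxx"
merged = split_merged(line)
print(len(merged))  # 5

# Side effect of merging: a genuinely empty field between two ";;" pairs disappears.
print(split_merged("a;;;;b"))   # ['a', 'b']
print("a;;;;b".split(";;"))     # ['a', '', 'b']
```

Under this model the example file imports cleanly, but merging cannot distinguish an empty field from a longer separator.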
Comment 4 Roman Kuznetsov 2020-04-25 22:41:34 UTC
Your report looks like an RFE (request for enhancement) along the lines of "Allow any number of characters to be set as one separator".
Comment 5 Ming Hua 2020-05-02 21:38:42 UTC
Mikhail, please reply to the following:
1. Does my suggestion in comment #3 solve your problem?
2. Do you want to propose an enhancement about using multiple characters as a separator, as Roman said in comment #4?
Comment 6 Mikhail Novosyolov 2020-05-11 05:30:36 UTC
(In reply to Ming Hua from comment #5)
> Mikhail, please give a reply about
> 1. Does my suggestion in comment #3 solve your problem?
Yes, it does, thank you, but it is not obvious at all and I do not understand the logic behind this behaviour.
> 2. Do you want to propose an enhancement about using multiple characters as
> separator, like Roman said in comment #4?
I did not understand what Roman meant. If I specify ';;' as A separator (a = one), why does LibreOffice ignore the second ';'?
Comment 7 Mikhail Novosyolov 2020-05-11 05:32:01 UTC
E.g.
cat *.csv | awk -F ';;' '{print $1}'
works as I want
I expected the same behaviour from LibreOffice
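For comparison, a small Python sketch of the same first-column extraction with ";;" treated as a single string separator. The helper `first_column` and the second sample row are made up for illustration.

```python
import io

def first_column(stream, delimiter=";;"):
    """Return the first field of every line, using a multi-character delimiter."""
    return [line.rstrip("\r\n").split(delimiter)[0] for line in stream]

# Stand-in for a file; the second row is invented sample data.
sample = io.StringIO("3305460;;import/gcc;;i686\n3305461;;import/glibc;;x86_64\n")
first = first_column(sample)
print(first)  # ['3305460', '3305461']
```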
Comment 8 QA Administrators 2020-05-12 03:52:58 UTC Comment hidden (obsolete)
Comment 9 Buovjaga 2020-08-28 18:59:18 UTC
Indeed, discussed before: https://bugs.documentfoundation.org/show_bug.cgi?id=127718#c4

Stuart mentions that any Unicode glyphs can be used as separators. It does work with emojis, but you still need to merge the delimiters.
Comment 10 Heiko Tietze 2020-09-10 13:47:38 UTC
You can escape characters that shouldn't be used as delimiters by quoting the fields, like "Foo;";"Bar";";";"Baz". There are plenty of options already, and I would rather stick to the single-character separator to keep things simple and familiar. Consider users complaining that ; doesn't work after unintentionally adding a space after it, i.e. "; ". => WONTFIX
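To illustrate the quoting mentioned here, a short sketch that parses the sample row with Python's standard `csv` module, used only as a stand-in for a quote-aware CSV parser and not as a description of Calc's importer.

```python
import csv
import io

# The quoted row from the comment: semicolons inside quotes are field content.
data = '"Foo;";"Bar";";";"Baz"\n'
rows = list(csv.reader(io.StringIO(data), delimiter=";", quotechar='"'))
print(rows[0])  # ['Foo;', 'Bar', ';', 'Baz']
```

Quoted semicolons survive as data while the unquoted ones still separate the four fields.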
Comment 11 Eike Rathke 2020-09-10 19:40:00 UTC
Heiko, that does not address the original request, which asks to be able to specify a sequence of characters, like ';;' (two semicolons), to be treated as *one* separator, because the original data uses such a field delimiter. It has nothing to do with quoting or escaping extraneous semicolons. The data is probably delimited this way to cater for the case where a single semicolon could be part of a field's content and the generator software is too lazy to quote and escape field content.

However, the Merge Delimiters option solves exactly this problem, except for the case of a semicolon embedded in field content. If that is really needed, then pre-process the data before importing so that it complies with the syntax of RFC 4180 (using any delimiter, not restricted to comma).
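One possible shape for such a pre-processing step, as a sketch: split each line on ";;" and rewrite it with a single ";" delimiter, letting Python's `csv` writer quote any field that contains a semicolon. The function name `convert` is hypothetical, and the sketch assumes ";;" never occurs inside field content.

```python
import csv
import io

def convert(src, dst, old_delim=";;", new_delim=";"):
    """Rewrite old_delim-separated lines as RFC 4180 style rows (hypothetical helper)."""
    writer = csv.writer(dst, delimiter=new_delim,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    for line in src:
        writer.writerow(line.rstrip("\r\n").split(old_delim))

src = io.StringIO("a;b;;c\n")   # first field contains an embedded semicolon
dst = io.StringIO()
convert(src, dst)
converted = dst.getvalue()
print(repr(converted))          # '"a;b";c\r\n'

# Reading the result back with a single-character delimiter keeps "a;b" intact.
round_trip = list(csv.reader(converted.splitlines(), delimiter=";"))
```

The converted output can then be imported with the ordinary semicolon separator and without merging delimiters.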

I'd rather not reimplement everything to allow a delimiter string instead of a delimiter character.


(In reply to Mikhail Novosyolov from comment #6)
> If I specify ';;' as A separator (a =
> one), why does LibreOffice ignore the second ';'?
In the Other input field one can specify a list of single character delimiters, not a string that is used as one delimiter.
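A sketch of the semantics described here, assuming each character typed into the Other field acts as an independent single-character delimiter. The helper `split_on_any` is hypothetical and only models that description.

```python
import re

def split_on_any(line, other_field):
    """Treat every character of other_field as its own delimiter (hypothetical)."""
    delimiters = sorted(set(other_field))      # typing ";;" still yields just {";"}
    pattern = "[" + "".join(re.escape(c) for c in delimiters) + "]"
    return re.split(pattern, line)

print(split_on_any("a;;b", ";;"))    # ['a', '', 'b']
print(split_on_any("a;b|c", ";|"))   # ['a', 'b', 'c']
```

Under this reading, typing ";;" is the same as typing ";" once, which is why the second semicolon appears to be ignored.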
Comment 12 Justin L 2020-12-15 11:41:45 UTC
(In reply to Eike Rathke from comment #11)
> pre-process the data before importing to make it comply with the syntax
> of RFC 4180 (using any delimiter, not restricted to comma).

Yes - exactly. LibreOffice does not need to cater to every conceivable textual data model. Anyone trying to manipulate text data should be able to search/replace with something like a | or whatever will work for their data set.
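Such a search/replace could look like the following sketch, which first checks that the replacement character is not already present in the data. The function name and the choice of "|" are illustrative only.

```python
def replace_delimiter(text, old=";;", new="|"):
    """Swap a multi-character delimiter for a single unused character."""
    if new in text:
        raise ValueError(f"{new!r} already occurs in the data; pick another character")
    return text.replace(old, new)

replaced = replace_delimiter("a;;b;c;;d")
print(replaced)  # a|b;c|d
```

The result can then be imported with "|" entered as the single separator character.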

> I'd rather not reimplement everything to allow a delimiter string instead of
> a delimiter character..

WONTFIX