Bug 127718 - Calc cannot split columns with multi-character delimiters
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Reported: 2019-09-23 12:02 UTC by david.cortes.rivera
Modified: 2024-03-25 18:22 UTC (History)

Attachments
split_multi_char_delim (53.68 KB, image/png)
2019-09-23 17:39 UTC, Oliver Brinzing

Description david.cortes.rivera 2019-09-23 12:02:36 UTC
Steps to reproduce:
- Create a delimited file that uses a multi-character delimiter, e.g.:
col1||col2||col3
a||b||c
- Open the file in LibreOffice Calc.
- In the text import dialog, select "Separated by", check "Other", and enter "||" as the separator.
- Click OK and inspect the imported data in Calc.

Expected behavior: Calc should split columns at every occurrence of two bars (||).

Actual behavior: Calc splits columns at every *single* bar, which creates empty columns. (Checking "Merge delimiters" works around this, even though there is only one delimiter.)
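
For illustration only, the difference between the expected and the actual result can be sketched in Python. This is not Calc's code; it only emulates the two behaviors described above on the sample row, using plain string splitting as a stand-in for the import dialog.

```python
# Illustrative sketch: Python's str.split stands in for the Text Import dialog.
line = "a||b||c"

# Expected: "||" is treated as one string delimiter.
expected = line.split("||")

# Observed (emulated): every single "|" acts as a delimiter,
# so each "||" leaves an empty field behind.
observed = line.split("|")

print(expected)  # ['a', 'b', 'c']
print(observed)  # ['a', '', 'b', '', 'c']
```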
Comment 1 david.cortes.rivera 2019-09-23 15:39:01 UTC
By the way, the bigger issues are that:
- It's not possible to split correctly on a delimiter made up of different characters, e.g.:
a|,|b|,|c
text1|,|text2,text_after_comma|,|text3|text_after_bar
- Even if the delimiter is the same character repeated multiple times, Calc still splits on each single occurrence of that character, e.g.:
a||b||c
text1||text2|text_after_bar||text3
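
A rough sketch of why "Merge delimiters" does not help with these rows, again in Python and again only an emulation of the behavior described in this report. The helper `split_per_char` is hypothetical: it assumes every character typed into "Other" is treated as an independent separator, and that merging collapses runs of such characters.

```python
import re

def split_per_char(line, delimiters, merge=False):
    """Hypothetical emulation: each character in `delimiters` is its own separator."""
    pattern = "[" + re.escape(delimiters) + "]"
    if merge:
        pattern += "+"  # collapse runs of separator characters
    return re.split(pattern, line)

row1 = "text1|,|text2,text_after_comma|,|text3|text_after_bar"
row2 = "text1||text2|text_after_bar||text3"

# What a string delimiter would give: three fields per row.
print(row1.split("|,|"))
print(row2.split("||"))

# Per-character splitting with merging still breaks on a lone "," or "|".
print(split_per_char(row1, "|,", merge=True))
print(split_per_char(row2, "|", merge=True))
```

Under these assumptions the first row yields five fields and the second row four, instead of the intended three.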
Comment 2 Oliver Brinzing 2019-09-23 17:36:49 UTC
Reproducible with:

Version: 6.4.0.0.alpha0+ (x64)
Build ID: 71ef762f21ada8c25aad2183065478171e985e8c
CPU threads: 4; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: de-DE (de_DE); UI-Language: en-US
Calc: threaded
Comment 3 Oliver Brinzing 2019-09-23 17:39:48 UTC
Created attachment 154392 [details]
split_multi_char_delim

It's only possible to hide the unwanted columns.
Comment 4 V Stuart Foote 2019-09-23 19:04:54 UTC
No, when a multi-character separator is entered under 'Other', the resulting empty columns can be collapsed with the 'Merge delimiters' checkbox.

The caveat is that if your strings contain the separator character on its own, each occurrence is parsed as an additional column. But any Unicode code point (UTF-8) can be used as the separator, not just ASCII, so choose one that does not occur in your data.

And unfortunately there is no awk/nawk-style "string" separator: 'Merge delimiters' does not act on the string as a whole, only on each character of the string.

But that would be an RFE.
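
Until such an RFE exists, one possible workaround is to pre-process the file outside Calc and replace the multi-character separator with a single code point that does not occur in the data. The sketch below is only an illustration: the function name `to_single_char_delimiter` and the choice of U+241F as replacement are assumptions, not anything LibreOffice provides.

```python
def to_single_char_delimiter(text, delimiter, replacement="\u241f"):
    """Replace a multi-character delimiter with one unused code point.

    `replacement` defaults to U+241F, an arbitrary choice for this sketch;
    any character that is absent from the data would do.
    """
    if replacement in text:
        raise ValueError("replacement character already occurs in the data")
    return text.replace(delimiter, replacement)

sample = "col1||col2||col3\ntext1||text2|text_after_bar||text3\n"
converted = to_single_char_delimiter(sample, "||")

# Each row now splits cleanly on the single replacement character,
# and the lone "|" inside the second field is preserved.
rows = [line.split("\u241f") for line in converted.splitlines()]
print(rows)
```

The converted text would then be saved as UTF-8 and imported with the replacement character entered under 'Other' and the character set set to Unicode (UTF-8).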
Comment 5 V Stuart Foote 2019-09-23 19:11:08 UTC
Oh, and if using a Unicode glyph for the delimiter, remember to adjust the 'Character set:' encoding to Unicode (UTF-8) in the Text Import dialog.