Description:
When importing a TSV or CSV file without string delimiters, if the final column consists of only whitespace, corrupt data is added to that column. If the final column is empty, the row loads correctly. Also, if the string delimiter is set to anything (even if the delimiter character does not appear in the document), the file loads correctly.

One interesting behavior: initially the CSV import dialog doesn't show corruption in the preview, but if you change any options the corruption appears.

Tested reproducible on versions 5.4, 6.4, 7.0

Steps to Reproduce:
1. Have CSV/TSV file without string delimiters and with trailing column consisting of only whitespace
2. Turn off string delimiters in import dialog box
3. Click OK

Actual Results:
Right hand column contains corrupt data

Expected Results:
Right hand column blank

Reproducible: Always

User Profile Reset: Yes

Additional Info:
Version: 7.0.0.3 (x64)
Build ID: 8061b3e9204bef6b321a21033174034a5e2ea88e
CPU threads: 24; OS: Windows 10.0 Build 19041; UI render: Skia/Vulkan; VCL: win
Locale: en-GB (en_GB); UI: en-GB
Calc: CL
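For comparison outside Calc, the input described in the steps above can be approximated with a small script. This is only a sketch: the sample text is hypothetical (the real attachment may differ), and Python's csv module is used as an independent parser, not as a model of the LibreOffice import code. With quoting disabled, every row should keep its column count and the last column should be blank once whitespace is stripped.

```python
import csv
import io

# Hypothetical sample: three tab-separated columns, no quote characters,
# and a last column that holds only spaces (an assumption; the real
# attachment may differ).
SAMPLE = "id\tname\tnotes\n1\talpha\t   \n2\tbeta\t \n"


def read_unquoted_tsv(text):
    """Parse tab-separated text with quote handling switched off."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    return list(reader)


rows = read_unquoted_tsv(SAMPLE)

# The whitespace-only cells survive as whitespace; stripping them gives
# the blank column the report expects to see.
last_column = [row[-1].strip() for row in rows[1:]]
```

Here `rows` holds the parsed fields and `last_column` the stripped trailing values, which gives a baseline to compare against what the import dialog shows.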
Created attachment 164559: CSV file that triggers issue
Created attachment 164560: Screenshot of initial import dialog display
Created attachment 164561: Screenshot of import dialog after changing settings
Created attachment 164562: Screenshot after file opens in calc
Confirmed. The key is to erase the double-quote in the string-delimiter box.

Seems to have worked in LO 3.6. Bibisected with bibisect-linux-43all to get the range
https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=a1ac2538e9b287444500618ab4d2f0f06c25cf34..19f4ebd8a54da0ae03b9cc8481613e5cd20ee1e7

Nothing clearly obvious in this range, but various suspicious commits involving ICU and libexttextcat.

Bad _bibisect 43all commit_ a67b874d60de1f1a44bef57a53a7b8a84db0ba58.
I think it's worth adding this comment here rather than opening a new bug...

If you choose tab delimited, with the double quote ( " ) as the string quote character, then the following makes it choke:

f1\tf2\t"f3",xxx\tf4

What happens is that everything after f3, right to the very end of the file (no matter how many lines and fields that includes), gets dumped into one cell. Now one might argue that the above is badly formatted (should quotes end right at field end?), but this is not the right way to handle it.

Another thing, it wasn't obvious to me in the gui that the string delimited dropdown list was editable. I think a dropdown list here is pointless and distracting. Everyone uses either double quote or nothing. I would argue that as soon as you select tab delimited, this field should default to blank, because as far as I can tell, the whole internet is agreed that TSV files don't have a string quote character.
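As a point of comparison for the sample line in the comment above, here is how a different parser, Python's csv module, treats a closing quote that is followed by more text. It is a sketch of two alternative policies (lenient and strict), not a description of what Calc does or should do; the second input line is a hypothetical addition to check that parsing stops at the end of the first record.

```python
import csv
import io

# The line from the comment, plus a hypothetical second record to show
# that the damage stays inside one field.
TEXT = 'f1\tf2\t"f3",xxx\tf4\ng1\tg2\tg3\tg4\n'


def parse(text, strict=False):
    """Parse tab-delimited text with the double quote as quote character."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter="\t",
        quotechar='"',
        strict=strict,
    )
    return list(reader)


# Lenient policy: text after the closing quote is appended to the field.
lenient_rows = parse(TEXT)

# Strict policy: the same input is rejected with csv.Error.
try:
    parse(TEXT, strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = exc
```

In lenient mode the third field becomes `f3,xxx` and the following record is left intact; in strict mode the input is refused outright. Either way the malformed field does not swallow the rest of the file.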
(In reply to xpusostomos from comment #6)
> Another thing, it wasn't obvious to me in the gui that the string delimited
> dropdown list was editable. I think a dropdown list here is pointless and
> distracting. Everyone uses either double quote or nothing.

You certainly know everyone and every usage and can be sure no one, absolutely no one, uses anything else.

> I would argue
> that as soon as you select tab delimited, this field should default to
> blank, because as far as I can tell, the whole internet is agreed that TSV
> files don't have a string quote character.

Oh yes? Is it? Could you point out such agreement? So you'd argue that embedded tabs and embedded line feeds are not possible at all in a TSV file?
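To make the question about embedded tabs and line feeds concrete, the following sketch round-trips such a record through a tab-delimited file that does use a quote character. It relies on Python's csv module as one example of a TSV producer and consumer; the record contents are hypothetical.

```python
import csv
import io

# Hypothetical record: one field contains a tab, another a line feed.
RECORD = ["plain", "has\ttab", "has\nline feed"]


def write_tsv(rows):
    """Serialize rows as tab-delimited text, quoting only where needed."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", quotechar='"', lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def read_tsv(text):
    """Parse tab-delimited text that may contain quoted fields."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quotechar='"')
    return list(reader)


encoded = write_tsv([RECORD])
decoded = read_tsv(encoded)
```

The writer quotes only the two fields that need it, and reading the text back with the same quote character recovers the original record, which is not possible once the quote character is dropped.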
Reproduced with 7.1.4
Appears to be fixed since 7.1.5, most likely with bug 142395.

*** This bug has been marked as a duplicate of bug 142395 ***
(In reply to Eike Rathke from comment #7)

I enjoyed comment 6 very much, made me recall playing with MySQL's "SELECT INTO OUTFILE" [1], where it puts even null bytes (and any other bytes that may appear in BLOBS), with configurable FIELDS ENCLOSED BY, LINES TERMINATED BY, and even absolutely inconsistent FIELDS ESCAPED BY, that needed a home-grown parser [2], because they obviously didn't know what xpusostomos knew ;)

[1] https://dev.mysql.com/doc/refman/8.0/en/select-into.html
[2] https://mikekaganski.wordpress.com/2021/02/18/reading-from-mysql-data-with-blobs-dumped-to-csv/