Created attachment 74403 [details] Test file for embedded tab and line break altogether Using the tab character as the separator (field delimiter), Calc will not recognize correctly a field that contains a line break after an embedded tab. Example: "value1.1"<tab>"value2.1<LB><tab>value2.2<LB>value2.3"<tab>"value3.1" The above will yield: value1.1 => correct value2.1<LB><tab>value2.2 => correct value2.3" => INCORRECT (see it also includes the closing quote) value3.1 => correct However, if I change the separator to comma, it all works out correctly. So, I guess this bug is pertinent to using "tab" as separator.
Created attachment 74404 [details] Test file with comma as separator
Created attachment 74458 [details] Screenshot result and options. Hi x.gylee, thanks for reporting. Attached screenshoot of what I get and the options import with 3.6.5. I think is right.
I don't think that is right. value2.3 should go ogether in cell B2 since the line break is embedded inside the text delimiter for value2.x. value3 should go into cell C2 (under head 3).
Hi x.gylee. I get the same results as you from those files. I made modifications, and deleted the line breaks inside the second set, changing the file to "value1.1"<tab>"value2.1<tab>value2.2<tab>value2.3"<tab>"value3.1" This gives the result you expect. So for some reason, it doesn't like line breaks inside the delimeter. I don't use calc a lot, so I would need to research if this is expected behavior. Unless someone else can confirm.
Hi Brenda, thanks for responding. The point of this bug is the embedded line breaks. It should NOT be treated as a separator if it is inside a field delimiter. If you try the 2nd attachemnt, you will see it's working fine if the separator is a comma. But, somehow it doesn't if the separator is a tab. There is RFC 4180 if it would help.
Thank you, x.gylee, for the document. According to RFC 4180, "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes." Calc should not treat line breaks as delimeters inside double quotes. I confirmed this behavior using LibreOffice 4.0 on Ubuntu 12.04.
I am going to change the status to NEW, and mark as NORMAL, HIGH. The reason for this priority is that there is no loss of data, or broken core functions, such as printing or saving the document. Calc also doesn't have a problem opening a document including embedded line breaks inside the tab. The data is there, though not with the expected appearance. This does prevent high quality work, and most users would expect the line break inside the delimeter to be ignored per the documentation in RFC 4180. I hope you agree. Thank you for reporting this!
Created attachment 79267 [details] LibreOffice Output CSV saved with TAB This shows, how libreoffice saves this case. It is confused by having data mixed. * Some data is between " * Some data is not between " If all data is enclosed within " it works (see attachment)
I guess the file sc/source/ui/docshell/impex.cxx has an error. Ehm, somebody has programmed this error on intention. Line 2330: if (nQuotes) { if (bTabSep && *p == '\t' && (nQuotes % 2) != 0) { // When tab-delimited, tab char ends quoted sequence // even if we haven't reached the end quote. Doing // this helps keep mal-formed rows from damaging // other, well-formed rows. nQuotes = 0; break; } Well, that heuristic will not work with a correct file. The author was Caolán McNamara <caolanm@redhat.com> of 17fe34ec569f3e14f35f3958cc5885a00bd6cff9. The correct way to have such a processor in place: parse file, if not correct(User decission!!!) enable workaround. This error needs more steps: * Change in the UI (maybe "strict mode"). * Change of the code there (if (bTabSep && *p == '\t' && (nQuotes % 2) != 0 && (!strictCSVImportMode)) That is my idea to solve this error.
Matthias, thanks for pointing out the culprit. Because of this error, my boss has been yelling at me to fix my CSV file since he thought it was not being properly constructed. If the developers could implement the fix, it would be a great help.
caolanm->Matthias: You misinterpreted In http://cgit.freedesktop.org/libreoffice/core/commit/?id=17fe34ec569f3e14f35f3958cc5885a00bd6cff9 a bit, that commit by me just *moves* that code from "tools" to "sc". The lines you highlight were added by Kohei as http://cgit.freedesktop.org/libreoffice/core/commit/?id=322cbc3818b0553254aab2dfb3c5b196fe814097 Whatever the final solution, this looks like an excellent low hanging place for a regression test.
Taking. I believe that workaround isn't necessary anymore with my rework of the CSV import last year and actually just breaks things.
Also without that workaround the import of the one failing document https://bugzilla.novell.com/attachment.cgi?id=294589 from https://bugzilla.novell.com/show_bug.cgi?id=507322 works fine, with one difference in row 1805 where data is broken and parts of two rows end up in one cell P1805. That row wasn't correct with the workaround either, see also cell P1805 in earlier versions.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=a883783c2a1b0e7a8ee5b46387d1e1e3bc5e19bd resolved fdo#60468 no special tab case workaround for CSV import The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Pending review for 4-0 as https://gerrit.libreoffice.org/3905 for 3-6 as https://gerrit.libreoffice.org/3906
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-3-6": http://cgit.freedesktop.org/libreoffice/core/commit/?id=b2b6753df528d62c7f0a2be902c822a029512523&h=libreoffice-3-6 resolved fdo#60468 no special tab case workaround for CSV import It will be available in LibreOffice 3.6.7. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-4-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=09ff02c499579c1dd9eed500121875e586ea8716&h=libreoffice-4-0 resolved fdo#60468 no special tab case workaround for CSV import It will be available in LibreOffice 4.0.4. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
As of 08.Jun.2013 I cannot reproduce this bug anymore. Both sample files yield the same results
Xisco Fauli committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/4ee58a0dedc5160551260793e5fed9e6c3842976 tdf#60468, tdf#142040: sc: Add UItest It will be available in 7.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.