CSV files that do not strictly follow the CSV specification, which requires embedded quotes inside a quoted field to be doubled, easily trick the import into distributing the following content differently from what the generator intended. Implement some heuristics to detect and correct at least some of those cases to prevent data loss.
Another test case mentioned there, originally from
,"abc" d "ef",
currently results in
'abc d "ef"'
To avoid losing data, it should result in
'abc" d "ef'
Doing so would also lead to
resulting in _one_ field
and not two, 'ab' and ' "a"', as is currently the case. This would then differ from how Excel treats it, but would be more consistent.
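For illustration only, here is a minimal Python sketch of the kind of heuristic described above. It is not the code from the actual patch; the function name split_lenient and the exact rule are assumptions made for this example. The assumed rule: inside a quoted field, a doubled quote is an escaped quote, and a single quote closes the field only if it is directly followed by the separator or the end of the line. Any other single quote is kept as data.

```python
def split_lenient(line, sep=",", quote='"'):
    """Split one CSV line, tolerating undoubled quotes inside quoted fields.

    Hypothetical helper for illustration; not LibreOffice's implementation.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        buf = []
        if i < n and line[i] == quote:
            i += 1  # skip the opening quote
            while i < n:
                ch = line[i]
                if ch == quote:
                    nxt = line[i + 1] if i + 1 < n else None
                    if nxt == quote:
                        # Properly doubled quote: emit one quote character.
                        buf.append(quote)
                        i += 2
                        continue
                    if nxt is None or nxt == sep:
                        # Quote followed by separator or end of line closes the field.
                        i += 1
                        break
                    # Otherwise it is a stray quote: fall through and keep it as data.
                buf.append(ch)
                i += 1
        else:
            # Unquoted field: read up to the next separator.
            while i < n and line[i] != sep:
                buf.append(line[i])
                i += 1
        fields.append("".join(buf))
        if i >= n:
            break
        i += 1  # skip the separator
    return fields


print(split_lenient(',"abc" d "ef",'))
```

With this rule the middle field of the test case above keeps both embedded quotes, and correctly doubled quotes are still unescaped as usual. The sketch handles a single line only and ignores quoted fields that span line breaks.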
Created attachment 59980
conglomerate of testcases attached to the mentioned OOo issues
Created attachment 59981
the testcase file exported again, fixing representation
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":
resolved fdo#48621 better handling of broken CSV files
I was the author of some of the attachments to OOo bug shown here: https://issues.apache.org/ooo/show_bug.cgi?id=78926
I was able to confirm that the 3 test cases I presented were fixed, but the original bug report included a sample input that I still could not open. I'm not sure if the error message shown below is expected for that input.
Thanks for addressing the cases I found, though!
Confirmed fixed in LibreOffice 18.104.22.168 on Mac OS X:
- att #75189 @ OOo BZ
- att #75191 @ OOo BZ
- att #75192 @ OOo BZ
I still get an error loading the original "input" (att #46282 @ OOo BZ): "The data could not be loaded completely because the maximum number of characters per cell was exceeded".
(In reply to comment #4)
> I still get an error loading the original "input" (att #46282 @ OOo BZ):
> "The data could not be loaded completely because the maximum number of
> characters per cell was exceeded".
I don't get that error, tested in 3.6.4 and 4.0.0.rc2+ and master. All versions load 2371 rows without complaining, which matches the number of lines in the input file.