Created attachment 204066
empty Calc spreadsheet with Data Range

PRECONDITION:
New Calc spreadsheet with a range defined via Data > Define Range (see attachment DataRangeForDataProvider.ods).

PROBLEM DESCRIPTION:
Data > Data Provider: select the range, choose CSV, and set URL to the attached AB-CrLf.csv ("A","B"\r\n).
--> "An error occurred while parsing the CSV file."

The problem occurs only if
- the file ends with \r\n (works fine: "A","B"\n), and
- the last CSV entry is a quoted string such as "B" (an unquoted B works fine: "A",B\r\n).

EXPECTED BEHAVIOR:
Files with CR+LF line endings (DOS/Windows format) should be usable on Linux as well.

Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 480ef73deef62c458e5735cd496a1d74ef408ed8
CPU threads: 2; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: de-DE (de_DE.UTF-8); UI: en-US
Calc: threaded
Created attachment 204067
CSV file with string and CR+LF
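Editors and version-control tools can silently convert line endings. Recreating the test files byte for byte therefore helps when reproducing the bug. The following is a small Python 3 sketch. Only AB-CrLf.csv corresponds to the attached file. The other two file names are hypothetical and only cover the variants the description says do not fail.

```python
import tempfile
from pathlib import Path

# Byte-exact test inputs. "AB-CrLf.csv" matches the attached file; the other
# two names are hypothetical and only cover the variants reported to work.
VARIANTS = {
    "AB-CrLf.csv": b'"A","B"\r\n',         # quoted last field + CR+LF: error
    "AB-Lf.csv": b'"A","B"\n',             # quoted last field + LF: no error
    "AB-Unquoted-CrLf.csv": b'"A",B\r\n',  # unquoted last field + CR+LF: no error
}


def write_variants(directory=None):
    """Write each variant as raw bytes and return {file name: path}.

    Writing bytes (not text) prevents any newline translation, so the
    CR+LF ending is stored exactly. Defaults to a fresh temporary directory.
    """
    target = Path(directory) if directory else Path(tempfile.mkdtemp())
    paths = {}
    for name, data in VARIANTS.items():
        path = target / name
        path.write_bytes(data)
        paths[name] = path
    return paths


if __name__ == "__main__":
    # Write the files into the current directory and list them.
    for path in write_variants(".").values():
        print(path)
```

The files can then be used as the URL in Data > Data Provider.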
The same error occurs on Windows. Tested with

Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 620(Build:0)
CPU threads: 32; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Vulkan; VCL: win
Locale: de-DE (de_DE); UI: en-US
Calc: threaded

Import works if the EOL is LF only.
The problem is in orcus: ...\UnpackedTarball\liborcus\include\orcus\csv_parser.hpp, lines 126-136. After the closing quote, the parser gets a '\r', which is neither a '\n' nor a delimiter. The delimiters are contained in maConfig.delimiters; in this case it is a comma. is_delim('\r') therefore returns false, and the parser throws an orcus::parse_error.
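The following Python sketch makes the failure and one possible fix concrete. It is not the orcus implementation, which is C++ and is not reproduced here. `parse_record`, `ParseError`, and the `crlf_aware` flag are hypothetical names.

- With `crlf_aware=False`, the sketch mimics the reported behavior, including the \r that ends up inside unquoted fields.
- With `crlf_aware=True`, it accepts a \r after a closing quote only when a \n follows immediately.

```python
class ParseError(Exception):
    """Hypothetical stand-in for orcus::parse_error."""


def parse_record(line, delimiters=",", crlf_aware=False):
    """Parse one CSV record that still ends with its line terminator.

    Simplified sketch: escaped quotes ("") inside quoted fields are not handled.
    """
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i] == '"':
            # Quoted field: take everything up to the closing quote.
            end = line.index('"', i + 1)
            fields.append(line[i + 1:end])
            i = end + 1
            if i == n or line[i] == "\n":
                break  # end of record
            c = line[i]
            if crlf_aware and c == "\r" and line[i + 1:i + 2] == "\n":
                break  # CR+LF accepted as end of record
            if c not in delimiters:
                # The reported failure: '\r' is neither '\n' nor a delimiter.
                raise ParseError(f"unexpected {c!r} after quoted field")
            i += 1  # skip the delimiter
        else:
            # Unquoted field: read up to the next delimiter or '\n',
            # so a '\r' silently becomes part of the field.
            j = i
            while j < n and line[j] not in delimiters and line[j] != "\n":
                j += 1
            fields.append(line[i:j])
            if j >= n or line[j] == "\n":
                break
            i = j + 1
    return fields
```

Accepting only the exact pair \r\n keeps malformed input such as "A","B"x\n rejected in both modes.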
(In reply to Regina Henschel from comment #3)
> ... The parser gets a '\r'. And that is neither a '\n' nor a delimiter. ...

After the string ends (with the closing "), either a delimiter or a newline is expected ("A","B"x\n also leads to a parsing error, which is OK).

Apparently the \r is handled as an ordinary (printable) character:
- "A",B\r\n does not fail, because the \r becomes part of the 2nd field: <text:p>B
</text:p> in content.xml
- "A","B",\r\n yields a single \r as a 3rd field: <text:p>
</text:p> in content.xml

Should every \r be discarded? Or should a \r at any position (e.g. A\r, B\r) become part of the field? In the latter case, the DOS file A,B\r\n would yield the fields A and B\r. The field B\r is not equal to B, yet the user cannot see the difference in the GUI.
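The short Python sketch below compares the two options raised above with a third common convention. In that convention, only a \r directly before the final \n belongs to the line ending.

- `split_record` and the policy names are hypothetical.
- The sketch handles only unquoted fields and a comma delimiter.

As one real-world data point, Python's standard `csv` module treats \r\n after the last field, quoted or not, as the line ending.

```python
import csv

POLICIES = ("keep", "crlf", "discard")


def split_record(record, policy):
    r"""Split one unquoted, comma-separated record (terminator included).

    Hypothetical policies, for illustration only:
      "keep"    - '\r' is ordinary data (current behavior for unquoted fields)
      "crlf"    - a '\r' directly before the final '\n' is part of the line ending
      "discard" - every '\r' is removed before splitting
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "discard":
        record = record.replace("\r", "")
    elif policy == "crlf" and record.endswith("\r\n"):
        record = record[:-2] + "\n"
    if record.endswith("\n"):
        record = record[:-1]  # drop the line terminator
    return record.split(",")


if __name__ == "__main__":
    for policy in POLICIES:
        print(policy, split_record("A\r,B\r\n", policy))
    # For comparison: the standard csv module on the reported input.
    print(list(csv.reader(['"A","B"\r\n'])))
```

With "crlf", the DOS file A,B\r\n yields A and B, while a \r elsewhere (A\r) stays visible as data. With "discard", no \r can ever reach a cell.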