Bug 169514 - Data Provider fails with CSV string and CR+LF
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 26.2.0.0 alpha0+ master
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Data-Provider 169515 169516 169517 169518 169547
Reported: 2025-11-18 10:46 UTC by Michael Otto
Modified: 2025-11-19 18:31 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
empty Calc spreadsheet with Data Range (7.08 KB, application/vnd.oasis.opendocument.spreadsheet)
2025-11-18 10:46 UTC, Michael Otto
CSV file with string and CR+LF (9 bytes, text/csv)
2025-11-18 11:10 UTC, Michael Otto

Description Michael Otto 2025-11-18 10:46:15 UTC
Created attachment 204066
empty Calc spreadsheet with Data Range

PRECONDITION:
new Calc spreadsheet, Data > Define Range 
(see attachment DataRangeForDataProvider.ods)


PROBLEM DESCRIPTION:
Data > Data Provider
select the range, choose CSV, and set the URL to the attached AB-CrLf.csv (content: "A","B"\r\n)

--> "An error occurred while parsing the CSV file."

The problem occurs only if both conditions hold:
- the file ends with \r\n (a file ending with \n only works fine: "A","B"\n)
- the last CSV entry is a quoted string such as "B" (an unquoted B works fine: "A",B\r\n)
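
To reproduce the three cases above, the test files can be generated with a short script. This is only a sketch: the byte contents are taken from this report, the outcomes in the comments are the reporter's observations (the script does not verify them), and every file name except AB-CrLf.csv is a hypothetical choice.

```python
from pathlib import Path
import tempfile

# Byte contents as quoted in the report. The outcome noted next to each entry
# is what the reporter observed in the Data Provider; it is not checked here.
VARIANTS = {
    "AB-CrLf.csv": b'"A","B"\r\n',          # reported: parse error
    "AB-Lf.csv": b'"A","B"\n',              # reported: works (hypothetical file name)
    "A-unquotedB-CrLf.csv": b'"A",B\r\n',   # reported: works (hypothetical file name)
}


def write_variants(directory):
    """Write each variant in binary mode so the line endings stay untouched."""
    paths = []
    for name, payload in VARIANTS.items():
        path = Path(directory) / name
        path.write_bytes(payload)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_variants(tempfile.mkdtemp()):
        print(path, path.read_bytes())
```

Writing in binary mode matters here: a text-mode write could translate the line endings depending on the platform, and the file would no longer match the attachment.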


EXPECTED BEHAVIOR:
Files with CR+LF line endings (DOS/Windows format) shall be usable on Linux as well


Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 480ef73deef62c458e5735cd496a1d74ef408ed8
CPU threads: 2; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: de-DE (de_DE.UTF-8); UI: en-US
Calc: threaded
Comment 1 Michael Otto 2025-11-18 11:10:06 UTC
Created attachment 204067
CSV file with string and CR+LF
Comment 2 Regina Henschel 2025-11-18 21:07:17 UTC
The same error exists on Windows. Tested with Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 620(Build:0)
CPU threads: 32; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Vulkan; VCL: win
Locale: de-DE (de_DE); UI: en-US
Calc: threaded

Import works if the EOL is LF only.
Comment 3 Regina Henschel 2025-11-18 21:59:27 UTC
The problem is in orcus: 
...\UnpackedTarball\liborcus\include\orcus\csv_parser.hpp, lines 126-136.

The parser gets a '\r', which is neither a '\n' nor a delimiter. The delimiters are held in maConfig.delimiters; in this case the only one is a comma. Method is_delim('\r') therefore returns false, and the parser throws an orcus::parse_error.
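
The behavior described here can be mimicked with a small toy model. This is a sketch in Python, not the orcus source: the names ParseError and parse_model are hypothetical, escaped quotes are ignored, and the only assumption carried over from the report is that '\n' alone ends a row while '\r' gets no special treatment.

```python
class ParseError(Exception):
    """Hypothetical stand-in for orcus::parse_error in this model."""


def parse_model(data, delimiters=","):
    """Toy model of the reported behavior, NOT the orcus implementation.

    Only LF ends a row; CR is treated like any other character.
    """
    rows, row, i, n = [], [], 0, len(data)
    while i < n:
        if data[i] == '"':
            # Quoted cell: read up to the closing quote (no escape handling).
            end = data.index('"', i + 1)
            row.append(data[i + 1:end])
            i = end + 1
            # After the closing quote only a delimiter or LF is accepted.
            if i < n and data[i] != "\n" and data[i] not in delimiters:
                raise ParseError(f"unexpected {data[i]!r} after closing quote")
        else:
            # Unquoted cell: everything up to the next delimiter or LF,
            # so a CR simply becomes part of the cell.
            start = i
            while i < n and data[i] != "\n" and data[i] not in delimiters:
                i += 1
            row.append(data[start:i])
        if i < n and data[i] == "\n":
            rows.append(row)
            row = []
        i += 1  # step over the delimiter or the LF
    if row:
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(parse_model('"A","B"\n'))
    print(parse_model('"A",B\r\n'))
    try:
        parse_model('"A","B"\r\n')
    except ParseError as exc:
        print("parse error:", exc)
```

Under these assumptions the model fails only when a '\r' directly follows a closing quote, which matches the two conditions given in the description.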
Comment 4 Michael Otto 2025-11-19 10:12:04 UTC
(In reply to Regina Henschel from comment #3)
> ... The parser gets a '\r'. And that is neither a '\n' nor a delimiter. ...

After the string ends (with the closing "), either a delimiter or a newline is expected ("A","B"x\n also leads to a parsing error, which is OK).

Obviously the \r is handled as an ordinary (printable) character:
"A",B\r\n does not fail, because the \r becomes part of the 2nd field:
<text:p>B&#x0d;</text:p> in content.xml
"A","B",\r\n leads to a single \r as a 3rd field: <text:p>&#x0d;</text:p> in content.xml

Should every \r be discarded?

Or should a \r at any position (e.g. A\r, B\r) become part of the field?
(Then the DOS file A,B\r\n would yield the fields A and B\r; the resulting field B\r is not equal to B, and the user cannot see the difference in the GUI!)
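
For comparison only, here is how one other CSV reader answers this question. RFC 4180 describes CRLF as the record separator, and Python's standard csv module treats a \r\n outside a quoted field as a line terminator, so the \r does not end up in the last field. This is a sketch of that behavior, not a proposal for how orcus should be changed; the helper name parse is hypothetical.

```python
import csv
import io


def parse(text):
    """Parse CSV text with Python's csv module.

    newline="" keeps the line endings untranslated, so the csv reader
    itself decides how to treat CR and LF.
    """
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    # The three inputs discussed in this report.
    for sample in ('"A","B"\r\n', '"A",B\r\n', '"A","B"\n'):
        print(repr(sample), "->", parse(sample))
```

With this approach the DOS file and the Unix file yield the same fields, which avoids the invisible B\r versus B difference mentioned above.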