Bug 153055 - [FILESAVE] Wrong parsening of CSV file
Summary: [FILESAVE] Wrong parsening of CSV file
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.4.4.2 release
Hardware: x86-64 (AMD64) Windows (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-17 05:32 UTC by eduardomozart182
Modified: 2023-01-17 06:56 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description eduardomozart182 2023-01-17 05:32:24 UTC
Description:
Hello LibreOffice team!
I'd imported a CSV file to LibreOffice with the following line:

"MSSQL 2017 - Qualitor - Usuário ""qualitor""","R Networks",Teampass,,qualitor,Qualitor!@#,"Usu?rio interno da inst?ncia \"MSSQLSERVER\" do MSSQL 2017 (server02), utilizado para conex?o da aplica??o Qualitor Web ao banco de dados do Qualitor. Para autentica??o ao banco de dados do Qualitor, ? necess?rio usar os usu?rios de dom?nio \"eduardo.mozart\", \"roberto.oliveira\" ou \"Administrator\"."


Steps to Reproduce:
1. I opened it with LibreOffice only with the "Separated by" 'Comma' selected. It loads and shows up into the right columns as expected. 

2. I select all cells and change the cell formating to "Text".

3. Go to File > Save as.

4. Selected "Text CSV" and enabled the checkbox to customize the CSV filter and I had setup the following settings:

Field delimiter: ;
Text delimiter: "
Quote all text cells: checked


Actual Results:
When exporting, the quotes next to "eduardo.mozart" was broken. It seems that the CSV filter didn't found the opening quote to filter it as expected and the following output was generated:

"MSSQL 2017 - Qualitor - Usu?rio ""qualitor""";"R Networks";"Teampass";;"qualitor";"Qualitor!@#";"Usu?rio interno da inst?ncia \""MSSQLSERVER\"" do MSSQL 2017 (server02), utilizado para conex?o da aplica??o Qualitor Web ao banco de dados do Qualitor. Para autentica??o ao banco de dados do Qualitor, ? necess?rio usar os usu?rios de dom?nio \""eduardo.mozart\";" \""roberto.oliveira\"" ou \""Administrator\"".""";

Please notice that there's a missing quote next to "eduardo.mozart" and it added a ";" at the wrong position, breaking the CSV file and make it impossible for our application to import it because the number of columns was wrong.

Expected Results:
I had to manually fix the file by appending one more quote and replacing the ";" character to ",":

"MSSQL 2017 - Qualitor - Usu?rio ""qualitor""";"R Networks";"Teampass";;"qualitor";"Qualitor!@#";"Usu?rio interno da inst?ncia \""MSSQLSERVER\"" do MSSQL 2017 (server02), utilizado para conex?o da aplica??o Qualitor Web ao banco de dados do Qualitor. Para autentica??o ao banco de dados do Qualitor, ? necess?rio usar os usu?rios de dom?nio \""eduardo.mozart\"", \""roberto.oliveira\"" ou \""Administrator\"".""";



Reproducible: Always


User Profile Reset: No

Additional Info:
I'm not sure if it's a bug, but my original file had 7 columns. When exporting it to CSV using LibreOffice, it added an extra ";" at the end of the line, effectively creating an extra column (the output from LibreOffice created a 8 columns CSV file). I had to remove this extra column (";" from the end of each line) manually after export because our application supports only CSV file lines with 7 columns.
Comment 1 Mike Kaganski 2023-01-17 06:56:20 UTC
This is not a bug.

(In reply to eduardomozart182 from comment #0)
> I'd imported a CSV file to LibreOffice with the following line:
> 
> "MSSQL 2017 - Qualitor - Usuário ""qualitor""","R
> Networks",Teampass,,qualitor,Qualitor!@#,"Usu?rio interno da inst?ncia
> \"MSSQLSERVER\" do MSSQL 2017 (server02), utilizado para conex?o da
> aplica??o Qualitor Web ao banco de dados do Qualitor. Para autentica??o ao
> banco de dados do Qualitor, ? necess?rio usar os usu?rios de dom?nio
> \"eduardo.mozart\", \"roberto.oliveira\" ou \"Administrator\"."
> 
> 
> Steps to Reproduce:
> 1. I opened it with LibreOffice only with the "Separated by" 'Comma'
> selected. It loads and shows up into the right columns as expected.

No, it doesn't load "as expected", because if you inspect it carefully, the following piece goes into column H (i.e., into the next cell):

 \"roberto.oliveira\" ou \"Administrator\"."

and the cell in column G ends with

\"eduardo.mozart\

And this is the correct behavior. If you look at the first field in the CSV, it looks like

"MSSQL 2017 - Qualitor - Usuário ""qualitor"""

and appears in Calc as

MSSQL 2017 - Qualitor - Usuário "qualitor"

The quotations work as expected, according to the CSV specification RFC 4180 [1]:

> 7.  If double-quotes are used to enclose fields, then a double-quote
>     appearing inside a field must be escaped by preceding it with
>     another double quote.  For example:
> 
>     "aaa","b""bb","ccc"

Note: to escape the double-quote character, you use *another double-quote character*!

Now look at the 7th field (as one *could* expect):

"Usu?rio interno da inst?ncia \"MSSQLSERVER\" do MSSQL 2017 (server02), utilizado para conex?o da aplica??o Qualitor Web ao banco de dados do Qualitor. Para autentica??o ao banco de dados do Qualitor, ? necess?rio usar os usu?rios de dom?nio \"eduardo.mozart\", \"roberto.oliveira\" ou \"Administrator\"."

It starts and ends with a double-quote; and all internal double-quotes *must* be escaped using another double-quote. But they are not. Instead, backslash is used for that - which is wrong. And when parsing such a field, Calc finds an *unescaped* double-quote, immediately followed by proper field delimiter (comma), which happens after eduardo.mozart\; Calc correctly ends the field here, and starts another field - which gets the rest.

And then, it outputs this to the file according to your specifications; the field delimiter changes to semicolon, and it's used to separate 7th and 8th fields.

[1] https://www.rfc-editor.org/rfc/rfc4180