Bug 95629 - CSV "Save as..." format is wrong when exporting from XLSB with a final \r in a cell value
Summary: CSV "Save as..." format is wrong when exporting from XLSB with a final \r in ...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:5.2.0 target:5.1.0.1 target:5...
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-06 10:51 UTC by libreofficecsvbug
Modified: 2016-10-25 19:11 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Perl script to produce a bug-exhibiting small example (406 bytes, text/plain)
2015-11-06 10:53 UTC, libreofficecsvbug
Details
Sample CSV with field with new lines. (26 bytes, text/plain)
2015-11-09 15:07 UTC, m.a.riosv
Details
XLS produced by Perl. Convert to XLSB using Microsoft Office, then to CSV using LibreOffice. (6.00 KB, application/x-ole-storage)
2015-11-12 15:19 UTC, libreofficecsvbug
Details
xlsb file (8.06 KB, application/vnd.ms-excel.sheet.binary.macroEnabled.12)
2015-11-14 17:59 UTC, raal
Details
XLSB file produced with Microsoft Office 2010 from the XLS produced with Perl. Rezipped after removing names from it. (56.69 KB, application/vnd.ms-excel.sheet.binary.macroenabled.12)
2015-11-16 09:41 UTC, libreofficecsvbug
Details

Note You need to log in before you can comment on or make changes to this bug.
Description libreofficecsvbug 2015-11-06 10:51:46 UTC
Open Office Calc gets the CSV format wrong when some cells have a particular value (a final \r character). It then cannot load back the file as it was from the exported CSV.

Steps to reproduce

1/ make an XLSB file where a cell has a final \r in its contents (ASCII character 0x0D, CR)
You may use the attached Perl script for that; fewer rows/cols should suffice.
It will produce a .xls file you can then open in Microsoft Office 2010 to
save it as an .xlsb file
Note: this bug does not show up if working from the XLS format.

2/ save the XSLB file as CSV from LibreOffice Calc (using the Debian package 1:4.3.3-2+deb8u1)
I used options ";" for separator and '"' for text

Looking the .csv resulting file (for example with VIM), one notices the absence of "..." double quotes around the value of the special cell (but Microsoft Office does when asked to save this kind of file and produces a valid CSV).

Indeed, Libre Office Calc can not load back the file as it was: it introduces
a row break at the position of this unprotected \r!
Comment 1 libreofficecsvbug 2015-11-06 10:53:41 UTC
Created attachment 120320 [details]
Perl script to produce a bug-exhibiting small example

Perl script mentioned in the bug description
Comment 2 m.a.riosv 2015-11-07 00:31:06 UTC
Hi @libreofficecsvbug, thanks for reporting.

Sorry, but why LibreOffice must change the cell content on saving?, and what If it was what you want to do?
Comment 3 libreofficecsvbug 2015-11-09 10:07:10 UTC
You misread my report and did not try to reproduce it.

I am complaining LibreOffice wrongly saves the cell content like this:

      ...;aaa;bbb;ccc;FINAL_NEWLINE\r;ddd;eee;...

instead of the correct (which is what MS Office does):

      ...;aaa;bbb;ccc;"FINAL_NEWLINE\r";ddd;eee;...

As a result the CSV does not match the spreadsheet and LibreOffice (and other tools) cannot read it back to fit the original data.
Comment 4 m.a.riosv 2015-11-09 15:07:14 UTC
Created attachment 120419 [details]
Sample CSV with field with new lines.

Attached a sample file created with LibreOffice.

The only that I have done is when saving, click on Edit-filter-settings and select "Quote all text cell".
Comment 5 libreofficecsvbug 2015-11-10 09:31:52 UTC
Did you reproduce the bug?
Did you work from XSLB format? You only give a CSV file.
Why is this not the default setting? What good does not quoting all text cells do?

Having a spreadsheet unable to reload the data it just saved looks like a bug to me.
Comment 6 Eike Rathke 2015-11-12 11:49:15 UTC
@libreofficecsvbug:
Could you please attach the resulting .xlsb file and not just the perl script?
Comment 7 Eike Rathke 2015-11-12 11:57:04 UTC
(In reply to m.a.riosv from comment #2)
> Sorry, but why LibreOffice must change the cell content on saving?, and what
> If it was what you want to do?

It is part of the CSV "spec" that if the field content embeds a newline the entire field content is to be quoted.
Comment 8 libreofficecsvbug 2015-11-12 15:17:45 UTC
XLSB: I do not trust Microsoft Office to not embed company/personal info in the file I would attach.

I attach the .xls produced by Perl (with a 10x10 grid and a newline at 5,5).

This bug happens with xlsb not xlsx so you need to convert xls to xlsb with Microsoft Office, then xlsb to csv with LibreOffice.

Getting back to the first reactions I had on this:
Quoting all text fields is not satisfactory to me:
1/ it is not the default setting
2/ it wastes space: no need to go 

      "aaa";"bbb";"FINAL NEWLINE\r";"ccc"

   when 

      aaa;bbb;"FINAL NEWLINE\r";ccc

  would suffice (and that is what Microsoft Office does)

3/ it still does not explain why LibreOffice is inconsistent (this works with xlsx not xlsb)
Comment 9 libreofficecsvbug 2015-11-12 15:19:22 UTC
Created attachment 120504 [details]
XLS produced by Perl. Convert to XLSB using Microsoft Office, then to CSV using LibreOffice.
Comment 10 Eike Rathke 2015-11-12 18:43:05 UTC
So you expect everyone who'd want to work on this to have a running copy of MS-Office around. Great :-/
Comment 11 raal 2015-11-14 17:59:55 UTC
Created attachment 120543 [details]
xlsb file
Comment 12 libreofficecsvbug 2015-11-16 09:39:55 UTC
A friend sent me an XLSB from my XLS published here, but it does not reproduce the bug.
Did that friend have another copy of MS Office? I don't know.

I just checked the XLSB I had reproduced the bug with. I removed my name from it then rezipped it, and could reproduce the bug. Please find it attached.
Comment 13 libreofficecsvbug 2015-11-16 09:41:31 UTC
Created attachment 120569 [details]
XLSB file produced with Microsoft Office 2010 from the XLS produced with Perl. Rezipped after removing names from it.
Comment 14 libreofficecsvbug 2015-11-16 09:44:11 UTC
I just checked raal's file and can confirm it exhibits the bug too.
Comment 15 libreofficecsvbug 2015-11-20 10:08:22 UTC
Can you confirm this bug now or do you need more input?
Comment 16 Eike Rathke 2015-11-24 20:47:23 UTC
The .xlsb import doesn't create an EditCell as would be necessary for a multi-line cell. The problem would not occur if 0x0A line feed instead of 0x0D carriage return was used.
Comment 17 Commit Notification 2015-11-24 21:12:28 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=22ebafe8897239696f46df6f093054d16285004a

Resolves: tdf#95629 import also 0x0D as rich EditCell, not only 0x0A

It will be available in 5.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Commit Notification 2015-11-24 23:58:47 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=7289a140fc68dc898ba2b2357cc960968195f236

Revert "Resolves: tdf#95629 import also 0x0D as rich EditCell, not only 0x0A"

It will be available in 5.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Eike Rathke 2015-11-25 00:04:44 UTC
That apparently was a wrong fix.

I wonder if the same happens with the .xlsx format, that binary .xlsb annoys me anyway.
Comment 20 libreofficecsvbug 2015-11-26 12:41:29 UTC
Apparently Microsoft Office 2010 replaces \r in cells with the "_x000D_\n" string, even when the \r is not final in the cell, when asked to export the document to XLSX. If needed I can perform various tests with multi-line cells.

This would make it a bug(?) of Office 2010, which does not impact the fact Office 2010 makes a better job than LibreOffice when dealing with XSLB-with-\r
Comment 21 Commit Notification 2015-12-01 20:29:35 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=129935443cfd9378e1263489fc4bf47aee1f1a46

Resolves: tdf#95629 quote CSV also if cell contains embedded '\r' CR

It will be available in 5.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Commit Notification 2015-12-01 20:39:09 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=63ada88a0fe28440c021141da911f69f7394250d&h=libreoffice-5-1

Resolves: tdf#95629 quote CSV also if cell contains embedded '\r' CR

It will be available in 5.1.0.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 23 Eike Rathke 2015-12-01 20:45:21 UTC
Pending review https://gerrit.libreoffice.org/20340 for 5-0
Comment 24 Commit Notification 2015-12-03 12:24:10 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-5-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f7d37735bbe1ba2aa11fd5d0b4abee0fe088471a&h=libreoffice-5-0

Resolves: tdf#95629 quote CSV also if cell contains embedded '\r' CR

It will be available in 5.0.5.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.