Open Office Calc gets the CSV format wrong when some cells have a particular value (a final \r character). It then cannot load back the file as it was from the exported CSV. Steps to reproduce 1/ make an XLSB file where a cell has a final \r in its contents (ASCII character 0x0D, CR) You may use the attached Perl script for that; fewer rows/cols should suffice. It will produce a .xls file you can then open in Microsoft Office 2010 to save it as an .xlsb file Note: this bug does not show up if working from the XLS format. 2/ save the XSLB file as CSV from LibreOffice Calc (using the Debian package 1:4.3.3-2+deb8u1) I used options ";" for separator and '"' for text Looking the .csv resulting file (for example with VIM), one notices the absence of "..." double quotes around the value of the special cell (but Microsoft Office does when asked to save this kind of file and produces a valid CSV). Indeed, Libre Office Calc can not load back the file as it was: it introduces a row break at the position of this unprotected \r!
Created attachment 120320 [details] Perl script to produce a bug-exhibiting small example Perl script mentioned in the bug description
Hi @libreofficecsvbug, thanks for reporting. Sorry, but why LibreOffice must change the cell content on saving?, and what If it was what you want to do?
You misread my report and did not try to reproduce it. I am complaining LibreOffice wrongly saves the cell content like this: ...;aaa;bbb;ccc;FINAL_NEWLINE\r;ddd;eee;... instead of the correct (which is what MS Office does): ...;aaa;bbb;ccc;"FINAL_NEWLINE\r";ddd;eee;... As a result the CSV does not match the spreadsheet and LibreOffice (and other tools) cannot read it back to fit the original data.
Created attachment 120419 [details] Sample CSV with field with new lines. Attached a sample file created with LibreOffice. The only that I have done is when saving, click on Edit-filter-settings and select "Quote all text cell".
Did you reproduce the bug? Did you work from XSLB format? You only give a CSV file. Why is this not the default setting? What good does not quoting all text cells do? Having a spreadsheet unable to reload the data it just saved looks like a bug to me.
@libreofficecsvbug: Could you please attach the resulting .xlsb file and not just the perl script?
(In reply to m.a.riosv from comment #2) > Sorry, but why LibreOffice must change the cell content on saving?, and what > If it was what you want to do? It is part of the CSV "spec" that if the field content embeds a newline the entire field content is to be quoted.
XLSB: I do not trust Microsoft Office to not embed company/personal info in the file I would attach. I attach the .xls produced by Perl (with a 10x10 grid and a newline at 5,5). This bug happens with xlsb not xlsx so you need to convert xls to xlsb with Microsoft Office, then xlsb to csv with LibreOffice. Getting back to the first reactions I had on this: Quoting all text fields is not satisfactory to me: 1/ it is not the default setting 2/ it wastes space: no need to go "aaa";"bbb";"FINAL NEWLINE\r";"ccc" when aaa;bbb;"FINAL NEWLINE\r";ccc would suffice (and that is what Microsoft Office does) 3/ it still does not explain why LibreOffice is inconsistent (this works with xlsx not xlsb)
Created attachment 120504 [details] XLS produced by Perl. Convert to XLSB using Microsoft Office, then to CSV using LibreOffice.
So you expect everyone who'd want to work on this to have a running copy of MS-Office around. Great :-/
Created attachment 120543 [details] xlsb file
A friend sent me an XLSB from my XLS published here, but it does not reproduce the bug. Did that friend have another copy of MS Office? I don't know. I just checked the XLSB I had reproduced the bug with. I removed my name from it then rezipped it, and could reproduce the bug. Please find it attached.
Created attachment 120569 [details] XLSB file produced with Microsoft Office 2010 from the XLS produced with Perl. Rezipped after removing names from it.
I just checked raal's file and can confirm it exhibits the bug too.
Can you confirm this bug now or do you need more input?
The .xlsb import doesn't create an EditCell as would be necessary for a multi-line cell. The problem would not occur if 0x0A line feed instead of 0x0D carriage return was used.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=22ebafe8897239696f46df6f093054d16285004a Resolves: tdf#95629 import also 0x0D as rich EditCell, not only 0x0A It will be available in 5.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=7289a140fc68dc898ba2b2357cc960968195f236 Revert "Resolves: tdf#95629 import also 0x0D as rich EditCell, not only 0x0A" It will be available in 5.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
That apparently was a wrong fix. I wonder if the same happens with the .xlsx format, that binary .xlsb annoys me anyway.
Apparently Microsoft Office 2010 replaces \r in cells with the "_x000D_\n" string, even when the \r is not final in the cell, when asked to export the document to XLSX. If needed I can perform various tests with multi-line cells. This would make it a bug(?) of Office 2010, which does not impact the fact Office 2010 makes a better job than LibreOffice when dealing with XSLB-with-\r
Eike Rathke committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=129935443cfd9378e1263489fc4bf47aee1f1a46 Resolves: tdf#95629 quote CSV also if cell contains embedded '\r' CR It will be available in 5.2.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-5-1": http://cgit.freedesktop.org/libreoffice/core/commit/?id=63ada88a0fe28440c021141da911f69f7394250d&h=libreoffice-5-1 Resolves: tdf#95629 quote CSV also if cell contains embedded '\r' CR It will be available in 5.1.0.1. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Pending review https://gerrit.libreoffice.org/20340 for 5-0
Eike Rathke committed a patch related to this issue. It has been pushed to "libreoffice-5-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=f7d37735bbe1ba2aa11fd5d0b4abee0fe088471a&h=libreoffice-5-0 Resolves: tdf#95629 quote CSV also if cell contains embedded '\r' CR It will be available in 5.0.5. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.