Bug 142536 - CSV import changes data on import
Summary: CSV import changes data on import
Status: RESOLVED DUPLICATE of bug 114878
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
(earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Depends on:
Reported: 2021-05-28 08:08 UTC by Martin Häcker
Modified: 2021-05-28 08:28 UTC (History)
0 users

See Also:
Crash report or crash signature:
Regression By:

import dialog with preview (718.26 KB, image/png)
2021-05-28 08:08 UTC, Martin Häcker
imported (1.48 MB, image/png)
2021-05-28 08:09 UTC, Martin Häcker

Description Martin Häcker 2021-05-28 08:08:04 UTC
Importing a CSV file containing data that could be interpreted as formulas causes Calc to execute that data as formulas after import. FILEOPEN IMPORT VIEWING CSV

Steps to Reproduce:
1. Create a data-only CSV file:

-- snip --
-- snap --

2. Open it with the German version of Calc (screenshot 'import dialog with preview.png'). Observe that the preview renders all the formulas as _DATA_, as it should.
3. Click 'import'
4. Observe Screenshot 'imported.png'

Actual Results:
The two fields are not rendered as previewed; instead they are assumed to be formulas and are executed. Luckily there seems to be a security safeguard that at least blocks the HTTP call from immediate execution. However, even this block is removed by a single click on the notice at the top of the window.

Expected Results:
I have imported a CSV file (which is a data-only format), inspected the file beforehand in a text editor to see what I will be getting, checked the preview for correctness, and am still not getting the import that was previewed. This is highly surprising and also a huge enabler for a whole class of security problems.

If I want the data to be interpreted and changed by Libre Office Calc, that needs to be a separate (off by default) check box that warns about the problems and security risks this poses - especially if the preview is not complete and therefore does not allow me to assess what checking this box would exactly do.

Several problems I see here:

a) The preview should match the actual imported data
b) It is highly surprising that importing a data-only format will suddenly interpret that data and not display what is in the file. This is especially problematic if a web application exports data that contains user-controlled input in order to exchange it with other applications, and the file gets imported into Calc at some stage. The only workaround available is to know at export time where the file will later be imported, so the export can be sanitised for the importing application. This is highly impractical and has a high likelihood of data loss / unintended data changes if the file is imported into the wrong application.
c) This is also highly surprising when one investigates the RFC for CSV: <https://datatracker.ietf.org/doc/html/rfc4180> which states:

   Security considerations:

      CSV files contain passive text data that should not pose any
      risks.  However, it is possible in theory that malicious binary
      data may be included in order to exploit potential buffer overruns
      in the program processing CSV data.  Additionally, private data
      may be shared via this format (which of course applies to any text
      data).

This behaviour has many and quite surprising security implications - so much so that OWASP maintains it as its own category of security problem: <https://owasp.org/www-community/attacks/CSV_Injection>.
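To illustrate the export-side workaround from b), here is a minimal Python sketch of the kind of sanitising an exporting application would have to do. The prefix list and the single-quote escape follow the OWASP page linked above; the function names `neutralise_cell` and `export_rows` are made up for this example and are not part of any existing application.

```python
import csv
import io

# Leading characters that spreadsheet applications commonly treat as the
# start of a formula (list taken from the OWASP CSV Injection page).
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def neutralise_cell(value: str) -> str:
    """Prefix a risky cell with a single quote so it is read as text."""
    if value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


def export_rows(rows) -> str:
    """Write rows to CSV text, neutralising every cell first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        writer.writerow([neutralise_cell(str(cell)) for cell in row])
    return buffer.getvalue()
```

Note that this modifies the exported data: a legitimate value such as `-5` comes out as `'-5`, and any consumer that is not a spreadsheet now sees the extra quote. That is exactly the unintended data change described in b).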

I learned of this because the German Corona tracing app Luca was attacked through this vector - but users of web applications I maintain are also vulnerable to this problem.

I understand that this is probably a long-running convention for CSV import and has an aspect of compatibility with other spreadsheet applications. However, this is problematic behaviour for which there is no workaround when importing data into Calc, and there needs to be a strategy for fixing it, or at least for allowing a workaround.

I would like to suggest approaching this as a multi-step process - quite possibly stretched out over a long period, maybe even 5-10 years, though of course I would prefer a faster transition.

My suggestion is:

1. Add a setting on import that at least allows forcing Libre Office Calc to interpret all imported data literally so there is at least a workaround available immediately.
2. After some time, start warning in the import preview if the imported data contains anything that LibreOffice would like to interpret (at least formulas, but probably also data that could be auto-formatted). This should explain the problem and/or link to a website that explains the problem and the security concerns.
3. After some more time, switch on this option by default and instead warn if the imported data contains interpretable data. Maybe show a preview of what the interpretation would change to allow the user to understand what this would do.
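The check behind steps 2 and 3 could be sketched roughly as follows. This is only an illustration in Python, not Calc code: the prefix list is an assumption that approximates "data a spreadsheet might interpret as a formula", it does not cover auto-formatting, and `find_interpretable_cells` is a hypothetical name.

```python
import csv
import io

# Assumption: these leading characters approximate what a spreadsheet
# might interpret as a formula; Calc's real rules are not reproduced here.
SUSPICIOUS_PREFIXES = ("=", "+", "-", "@")


def find_interpretable_cells(csv_text: str):
    """Return (row, column, value) for each cell that looks like a formula."""
    findings = []
    reader = csv.reader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=1):
        for col_no, cell in enumerate(row, start=1):
            if cell.startswith(SUSPICIOUS_PREFIXES):
                findings.append((row_no, col_no, cell))
    return findings
```

An import dialog could show a warning whenever this list is non-empty and use the reported positions to highlight the affected cells in the preview.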

That way the impact on existing users of that feature can be minimised, while an immediate workaround is still available. The time bought by these measures can then be used to create the other suggested import features, so that the transition to not interpreting imported CSV data by default is safe for everyone.

Reproducible: Always

User Profile Reset: No

OpenGL enabled: Yes

Additional Info:
Version: / LibreOffice Community
Build ID: 47f78053abe362b9384784d31a6e56f8511eb1c1
CPU threads: 8; OS: Mac OS X 10.16; UI render: GL; VCL: osx
Locale: de-DE (de_DE.UTF-8); UI: de-DE
Calc: threaded
Comment 1 Martin Häcker 2021-05-28 08:08:42 UTC
Created attachment 172399
import dialog with preview
Comment 2 Martin Häcker 2021-05-28 08:09:06 UTC
Created attachment 172400
Comment 3 Mike Kaganski 2021-05-28 08:28:13 UTC

*** This bug has been marked as a duplicate of bug 114878 ***