Bug 53449 - FILESAVE: With CSV format, "Save" converts field delimiter to TAB
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 3.5.4 release
Hardware: All, OS: All
Importance: high normal
Assignee: Eike Rathke
URL:
Whiteboard: BSA target:4.2.0
Keywords: needsDevEval
Duplicates: 43352
Depends on:
Blocks:
 
Reported: 2012-08-13 17:20 UTC by FM
Modified: 2022-12-01 10:02 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
csv file with ; (semicolon) as delimiter (45 bytes, text/csv)
2022-02-23 09:38 UTC, Stefan M

Description FM 2012-08-13 17:20:03 UTC
Problem description: When using "Save" (Ctrl+S), LibreOffice converts a CSV file originally created with the default CSV field and text delimiters (, and " respectively) so that the field delimiter becomes TAB and there is no text delimiter.

Steps to reproduce:
1. Create CSV file using default CSV delimiters
2. Close and re-open the file
3. Edit the file
4. Use File > Save or Ctrl+S to save the file

Current behavior: the delimiters are changed

Expected behavior: the delimiters are conserved as in the original file

Platform (if different from the browser): Ubuntu 12.04
              
Browser: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1
Comment 1 Romain 2012-10-31 10:58:00 UTC
I face the same issue with 3.5.4.2.
Any known workaround?
Comment 2 Joel Madero 2012-11-01 21:38:03 UTC
Confirmed. 

Bodhi Linux
Libo: 3.6.1.2

Workaround: use File > Save As again; then you can save with a comma or whatever separator you'd like.

Prioritized:

Minor: doesn't prevent professional-quality work; the CSV still works fine, just the column separator is changed from , to a tab.

Low: not many users affected, easy workaround, CSV still works.
Comment 3 bob 2013-02-07 15:52:13 UTC
Still able to reproduce the bug with 4.0.0.2rc. I personally think it's not such a minor issue. I process CSV files with my scripts, and now I have to search and replace those damn tabs with commas because LibreOffice messed up my files.

Instead of a workaround I would prefer a fix. It's a shame such a simple file format cannot be handled correctly with LibreOffice.

P.S.: Why tabs?? It is called a *comma* separated values file for a reason. Why is the default separator set to tab when saving CSV? It really puzzles me.
Comment 4 Joel Madero 2013-02-09 17:58:16 UTC
Version is the oldest version in which we see the issue; we use comments to say that we've confirmed it on a newer release. Changing version back.

Also, I heavily disagree that this is a normal bug... a normal bug means you are UNABLE to produce high-quality work, and because there is a workaround, you are not prevented from doing so. That being said, I will leave it as normal.
Comment 5 Peteris 2013-02-24 00:51:13 UTC
Still broken as of 4.0.0.3 / Windows.
A workaround exists, but it's still a bug, since it corrupts data silently without warning, and you have to know about it (be hurt once) before you can apply the workaround.
Comment 6 Joel Madero 2013-02-25 04:36:13 UTC
Three separate users who have identified this as a problem - upping priority to medium.

Also I don't think this would be very difficult to fix, adding:

ProposedEasyHack
Comment 7 Muthu 2013-03-01 11:53:32 UTC
Seems to be fixed in Master?
Comment 8 Tyler 2013-03-07 17:38:42 UTC
I'd like to attempt to fix this problem, seems easy enough for me. :-)
Comment 9 Eike Rathke 2013-03-07 19:18:08 UTC
I can't reproduce this in master or 4.0.1
Comment 10 Tyler 2013-03-07 19:35:46 UTC
Apparently, when I read through the code and looked through the history, there have been no changes to impex.cxx, the file that manages CSV export. I am also unable to reproduce the bug on 4.0.1 on Linux, but I'm unsure whether this applies to Windows as well.

http://cgit.freedesktop.org/libreoffice/core/commit/sc/source/ui/docshell/impex.cxx?h=libreoffice-4-0-1
Comment 11 FM 2013-03-08 16:04:03 UTC
It seems that I can't reproduce it either (even on files that used to be problematic). This is on Ubuntu 12.10 with LibreOffice 4.0.1.2. Marking as resolved.
Comment 12 FM 2013-03-25 22:58:49 UTC
This bug is still present in 4.0.1.2 contrary to what I reported earlier.

I also figured out how to reproduce this bug more precisely:

1. Create a CSV file using default CSV delimiters
2. Close the file
3. Re-open the file. At the Text Import dialog, if "Separated by Tab" is checked alongside "Comma" (comma being the actual delimiter), then after editing and saving the file, the delimiter is converted to TAB.

In other words, it seems that the text import options override the actual field delimiter being used by the file.
Comment 13 Eike Rathke 2013-03-25 23:42:01 UTC
Indeed, that's reproducible.

@Tyler:
Do you want to work on it?

It seems the delimiter remembered in the export options is overridden by the import options' delimiters, even if more than one is checked, and then probably just the first one encountered is used, which happens to be the tab character.

Note that there is no such thing as "the actual field delimiter being used by the file", every delimiter that is checked in the import options is treated as such when encountered in the file, just that in this case there is only a ',' comma delimiter present.
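To make the two points above concrete, here is a hypothetical C++ sketch (not the actual code in impex.cxx) of the import and pre-fix export sides. It assumes the checked import separators are kept as a plain string of characters in which Tab is listed before Comma; `splitFields` and `firstListedSeparator` are invented names for illustration only.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Import side: every checked separator splits a field wherever it occurs.
// Quoting with the text delimiter is deliberately ignored in this sketch.
std::vector<std::string> splitFields(std::string_view line,
                                     std::string_view separators)
{
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (separators.find(c) != std::string_view::npos) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);
    return fields;
}

// Export side before the 2013 fix, as described above: the first separator
// in the remembered list is reused, whether or not it ever occurred in the
// file. Falling back to a comma for an empty list is an assumption here.
char firstListedSeparator(std::string_view separators)
{
    return separators.empty() ? ',' : separators.front();
}
```

With Tab and Comma both checked, `splitFields("a,b,c", "\t,")` yields the three fields `a`, `b` and `c`, because no tab ever occurs in the line, yet `firstListedSeparator("\t,")` returns the tab, which is the separator the reporter saw after saving.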
Comment 14 Friedmann Bruno 2013-08-21 11:35:39 UTC
Confirmed on Windows 7 64-bit with the stable enterprise production release 4.0.4.2, and on several Linux distributions (Ubuntu 12.10 and 13.04, openSUSE 12.3 and the upcoming 13.1).

If a user ever checks the Tab separator option on import, any file they then save (without going to the advanced filter settings) will have its original separator (, ; : or whatever) replaced by TAB.

This causes an irremediable loss of the data quality that enterprise users expect.
For example, this silent change of separator can cause unexpected crash in a transform chain or corrupted data.

As it was working before, it has to be considered a bug and a regression.

Could someone raise the priority and also start fixing it?
Comment 15 Joel Madero 2013-08-21 16:57:56 UTC
Please don't update version and please read all the comments as I've already explained this.

Also, sure, you can find someone to fix it; the code is there. Our volunteers are busy working on a lot of things, so if you need this one fixed, all the code is Libre (meaning open for you to explore). Patches are welcome.
Comment 16 Owen Genat (retired) 2013-08-25 10:11:20 UTC
Can this bug be closed as a duplicate of bug #43352? They would appear to report the same issue. This one is marked as ProposedEasyHack, while the other bug is marked as a regression, target:3.5.0, and is filed against v3.4.3. Both have been closed and re-opened at some point.
Comment 17 Eike Rathke 2013-08-26 09:40:37 UTC
*** Bug 43352 has been marked as a duplicate of this bug. ***
Comment 18 Eike Rathke 2013-08-26 09:42:39 UTC
Marked the other one as a duplicate because this one has a better description of what goes wrong and when; see comment 12.
Comment 19 Eike Rathke 2013-08-26 13:33:55 UTC
Taking this.
Comment 20 Eike Rathke 2013-08-26 14:21:10 UTC
What happens is that, from the list of separators specified during import, the first one encountered is used, which in this case happens to be Tab (or Semicolon, if checked). We can't do much more than apply a heuristic to decide which separator to pick, i.e. weigh the separators present in the order comma, tab, semicolon, space, and other.

If you know the separator used in the file, select only that one during import to be sure.
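The weighting heuristic described above can be sketched as a standalone C++ function. This is a hypothetical illustration, not the code from the commit linked in the next comment: `pickExportSeparator` is an invented name, the checked import separators are assumed to arrive as a string of characters, and the real patch may be structured quite differently.

```cpp
#include <string_view>

// Hypothetical helper: choose a single export separator from all separators
// that were checked during import. Precedence follows the order stated in
// this thread: comma, tab, semicolon, space, then any "other" character.
char pickExportSeparator(std::string_view importSeparators)
{
    constexpr char kPreferred[] = { ',', '\t', ';', ' ' };
    for (char c : kPreferred) {
        if (importSeparators.find(c) != std::string_view::npos)
            return c;
    }
    // Only "Other" separators were checked: use the first one. Falling back
    // to a comma for an empty list is an assumption of this sketch.
    return importSeparators.empty() ? ',' : importSeparators.front();
}
```

With Tab and Comma both checked this picks the comma, fixing the case from comment 12; with Tab and Semicolon both checked it still picks Tab, while with only Semicolon checked it keeps the semicolon.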
Comment 21 Commit Notification 2013-08-26 16:24:07 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5af6437f6b602773fb76dca76be1fc079d93c922

resolved fdo#53449 weight given separators to pick one for output



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 22 Robinson Tryon (qubit) 2015-12-18 10:16:26 UTC
Migrating Whiteboard tags to Keywords: (ProposedEasyHack -> needsDevEval)
[NinjaEdit]
Comment 23 Stefan M 2022-02-23 09:38:08 UTC
Created attachment 178468
csv file with ; (semicolon) as delimiter

The bug has resurfaced in libreoffice-7.2.5.2. 
As stated in the original description, the delimiter gets changed to <TAB> when it was originally something different. Example file attached. 
 
Just open the attached file, which uses ; (semicolon) as its delimiter, and save it, with or without changes. The delimiter will be changed to <TAB>.
Comment 24 Eike Rathke 2022-02-23 19:51:42 UTC
You probably have *both* Tab and Semicolon separators checked during import, of which the Tab takes precedence for export. The weighted export order for multiple separators used during import is Comma, Tab, Semicolon, Space, Other. That order exists because there are CSV (Comma Separated Values) and TSV (Tab Separated Values) file formats; the Semicolon is a variant of CSV and thus has a lower precedence. Best check *only* the separator to be used during import.
Comment 25 Stefan M 2022-02-24 16:12:29 UTC
First of all, this behaviour was not the case for several years/versions in the past. It reappeared sometime around version 7.0.x (at that point I just waited for a new version to solve the issue), disappeared with one or another more recent version, and now it has come back with version 7.2.x.

But most of all, I do not see why Calc has to change the delimiter in any way, once the file was read in correctly. It should just stick with what it found and should not change it when the file is written. CSV is an exchange format, where the other side sometimes expects a certain delimiter. This behaviour disrupts the exchange in an unexpected way.

A workaround for me is to use "Save As..." and enable the filter settings for the CSV format (but this is cumbersome if you handle a lot of files). Once you save the file this way, it keeps the semicolon as delimiter until you re-open the file, no matter what other delimiters are selected for reading. So internally the delimiter can obviously already be a property of the file/data; the new version seems to just ignore it the first time it writes the file, or it does not use the delimiter it found when reading and uses a default instead.
Comment 26 Eike Rathke 2022-02-25 13:22:10 UTC
Opening a file with only Semicolon as separator activated *does not* change it upon save. If multiple separators were activated Calc *has* to choose *one* for saving, there can be *only one* separator, and as explained if Tab happened to be among the multiple separators that Tab takes precedence over Semicolon. This is the case since the implementation that *fixed* this bug in 2013 and the behaviour did not change since then. Again, if during file import only the Semicolon was selected as separator then Semicolon will also be used for export.

Also, please don't fiddle with the Version field, that denotes when a bug was first encountered.