In Calc I open a csv file which has these values: a,b,c,test,-10,11,23,-24
I make no changes and save it, still as csv. The file now has these values: a,b,c,test,+AC0-10,11,23,+AC0-24
This means that UTF-7 (not UTF-8!) was used for export: "+AC0-" is a UTF-7 shift sequence for the "-" character. Check your export filter settings.
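For illustration only, here is a minimal Python sketch of why "+AC0-" points to UTF-7. It uses Python's built-in "utf-7" codec as a stand-in (an assumption: it is not what Calc runs internally) and only shows the decoding direction. The byte string is the saved content quoted in the report.

```python
import base64

# Content of the file after Calc saved it, as quoted in the report.
mangled = b"a,b,c,test,+AC0-10,11,23,+AC0-24"

# Reading the bytes back as UTF-7 restores the original minus signs.
restored = mangled.decode("utf-7")
print(restored)  # a,b,c,test,-10,11,23,-24

# The same by hand: between "+" and "-" sits base64-encoded UTF-16BE.
payload = "AC0"
padded = payload + "=" * (-len(payload) % 4)  # base64 needs a multiple of 4 chars
code_units = base64.b64decode(padded)         # two bytes: 0x00 0x2D
print(code_units.decode("utf-16-be"))         # the "-" character (U+002D)
```

So the data itself is intact; it has only been written in an encoding that the downstream reader does not expect.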
The file is downloaded from a mainstream bank website. The file displays as expected in notepad++. The csv displays as expected in libre calc. When saved, the file has these added characters. Is this expected behaviour? Not a great UX.
Well, I finally found the bug that discusses the real problem here. It is bug 150836, which is *correctly* called "CSV save-mode is different from the one used for opening", even though it only shows one aspect of it, specifically the "save formula" mode. Generally, the encoding used on opening the file should also be pre-selected when saving, and *if* that's not the case, it's a bug. This looks very similar to bug 120574.

But first, we need to be able to reproduce the problem. Can you provide a sample csv? (Yes, I mean a CSV file, not its text in comment 0: the file also carries its *encoding*, so we would know that exactly this file opens on your system, but its encoding changes when you save it.) Then please provide the full information from Help->About. Maybe we could reproduce this after that.
Created attachment 200370 [details]
Text File

Instructions to create the csv text file:
1. Open a notepad, Notepad++ or any text editor really.
2. Type some text and numbers separated by a comma. Remember to include some negative numbers!
3. Save as csv.

Instructions to recreate the "bug":
1. Now open the csv file created above in libre calc.
2. Save the file in libre calc.
3. Open the text file with a text editor.
4. What do you see?
(In reply to MartinP from comment #4)
> Instructions to recreate the "bug"
> 1. Now open the csv file created above in libre calc.
> 2. Save the file in libre calc.
> 3. Open the text file with a text editor.
> 4. What do you see?

I see that a newline appeared after the line. But the encoding is not reported as changed, nor were there any new characters.

So the question is how you came to have UTF-7 set in your CSV export settings. Now please:
1. Open your csv file in libre calc.
2. File->Save As.
3. In the File Picker dialog, check the "Edit filter settings" checkbox, and press OK.
4. Confirm file format (Use Text CSV Format), if asked.
5. What do you see in the "Character set" field of the Export Text File dialog?

I assume you will see UTF-7 there. I also assume that you saw this dialog at least once in the past (e.g. when you created a new spreadsheet and saved it as CSV), and that you mistakenly chose UTF-7 there instead of UTF-8.

If I'm right, then the incorrect encoding ("character set") there is the user error. But the real problem here is that it doesn't pick the encoding from the value that you used in the Import dialog, but uses the "last used" value from another time (which is, again, bug 150836).
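Step 3 of the reproduction ("open the text file with a text editor") can also be scripted. The following is a rough Python sketch, not a LibreOffice tool: the function name `find_utf7_sequences` is hypothetical, and the pattern is only a heuristic that may also match legitimate text such as "+5-".

```python
import re

# Heuristic: "+", then one or more base64 characters, then a closing "-".
UTF7_SHIFT = re.compile(rb"\+[A-Za-z0-9+/]+-")

def find_utf7_sequences(data: bytes) -> list[bytes]:
    """Return every byte run in `data` that looks like a UTF-7 shift sequence."""
    return UTF7_SHIFT.findall(data)

before = b"a,b,c,test,-10,11,23,-24"          # file as downloaded
after = b"a,b,c,test,+AC0-10,11,23,+AC0-24"   # file as saved by Calc

print(find_utf7_sequences(before))  # []
print(find_utf7_sequences(after))   # [b'+AC0-', b'+AC0-']
```

To check a real file, pass the result of `open(path, "rb").read()` instead of the inline samples.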
*** Bug 166211 has been marked as a duplicate of this bug. ***
I still wonder how this error occurred. You attribute this to user error by my somehow specifying UTF-7, but I have some difficulty with that explanation (not the UTF-7 part).

My problem was observed on TWO different machines. The second machine was prepped by removing the entire LibreOffice 24 package via Windows control panel. A fresh download of Libre 25 was performed and installed using the TYPICAL option. TWO fresh copies of the suspect data file were downloaded and testing was performed without any other intervening steps... ESPECIALLY NOT performing any changes in OPTIONS. Again, the "virgin" file was read correctly by my R program whereas the other, "Calc saved" file had the +AC0 encoding error. Again, this happened on BOTH my machines, independent of one another.

I do not dispute your UTF-7 assessment... I just wonder how the same error could appear on BOTH machines? Possible answers are:
1) I really am an incompetent dolt who managed to make the same mistake on two different machines, despite the fact that changing a code page seems like a pretty involved, intentional process not prone to accidental revision.
2) Somehow the Microsoft Control Panel uninstall process for v24 of Libre (which was working just fine for me) corrupted some residual file that was left untouched by the uninstall. When v25 installed, it picked up on that changed value.
3) Assuming that v25 went through the entire cycle from Unit test -> System test -> Release testing (which I'm sure it did), there is the possibility that someone JUST PRIOR to the final build of the .exe image for distribution had changed the code page for whatever reason and THAT is what went out the door.

Perhaps there are other scenarios which haven't immediately come to mind, but these 3 are certainly candidates. The fact that the error occurred on two different machines diminishes (though doesn't eliminate) the likelihood of Theory 1.
The additional fact that my processing simply uses Calc as an intermediary step rather than as a self-contained solution might be a situation not considered in your Release Testing Script/Plan. I shall attempt to manually change the code page to see if that effects a change and success. I will report back if it doesn't.
I will check the encoded value of the "virgin" file as it came straight via download from another DB program. This file was not created in Calc originally, simply downloaded.
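One possible way to run that check without Calc is sketched below in Python. The helper name `describe_encoding` is made up for this example, and the result is only a hint: without a byte order mark, a file's encoding cannot be identified with certainty.

```python
import codecs

def describe_encoding(data: bytes) -> str:
    """Give a rough hint about how the raw bytes of a file are encoded."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    # Note: a UTF-32LE BOM also starts with the UTF-16LE BOM bytes.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16 with BOM"
    if data.isascii():
        # Plain ASCII is byte-identical in UTF-8 and ISO-8859-1.
        return "pure ASCII (no BOM)"
    try:
        data.decode("utf-8")
        return "valid UTF-8 (no BOM)"
    except UnicodeDecodeError:
        return "not UTF-8; probably an 8-bit legacy encoding"

print(describe_encoding(b"a,b,c,test,-10,11,23,-24"))  # pure ASCII (no BOM)
```

For a file on disk, call it as `describe_encoding(open(path, "rb").read())`. A "pure ASCII" result would mean the downloaded file itself gives no reason to choose UTF-7 on import.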
(In reply to dpkesling from comment #7)

First of all: note that I don't make strong claims. I reset it to UNCONFIRMED, because I have no 100% evidence that it was a user error. If we find steps to reproduce on a clean machine, it would be a definite bug.

> My problem was observed on TWO different machines. The second machine was
> prepped by removing the entire LibreOffice 24 package via Windows control
> panel. A fresh download of Libre 25 was performed and installed using the
> TYPICAL option.

Note that removing the program via the control panel and installing a new version do nothing to the *user settings*. Those aren't part of the installation set; they are created by the program when it first runs, and they are not removed by uninstallation. Thus, what you described says *nothing* about the "cleanness" of the settings on that system.

Of course, that also doesn't prove that there necessarily was a pre-existing problem there; and I even find it strange that you would make the same mistake on two different systems (in the imagined scenario that you also set the settings to UTF-7 before uninstalling the program)... Unless you cloned the settings somehow?
Latest info... I went to the "virgin" file and started to load it into Calc... The text IMPORT screen (grab attached) had UTF-7 auto selected. The image shows the screen AFTER I changed it to UTF-8. I made adjustments to the file, went to save it and when I looked at the File Options per your suggestion, it showed some Western-ISO thing (also attached). I left that as is.... and the file processed just fine.

SO the problem you identified as UTF-7 was there, but as an artifact of the TEXT IMPORT process. Bottom line: I am back in business w/ v25.

Final questions: Has Calc always spec'd UTF-8 or was this a recent change from UTF-7? And, does Calc autodetect the code page of the incoming file and set itself accordingly... or just run with the last format specified by the user?

For my part, I am gonna go upstream and ask my DB Admin just what format e's spitting these files at me. Weird.
Created attachment 200396 [details] TEXT INPUT screen w/ UTF-8
Created attachment 200397 [details] TEXT OUTPUT screen
This bug is similar to one that I logged around the same time (https://bugs.documentfoundation.org/show_bug.cgi?id=166238). That one is also apparently a case of an incorrect choice of character set. In 166238, Calc auto-selected "UTF-7" by default (without my awareness) on import, which resulted in Calc mangling the content of my CSV file.

It is my contention that (1) Calc 25.2 is making an inappropriate choice in deciding what character set to use in the import (or export) dialog; (2) the character set choice was NOT based on my prior import/export history, because I have never used UTF-7 before; (3) most users don't know the difference between the various character set encodings, and they are not knowledgeable enough to know that they have to explicitly choose UTF-8 among dozens of other choices; and (4) Calc just started defaulting to UTF-7 with version 25.x. After I installed 25.2, the default on the import dialog changed. IMO it should not default to UTF-7.