Bug 139414 - Calc exports CSV dates as 11/10/09 Millennium Bug
Summary: Calc exports CSV dates as 11/10/09 Millennium Bug
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.4.6.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Export
  Show dependency treegraph
 
Reported: 2021-01-04 22:44 UTC by Jonny Grant
Modified: 2022-04-23 14:55 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
Screencast of test with LO 7.0.4.2 (156.79 KB, video/mp4)
2021-01-05 06:04 UTC, Stéphane Guillou (stragu)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jonny Grant 2021-01-04 22:44:14 UTC
Can Calc export dates using the sensible ISO 8601 format?
When opening these CSV files containing 11/11/11 it is anyone's guess if the reading software will get it right.

This bug is easy to reproduce

1. Type:  11 Oct 2009
2. Observe the cell shows 11/10/09
3. Export as CSV, observe the file contains: 11/10/09


What I expected was the sensible format:  2009-10-11 the format is well defined, YYYY-MM-DD
Comment 1 Leyan 2021-01-04 23:14:43 UTC
You can simply change the format of the cell to display YYYY-MM-DD before saving as csv, I don't think it is a bug.
Comment 2 Jonny Grant 2021-01-04 23:57:24 UTC
(In reply to Leyan from comment #1)
> You can simply change the format of the cell to display YYYY-MM-DD before
> saving as csv, I don't think it is a bug.

Every user has to manually set a format that won't damage the data?

additional steps
4. Open the CSV, edit, and save.
5. Observe the manual formatting to YYYY-MM-DD has been lost.
Comment 3 Stéphane Guillou (stragu) 2021-01-05 06:04:53 UTC
Created attachment 168692 [details]
Screencast of test with LO 7.0.4.2

Does the formatting depend on the used locale?

In my test (see video), I get the date recognised, and the value shown in the spreadsheet is 11/10/09, but the actual value in the field is 11/10/2009.

When saving as CSV (File > Save as...), keeping "Save cell content as shown" unticked, I get the full "11/10/2009" in the resulting CSV.

If you tick "Save cell content as shown", you will only have "11/10/09" in the resulting CSV (as expected).

I agree with you though, the format should be unambiguous, and an ISO standard would be ideal, but the reality is that a lot (/most of?) the users are used to reading the format DD/MM/YYYY, so having it as a default might make sense.

But if saving to CSV _does_ reduce 2009 to 09 (with "Save cell content as shown" not ticked), that _is_ a case of data loss, but I can't reproduce on:

Version: 7.0.4.2
Build ID: 00(Build:2)
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Ubuntu package version: 1:7.0.4_rc2-0ubuntu0.18.04.2
Calc: threaded

Could you check that "Save cell content as shown" setting, and try on the latest version of the LO 7.0 ?
Comment 4 Jonny Grant 2021-01-05 10:08:26 UTC
(In reply to stragu from comment #3)
> Created attachment 168692 [details]
> Screencast of test with LO 7.0.4.2
> 
> Does the formatting depend on the used locale?
> 
> In my test (see video), I get the date recognised, and the value shown in
> the spreadsheet is 11/10/09, but the actual value in the field is 11/10/2009.
> 
> When saving as CSV (File > Save as...), keeping "Save cell content as shown"
> unticked, I get the full "11/10/2009" in the resulting CSV.
> 
> If you tick "Save cell content as shown", you will only have "11/10/09" in
> the resulting CSV (as expected).
> 
> I agree with you though, the format should be unambiguous, and an ISO
> standard would be ideal, but the reality is that a lot (/most of?) the users
> are used to reading the format DD/MM/YYYY, so having it as a default might
> make sense.
> 
> But if saving to CSV _does_ reduce 2009 to 09 (with "Save cell content as
> shown" not ticked), that _is_ a case of data loss, but I can't reproduce on:
> 
> Version: 7.0.4.2
> Build ID: 00(Build:2)
> CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
> Locale: en-AU (en_AU.UTF-8); UI: en-US
> Ubuntu package version: 1:7.0.4_rc2-0ubuntu0.18.04.2
> Calc: threaded
> 
> Could you check that "Save cell content as shown" setting, and try on the
> latest version of the LO 7.0 ?

My apologies I am not using a newer release of LibreOffice.

It sounds like "Save cell content as shown" would work, is that ticked by default?

Although, it's quite a manual process to require us to set a sensible format before saving dates. This is basically a millennium bug still existing in Calc.
Comment 5 Jean-Baptiste Faure 2021-05-10 20:39:23 UTC
(In reply to Jonny Grant from comment #4)
> [...]
> Although, it's quite a manual process to require us to set a sensible format
> before saving dates. This is basically a millennium bug still existing in
> Calc.

I disagree, it is up to the user to choose the date format suitable for his purpose. The year coded with 2 digits may be ambiguous or not, depending the meaning of the data. If your data is about genealogy using 2 digits for the year is a very bad idea. If your data are about your bank account, there is not ambiguity about a 2 digits year.

I think there is another problem with your request. It is like if you wanted that the date value reveals its date format. That is not possible with CSV format.
For example, its clear that the format of 2021-01-13 is YYYY-MM-DD, but for 2021-01-12 you can't decide between YYYY-MM-DD and YYYY-DD-MM and there is nothing in the CSV format that gives you a mean to decide. The only mean is to put yourself the date format information in the header of the date column.

Calc can't do that work for you because it knows nothing about the meaning of the data.

So for me there is no bug here and I close this bug report as Not-A-Bug. Please feel free to reopen if you disagree.

Best regards. JBF
Comment 6 Jonny Grant 2021-05-11 17:18:33 UTC
Hello
Thank you for your reply.
Politely I would say, maybe you have some misunderstanding - This bug is about CSV export from Calc. It is not about loading CSV files. You can see this in comment #1.

It's clearly inappropriate to export as a two digit year format. As a qualified software engineer it's astonishing to read your comments.

It's vital this defect is fixed because it damages the quality of the dates when they are exported, so I am re-opening.


Unfortunately Calc has many bugs, and none are fixed. If they can't be resolved, maybe it's time for users to migrate to a more modern spreadsheet.
Comment 7 Jean-Baptiste Faure 2021-05-11 20:54:56 UTC
Never confirmed -> back to unconfirmed.
Comment 8 Buovjaga 2022-04-19 07:06:20 UTC
(In reply to Jonny Grant from comment #6)
> Hello
> Thank you for your reply.
> Politely I would say, maybe you have some misunderstanding - This bug is
> about CSV export from Calc. It is not about loading CSV files. You can see
> this in comment #1.

It is about loading as well. Like Jean-Baptiste said, you can't automatically distinguish between YYYY-MM-DD and YYYY-DD-MM dates coming from a CSV file, so nothing in the export will guarantee a sensible import.

"Save cell content as shown" is already in 6.4. You can access the filter options by ticking the box "Edit filter settings" in the file saving dialog before clicking Save.
Comment 9 Jonny Grant 2022-04-22 15:41:16 UTC
That seems to work, some of the time

If I do 'File'->'Export' it doesn't have that option.

Another time I ended up with "Untitled 1.csv.html"

I'm using 6.4.7.2
Comment 10 Buovjaga 2022-04-22 16:06:00 UTC
(In reply to Jonny Grant from comment #9)
> That seems to work, some of the time
> 
> If I do 'File'->'Export' it doesn't have that option.
> 
> Another time I ended up with "Untitled 1.csv.html"
> 
> I'm using 6.4.7.2

That's because you picked XHTML from File - Export. CSV files are saved from Save As.
Comment 11 Jonny Grant 2022-04-22 23:01:53 UTC
Hi Buovjaga
If I type "Untitled 1.csv" in the box, I expect it to save as CSV... As is customary in all software. Feels like a bug. It should say "Error cannot Export as CSV" right?
Comment 12 Buovjaga 2022-04-23 06:35:17 UTC
(In reply to Jonny Grant from comment #11)
> Hi Buovjaga
> If I type "Untitled 1.csv" in the box, I expect it to save as CSV... As is
> customary in all software. Feels like a bug. It should say "Error cannot
> Export as CSV" right?

I discussed this in the chat and one comment was "CSV as extension is not a file type, it's just three random characters that happen to coincide with something else".

The thing is that dots are allowed in file names and it would be rather restrictive to error out on any "dot something" in the file name that doesn't match the currently selected export filter.
Comment 13 Mike Kaganski 2022-04-23 07:10:37 UTC
FTR:

Typing file.csv in the MS Excel's Save As dialog (note how it's the *same* dialog that *actually allows* to save as CSV, unlike what comment 9 did!) having XLSX file type selected, I get a 'file.csv.xlsx' without warning. If I use double quotes to force the exact filename, and type "file.csv", I also got no warning, but I still get the XLSX with "wrong" extension.

If I use the same procedure in Excel's Export dialog, I have similar results: no warnings, always get the file type selected in relevant dropdown, but when I use quoted filename, I get "csv" silently replaced with "pdf".

In any case, the other major spreadsheet software vendor doesn't get convinced by Jonny Grant's "If I type "Untitled 1.csv" in the box, I expect it to save as CSV".
Comment 14 Mike Kaganski 2022-04-23 07:18:42 UTC
(In reply to Jean-Baptiste Faure from comment #5)
> For example, its clear that the format of 2021-01-13 is YYYY-MM-DD, but for
> 2021-01-12 you can't decide between YYYY-MM-DD and YYYY-DD-MM and there is
> nothing in the CSV format that gives you a mean to decide.

(In reply to Buovjaga from comment #8)
> It is about loading as well. Like Jean-Baptiste said, you can't
> automatically distinguish between YYYY-MM-DD and YYYY-DD-MM dates coming
> from a CSV file

Just wanted to mention, that unlike any other date layout, YYYY-XX-YY is the unambiguous one (YYYY-MM-DD), defined in ISO 8601; and no software would ever try to treat it another way. If someone would like to use YYYY-DD-MM, they would be on their own.
Comment 15 Jonny Grant 2022-04-23 14:55:07 UTC
Mike
I wish opening a file with YYYY-MM-DD formatting would be retained when it is saved. Likewise for YYYY-MMM-DD

I recall I always still see two digit year columns in exports unless I manually do stuff. Exporting years correctly and fully should be the default behaviour.