Bug 150264 - FORMATTING: UI improvements to add automatic data validation of dates
Summary: FORMATTING: UI improvements to add automatic data validation of dates
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.5.0.0 alpha0+
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevAdvice
Depends on:
Blocks: Number-Format Calc-UX
  Show dependency treegraph
 
Reported: 2022-08-04 14:07 UTC by C.Rogers
Modified: 2022-12-18 10:36 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments
UI mockup of suggested checkbox addition to the Format Cells dialog. (105.47 KB, image/png)
2022-08-04 14:07 UTC, C.Rogers
Details
UI mockup of suggested checkbox addition to the Format Cells dialog. (160.99 KB, image/png)
2022-08-04 23:38 UTC, C.Rogers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description C.Rogers 2022-08-04 14:07:21 UTC
Created attachment 181585 [details]
UI mockup of suggested checkbox addition to the Format Cells dialog.

Given the proper context, it's just nice when software does some things automatically for you. Take for example formatting text data cells to date format. It would just be amazing beyond all reason if there were a checkbox to include a data validation pass which checks the contents of the cells being reformatted and, providing the contents match the given Format Code, Calc should then convert the contents to match the data type of the cell.

I've attached a mockup of my initial idea for this - Placing it in this dialogue under the Format Code makes it both a convenient and intuitive solution, which negates the need for a separate dialogue.

I think it's fine if it's off by default. It's discoverable enough that the user has a clear choice. I suspect with regard to dates, most users will check that box, but I'm open to also discussing alternative solutions, which don't require installing plugins, or using error-prone reg-ex or find and replace methods. :)
Comment 1 C.Rogers 2022-08-04 14:09:39 UTC
Had an IRC conversation with mikekaganski who suggested I make this enhancement suggestion. I'm really excited to be helping LO which I use constantly for everything. :)
Comment 2 Mike Kaganski 2022-08-04 15:32:33 UTC
(A nitpick: attachment 181585 [details] seems to be just a screenshot of the current Format Cells dialog - had you possibly forgotten to save the edits before posting?)

The idea, as I understand it, is to provide some control on the Number format page, which would help users that need to convert their data to the newly chosen format category.

There might be several possible directions here; I can think of these two:

* Have a button that would close this dialog, and open another "data conversion" dialog (it would need to be created; current Text to Columns can't work with several columns, and is too generic, incorporating splitting of data to several columns);
* Have a checkbox that would cause application of the new format to also attempt conversion of text (possibly with a test that the text fits the wanted category - date, fraction, etc.).
Comment 3 C.Rogers 2022-08-04 23:38:41 UTC
Created attachment 181623 [details]
UI mockup of suggested checkbox addition to the Format Cells dialog.
Comment 4 C.Rogers 2022-08-04 23:47:14 UTC
Thanks Mike!
Indeed, I somehow managed to export the screenshot without the modification. Please see the newly uploaded attachment.

As you can see, I've opted for #2 of your two scenarios which happens to be a bit less work and places the option where it's most useful in the existing Dialogue. While it might be nice to have a separate dialogue for validating all of the existing sheet, this UX issue is really just to offer a seamless solution for converting the data alongside converting the cell data type.

I'm willing to help with both however. just let me know what you'd like to see next. :)

Cheers!
Comment 5 Heiko Tietze 2022-08-08 09:40:08 UTC
To my understanding you want to enter "8.8." and ensure it's formatted as date (input is automatically assigned to the right number format). But if you miss the trailing dot, for example, it's currently accepted as text and you want some feedback if the cell is defined as date format (and if the to-be-introduced checkbox is on).

Assuming the feedback is some kind of "You can't do this" keeping you in edit mode seems not to be convenient. And a conversion by all means rather not feasible (and error-prone; 8/8 could be a date but also an equation; 1:10 a heading, the time 12/24h, or just a typo, etc.).

Consider also what happens when you paste many data.
Comment 6 C.Rogers 2022-08-08 10:29:33 UTC
I believe that strings appear on the left of the cell, and numbers of the correct format on the right. This is plenty of feed back that the string is unchanged to the number format. Also, the string values will not sort properly with the rest of the date column, which is even more feedback that something is missing. I don't think there needs to be any extra messages to this effect.
Comment 7 Heiko Tietze 2022-08-08 10:36:16 UTC
(In reply to C.Rogers from comment #6)
> I believe that strings appear on the left of the cell...

And we have View > Value Highlighting. But what should happen when the entered value is a mismatch to the expected format? In other words what exactly means "Validate dates in selected cells and convert to match format code" given that we have the automatic assignment of number formats and a proper feedback too.
Comment 8 C.Rogers 2022-08-08 11:07:01 UTC
When the user is simply entering values, they do not see the Format Cells dialogue. This checkbox only for when the user is using that dialogue. Nothing else is meant to change. Indeed, it should not. This feature is not needed when editing single cells. It's for when changing the cell data type of many cells at once. It does not affect data entry directly in the spreadsheet at all.

It's a single pass at the time of cell formatting.
Comment 9 C.Rogers 2022-08-08 11:19:38 UTC
It's functionally equivalent to what Calc does when importing a CSV and the user chooses "Date" as a data type for a column. Calc runs through and validates all the cells in that column and converts. This feature just allows this same functionality after import, in case the user missed this functionality, and for new data cut and pasted to multiple cells as text. It also helps in the case that the user accidentally chose the wrong date format, and wants to convert one cell type to another after import. Sometimes lots of work has already been done to other columns, so it would be quite convenient to have this option to validate on conversion of cell type on-the-fly.
Comment 10 Heiko Tietze 2022-08-08 11:31:38 UTC
(In reply to C.Rogers from comment #8)
> It's for when changing the cell data type of many cells at once.

Still don't understand what functionality the checkbox adds. Maybe someone smarter than me can comment.
Comment 11 Mike Kaganski 2022-08-08 11:52:30 UTC
The UX problem is that people tend to confuse (mix) formatting to cell data type.

In Calc, a cell can contain a text, a number, or a formula. The actual type of data in the cell is defined at entry type, and then never changes until another entry happens.

Also there is a cell format, including a "number format string"; and the dialog categorized different pre-configured (and user-defined) format strings into categories, like "Number", "Percent", "Date", "Time", "Text", etc.

Usually inexperienced users don't really understand the fundamental difference between data and its representation, and that it's perfectly legal and normal that e.g. a cell formatted "Date" may contain a text that *looks like* date, but is not a date. People also may get large amount of data (e.g., from copy-paste), that arrives as text into Calc cells, which users then want to convert to a number (e.g., to dates). And the users tend to expect that applying a Date *format* to the cells, they will get those cells contain dates, when they had strings previously, like "01/02/03".

It is really large proportion of questions e.g. on Ask site, asking why Calc doesn't work (just because it doesn't match the mentioned expectation). Answering this: one of the main "whys" is because data conversion is always a destructive process (you will loose leading/trailing zeros, number of digits in year, spaces, digits that don't fit into machine number representation, lots of things when convert a string into a matching number; you will make the resulting number be shown using numeric format, potentially breaking your titles) - and formatting is usually done on large ranges (like whole columns), potentially breaking data somewhere out of your current view, if the re-formatting would be allowed to change the data itself.

But given how strong this expectation is in users, it might be reasonable to have some *explicitly opt-in* way in the format dialog to refer to the data conversion. This is what this proposal is about: either allow users to use this dialog itself to *also* convert (matching) data, or to provide a link to the conversion functionality, so that users would be able to discover it from where they look for it.
Comment 12 C.Rogers 2022-08-08 12:00:56 UTC
Exactly this ^

Though functionally, it's to prevent the user having to manually remove the ' from every cell in order to convert the cell contents to the cell data type when there is already a 1 to 1 match between the two (see attached video), as shown in this video clip:

https://www.dropbox.com/s/dg084yyiqxepdu8/manually_changing_cell_contents.mp4?dl=1
Comment 13 QA Administrators 2022-08-09 03:39:29 UTC Comment hidden (obsolete)
Comment 14 Heiko Tietze 2022-08-15 12:21:56 UTC
I'm having a hard time to imagine the actual conversion. For example 04/07/2022 could be MM/DD/YYYY or DD/MM/YYYY, 4/7 a date or a fraction, 4.7 a number or a date with missing dot etc. And having such an option that forces input into a specific format might lead to the same trouble for users, just from the opposite side.

Don't know to what extend we recognize values but given the input of

test
4/7
04/07/2022
4.7

I expect A1 to be text, B1 too, C1 to become a date if the conversion can be done with the locale setting, and D1 a number (converted even when locale is using a comma). Should happen on loading, pasting, and direct input. And overwritten in the next run.

Why do we need an option?
Comment 15 Mike Kaganski 2022-08-15 12:30:35 UTC
(In reply to Heiko Tietze from comment #14)

What does that mean? I fail to follow what confused you.
Did you get confused how that data could be inserted with the "unexpected" types?

It may come from a multitude of sources, e.g. CSV imported using wrong locale (including being imported long ago, so you are not in a position to just re-import using correct settings); or it could had been prepared on a different system using different settings; or it could be a data copy-pasted from HTML... Lots of situations.

> And having such an option that forces input into a specific format

I didn't understand that. What is "an option that forces input into a specific format"? Was such a thing discussed here?
Comment 16 Heiko Tietze 2022-08-15 13:02:55 UTC
(In reply to Mike Kaganski from comment #15)
> > And having such an option that forces input into a specific format
> 
> I didn't understand that. What is "an option that forces input into a
> specific format"? Was such a thing discussed here?

To my understanding the request is about a checkbox that converts input by all means into the given format. "Validate dates..." is not generic enough and validation sounds like feedback. So I imagine something like "[ ] Convert data to the specified format". And the "by all means" is what bothers me. What I'd expect, again, is an automatic but rather conservative detection of the input format. Without options.
Comment 17 Mike Kaganski 2022-08-15 13:21:58 UTC
(In reply to Heiko Tietze from comment #16)
> To my understanding the request is about a checkbox that converts input by
> all means into the given format.

Ah :) No, and please note that your wording reflects the usual misconception - so let's together please avoid this confusion: not "convert to format", but "convert to a specific data type". Again: this is very important difference.

> "Validate dates..." is not generic enough

I agree - the request (and its title) wording is too restrictive, and is dictated by the user's use case; my explanation in comment 2 and comment 11 does not limit he types.

> and validation sounds like feedback.

I do not quite get you; in my understanding, that was meant by OP to describe "validate if the text in cells matches the format before converting the data type" - so e.g. if user selected "DD/MM/YYYY" format, then the conversion would *first* check if the string in the cell matches this format (it has three parts separated by slashes; the first part is two-character, having a number in 1-31 range; second is two-character, having a number 1-12; and third is four-character, consisting on digits), and only then convert. I myself do not have an opinion how much such a check could be useful/implementable.

> So I imagine something like "[ ] Convert data to the specified format".

Yes, matches the second * in comment 2.

> And the "by all means" is what bothers me.

It *seems* to me that the first (and before my response, the only) mention of "by all means" in this issue was yours ;)

> What I'd expect, again, is an automatic but rather conservative
> detection of the input format. Without options.

No options was proposed up to now; the only "optional" thing was to perform (or not perform) the conversion itself.
Comment 18 C.Rogers 2022-08-15 13:23:48 UTC
Yes, this is precisely what I would expect. No feedback, but a very conservative one to one conversion. I also agree with changing to "Convert data to specified format". I agree that validation sounds like it would have more feedback associated with the process.
Comment 19 Heiko Tietze 2022-08-15 13:35:34 UTC
Looks we are on the same page. Expert knowledge needed now regarding the implementability.
Comment 20 C.Rogers 2022-12-18 10:36:05 UTC
Hey folks. Did you need anymore mockups from me on this? Can I bribe someone to work on it with chocolate or graphics services as a bounty? :)

Please let me know if I can help push this forward. I think it's a pretty big usability improvement and would be a great advantage over MS Office and other Office suites.

Thanks!