Bug 154131 - Add option to not detect numbers in scientific notations (like 1E1 or 5E2) at text paste / import
Summary: Add option to not detect numbers in scientific notations (like 1E1 or 5E2) at...
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium enhancement
Assignee: Laurent Balland
URL:
Whiteboard: target:7.6.0 target:24.2.0 target:7.6...
Keywords:
Depends on:
Blocks: Number-Format Calc-Cells CSV-Dialog
  Show dependency treegraph
 
Reported: 2023-03-11 08:27 UTC by Heiko Tietze
Modified: 2023-07-25 19:45 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
screenshot (22.88 KB, image/png)
2023-06-01 19:57 UTC, BogdanB
Details
screenshot of text import at 66da786bdd4588b31755058acf46034c2056215c (10.95 KB, image/png)
2023-06-22 07:11 UTC, Stéphane Guillou (stragu)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Heiko Tietze 2023-03-11 08:27:02 UTC
I insert a large list of text from the clipboard and stumbled over the fact that content like 1E1 is converted to 10 or 5E2 to 50 (among other strings like CSE1, 1ECS etc.). Without sorting I wouldn't have realized this silent conversion.

Solution is to mark this column explicitly as text. However I doubt many users keep in mind to do so. We could a) always use Text, b) warn if a column contains variable content, c) just don't convert scientific number formats. None is really good, maybe c) with an option to switch it on.
Comment 1 Heiko Tietze 2023-03-11 08:28:01 UTC
The conversion also happens on every insertion- in this case easy to recognize, of course.
Comment 2 Laurent Balland 2023-03-11 09:42:40 UTC
I consider this bug is about check box "Detect special numbers". If NOT checked, scientific number should not be detected. Help [1] specifies that scientific number should only be detected if option is checked.

[1] https://help.libreoffice.org/7.6/en-US/text/shared/00/00000208.html?System=UNIX&DbPAR=CALC&HID=modules/scalc/ui/textimportcsv/box1#hd_id314847422

Steps to reproduce:
1. Copy text:
1E1	5E2
CSE1	DDE3
2. Paste in Calc
3. In "Text Import" dialog, uncheck "Detect special numbers". OK

Actual result:
1E1 and 5E2 strings are converted to 10 and 500

Expected result:
If "Detect special numbers" is unchecked, text should be preserved and not converted to scientific number.
Comment 3 V Stuart Foote 2023-03-11 15:52:02 UTC
Well the other facet is hat the paste is being done into a column set with Number General, so Numbers are *expected* to be recognized coming into the cell.

If working with alphanumeric text you don't want the treated as number (1E1 -> 10, 5E2 -> 500) set the sheet column format to be 'Text' on import.

I don't think we can say the 'Detect special numbers' is at fault here. 1E1 and 5E2 are correctly picked up a numbers. The special numbers just renders them to scientific notation.

IMHO => NAB

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: a179f6c91692076e7e17babf4890638caa398384
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 4 V Stuart Foote 2023-03-11 17:06:30 UTC
We have it documented that the "Detect special numbers" checkbox is supposed to reject scientific notation formatted numbers and treat it as text[1] [2].  But reality is the exponential notation of 1E1 or SE2 has already been detected in sc editshell as a number. Likewise fully formatted SE of 1.00E+01 and 5.00E+02 are detected as numbers.

So either the documentation needs to be corrected (my preference), or the editshell needs to be tweaked to honor the dialog "Detect special numbers" setting. With detection checked off, the numbers are formatted general numbers, when detected special just the rendered number format code is changed. The editshell numeric value remains intact.  I think handling the format is correct/preferred for the detection. IMHO the documentation is wrong.

Try this set with and without the "Detect special numbers" checkbox:

1E1	5E2
CSE1	DDE3
1.00E+01	5.00E+02
2E0	3E3
1E4	1E0
1.2E4	1.004E0
2.0056E1	2.0056E0 

then examine the cell content via the format cell. And of course if you need it/expect it to be formatted as text you need to indicate such (on the sheet column before import, or within the import dialog).  So IMHO => NAB but the documentation needs to be corrected.

=-ref-=

[1] https://help.libreoffice.org/7.6/en-US/text/shared/00/00000208.html?System=UNIX&DbPAR=CALC&HID=modules/scalc/ui/textimportcsv/box1#hd_id314847422

[2] https://help.libreoffice.org/7.6/en-US/text/shared/00/00000208.html?&DbPAR=CALC&System=WIN

"Detect special numbers
When this option is enabled, Calc will automatically detect all number formats, including special number formats such as dates, time, and scientific notation.

The selected language influences how such special numbers are detected, since different languages and regions many have different conventions for such special numbers.

When this option is disabled, Calc will detect and convert only decimal numbers. The rest, including numbers formatted in scientific notation, will be imported as text. A decimal number string can have digits 0-9, thousands separators, and a decimal separator. Thousands separators and decimal separators may vary with the selected language and region."
Comment 5 Laurent Balland 2023-03-12 09:04:35 UTC
(In reply to V Stuart Foote from comment #4)
> So either the documentation needs to be corrected (my preference), or the
> editshell needs to be tweaked to honor the dialog "Detect special numbers"
> setting. 
I do agree with your analysis, but not the conclusion: the behavior should be corrected, not help. The option should be enough to decide if "2E1" is a scientific number or a text, like "3/10" is a date or a text with or without the option.
Comment 6 Eyal Rozenberg 2023-03-12 21:31:15 UTC
> I insert a large list of text from the clipboard and stumbled over the fact that content like 1E1 is converted to 10 or 5E2 to 50

When you paste a bunch of text from the clipboard, you get the Text Import dialog, where you decide how you want the pasting to work. Did you click Ok without checking, and are filing this bug about the default options? If not - then... INVALID?

Also, I think I agree with Stuart. Excel/Calc users "know" that when that regular paste does all sorts of weird conversions, just like typing into a cell: Scientific Notation, Dates, numbers starting with 0 which lose the 0, and maybe other stuff. We "know" that to get exactly the pasted text, we need to play with the options.
Comment 7 Eike Rathke 2023-03-13 16:39:09 UTC
This for years works as is. Spreadsheets accept numbers, surprise. If the behaviour is to be changed then only with an additional option "Detect scientific numbers" that when "Detect special numbers" is off can be toggled and if that is on is enabled and deactivated.
Comment 8 V Stuart Foote 2023-03-13 17:24:41 UTC
(In reply to Eike Rathke from comment #7)
Agree, but what of the current documentation in Help? How should we rephrase that to correctly explain behavior?
Comment 9 Laurent Balland 2023-03-13 20:25:27 UTC
(In reply to Eike Rathke from comment #7)
> This for years works as is.
The behavior changed with branch 3.6. Last version I found with the behavior described in help is LibreOffice 3.5.7.2 Version ID : 3215f89-f603614-ab984f2-7348103-1225a5b which was released more than 10 years ago ;-)
Comment 10 Laurent Balland 2023-03-23 19:24:56 UTC
I propose this change to help:
https://gerrit.libreoffice.org/c/help/+/149467
Comment 11 Heiko Tietze 2023-03-24 08:30:40 UTC
The documentation is good when you realize the issue; improvements welcome of course.

But I assume many users are not aware of this "conversion". Eike's proposal of an option (would enhance it to "Detect scientific numbers like 5E2") sounds like a good solution to point people at the topic.
Comment 12 Commit Notification 2023-03-27 18:18:48 UTC
Laurent Balland committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/e24ed3bc78751e931c81a98a542af71aa4a0f5b4

tdf#154131 Import text in Calc: Scientific numbers
Comment 13 Stéphane Guillou (stragu) 2023-05-11 08:52:51 UTC
(In reply to Heiko Tietze from comment #11)
> Eike's proposal
> of an option (would enhance it to "Detect scientific numbers like 5E2")
> sounds like a good solution to point people at the topic.

OK, I changed the summary and set it to enhancement, is that OK?
Comment 14 Laurent Balland 2023-05-20 10:22:18 UTC
(In reply to Stéphane Guillou (stragu) from comment #13)
> (In reply to Heiko Tietze from comment #11)
> > Eike's proposal
> > of an option (would enhance it to "Detect scientific numbers like 5E2")
> > sounds like a good solution to point people at the topic.
> 
> OK, I changed the summary and set it to enhancement, is that OK?

Sounds good to me
Comment 15 Commit Notification 2023-05-26 21:57:20 UTC
Laurent Balland committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2b3b4026a5c6a1b92c0b29b94dd62a4c3c7b9fad

tdf#154131 Add Detect scientific number option

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Eike Rathke 2023-05-26 22:04:04 UTC
Please add the new feature and filter flag to the release notes
https://wiki.documentfoundation.org/ReleaseNotes/7.6#Calc
and the filter flag to
helpcontent2/source/text/shared/guide/csv_params.xhp
as well.
Comment 17 Stéphane Guillou (stragu) 2023-05-26 22:18:23 UTC
Laurent, could we have the string "Detect scientific notation" instead of "Detect scientific numbers" in a follow-up commit? Again, "scientific number" sounds awkward to me, is barely used, and I'd expect users to report it as incorrect.
Comment 18 Laurent Balland 2023-05-27 08:13:31 UTC
(In reply to Stéphane Guillou (stragu) from comment #17)
> Laurent, could we have the string "Detect scientific notation" instead of
> "Detect scientific numbers" in a follow-up commit? Again, "scientific
> number" sounds awkward to me, is barely used, and I'd expect users to report
> it as incorrect.

I will make the follow-up next week

@Eike: I will update release notes and help as well
Comment 19 BogdanB 2023-05-30 05:10:41 UTC
Working well with this examples

1E1	5E2
CSE1	DDE3
1.00E+01	5.00E+02
2E0	3E3
1E4	1E0
1.2E4	1.004E0
2.0056E1	2.0056E0 

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: fc4f7db59152f606b6aa88cf32197700959d0f8b
CPU threads: 16; OS: Linux 5.19; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded
Comment 20 Commit Notification 2023-06-01 07:27:10 UTC
Laurent Balland committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/811e67eb45b1fc97d8c8cc9509acb33267a51aa4

tdf#154131 Update UI

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 BogdanB 2023-06-01 17:53:19 UTC
Is there still another place to change the UI? In the latest master, updated now, I can see "Detect special numbers" when I paste in Calc.

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: d7d2b172065f90aa2f61c0216f3722e868ae76a1
CPU threads: 16; OS: Linux 5.19; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded
Comment 22 Laurent Balland 2023-06-01 19:24:47 UTC
(In reply to BogdanB from comment #21)
> Is there still another place to change the UI? In the latest master, updated
> now, I can see "Detect special numbers" when I paste in Calc.
No, there is no other place. The option "Detect special numbers" is a different option. 
I need to update screenshot on Release Notes.
https://wiki.documentfoundation.org/ReleaseNotes/7.6#Calc

Help update is under review.
https://gerrit.libreoffice.org/c/help/+/152407
Comment 23 BogdanB 2023-06-01 19:57:25 UTC
Created attachment 187651 [details]
screenshot

This bug is about text Paste and Import. In your screenshot is import, I attach now a paste screenshot.
Comment 24 Laurent Balland 2023-06-01 20:09:00 UTC
(In reply to BogdanB from comment #23)
> Created attachment 187651 [details]
> screenshot
> 
> This bug is about text Paste and Import. In your screenshot is import, I
> attach now a paste screenshot.

Sorry, I only used "Paste unformatted text". I never used direct "Paste", which cannot detect column separation.

But I will continue with this dialog also.
Comment 25 Laurent Balland 2023-06-05 06:19:19 UTC
(In reply to Laurent Balland from comment #24)
> But I will continue with this dialog also.
This dialog is used for HTML import.
https://gerrit.libreoffice.org/c/core/+/152591
Comment 26 Commit Notification 2023-06-13 09:41:41 UTC
Laurent Balland committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/a8a5edeadd66edfc0fb0747ba7f1251da1b2597c

tdf#154131 Detect numbers in scientfic notation
Comment 27 Commit Notification 2023-06-15 16:31:01 UTC
Laurent Balland committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/help/commit/ef0e2a4bf333ba01d2e3fd81ad9e4b14fb205e4d

tdf#154131 Detect numbers in scientfic notation
Comment 28 Stéphane Guillou (stragu) 2023-06-22 07:11:33 UTC
Created attachment 188050 [details]
screenshot of text import at 66da786bdd4588b31755058acf46034c2056215c

Tested with the following data:

1E1	5E2
CSE1	DDE3
1.00E+01	5.00E+02
2E0	3E3
1E4	1E0
1.2E4	1.004E0
2.0056E1	2.0056E0 
02/03/23	01:02:03

Pasted into Calc with Paste Special > Use text import dialog, in version:

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 66da786bdd4588b31755058acf46034c2056215c
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Works well, but one issue I see is that when only "detect scientific notation" is used, the data is recognised as a number but the formatting is _not_ in scientific notation (whereas it is when using "detect special numbers").

Could this be tweaked too?
Comment 29 madisonabubakar 2023-07-03 01:39:38 UTC Comment hidden (spam)
Comment 30 madisonabubakar 2023-07-03 01:40:21 UTC Comment hidden (spam)
Comment 31 Commit Notification 2023-07-25 19:45:05 UTC
Laurent Balland committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5ae709d8519dd6d0de265d516c6158ccbdf4882e

follow tdf#154131 Treat also HMTL import

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.