Bug 168639 - text import dialog -- ENHANCEMENT add regExp separator such as numeric and dates
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 25.8.1.1 release (earliest affected)
Hardware: All Linux (All)
Importance: medium enhancement
Assignee: Not Assigned
Blocks: CSV-Dialog
Reported: 2025-10-01 11:12 UTC by Phil
Modified: 2025-10-01 20:43 UTC (History)



Attachments
Text Import dialog LO Calc future release (38.33 KB, image/webp)
2025-10-01 11:12 UTC, Phil
Text Import dialog LO Calc 258 (33.86 KB, image/webp)
2025-10-01 11:15 UTC, Phil
Text Import example the table you wish to copy past into Calc (11.27 KB, image/webp)
2025-10-01 11:19 UTC, Phil

Description Phil 2025-10-01 11:12:18 UTC
Created attachment 203076 [details]
Text Import dialog LO Calc future release

LO Calc 25.8.1
concern: text import dialog

Hi,
this is a general feature enhancement, which would cut down on other bug reports.

Challenge: convert text to csv

It's common to copy tables from text sources (HTML, PDF, DOCX, etc.), and the text import dialog has no good way to handle them.

1/ TEXT TO COPY
E.g. copy the following and paste it (CTRL+SHIFT+V) to open the text import dialog:
NVIDIA 4,249.99 5.44 Info Tech
MICROSOFT CORP 3,577.70 4.58 Info Tech
APPLE 3,467.20 4.44 Info Tech
AMAZON.COM 2,188.03 2.80 Cons Discr
META PLATFORMS A 1,603.83 2.05 Comm Srvcs
BROADCOM 1,328.83 1.70 Info Tech
ALPHABET A 1,239.14 1.59 Comm Srvcs
ALPHABET C 1,049.09 1.34 Comm Srvcs
TESLA 967.84 1.24 Cons Discr
JPMORGAN CHASE & CO 837.67 1.07 Financials


2/ With detect you'll get something like the following, which is not what you want:
NVIDIA;4,249.99;5.44;Info;Tech;;
MICROSOFT;CORP;3,577.70;4.58;Info;Tech;
APPLE;3,467.20;4.44;Info;Tech;;
AMAZON.COM;2,188.03;2.80;Cons;Discr;;
META;PLATFORMS;A;1,603.83;2.05;Comm;Srvcs
BROADCOM;1,328.83;1.70;Info;Tech;;
ALPHABET;A;1,239.14;1.59;Comm;Srvcs;
ALPHABET;C;1,049.09;1.34;Comm;Srvcs;
TESLA;967.84;1.24;Cons;Discr;;
JPMORGAN;CHASE;&;CO;837.67;1.07;Financials


3/ Your aim (which you cannot presently attain because numerical values are not treated as separators):
NVIDIA;4249.99;5.44;Info Tech
MICROSOFT CORP;3577.70;4.58;Info Tech
APPLE;3467.20;4.44;Info Tech
AMAZON.COM;2188.03;2.80;Cons Discr
META PLATFORMS A;1603.83;2.05;Comm Srvcs
BROADCOM;1328.83;1.70;Info Tech
ALPHABET A;1239.14;1.59;Comm Srvcs
ALPHABET C;1049.09;1.34;Comm Srvcs
TESLA;967.84;1.24;Cons Discr
JPMORGAN CHASE & CO;837.67;1.07;Financials
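
To make the aim in 3/ concrete, here is a minimal sketch in Python of what "numerical values as separators" could mean. It is not how LibreOffice works: the helper name split_on_numbers and the NUMERIC pattern are hypothetical, and it assumes English-style numbers (comma for thousands, dot for decimals). Each numeric token becomes its own field and consecutive text tokens are merged into one field.

```python
import re

# Hypothetical pattern: a token is numeric if it looks like 4,249.99 or 5.44.
NUMERIC = re.compile(r"^\d{1,3}(?:,\d{3})*(?:\.\d+)?$|^\d+(?:\.\d+)?$")

def split_on_numbers(line, sep=";"):
    """Each numeric token becomes a field; adjacent text tokens are merged."""
    fields, text = [], []
    for token in line.split():
        if NUMERIC.match(token):
            if text:  # close the text field collected so far
                fields.append(" ".join(text))
                text = []
            fields.append(token.replace(",", ""))  # drop the thousands separator
        else:
            text.append(token)
    if text:  # trailing text field, e.g. "Info Tech"
        fields.append(" ".join(text))
    return sep.join(fields)

print(split_on_numbers("JPMORGAN CHASE & CO 837.67 1.07 Financials"))
# JPMORGAN CHASE & CO;837.67;1.07;Financials
```

A name that contains a standalone number (say "7 ELEVEN") would be split incorrectly by this sketch, which is one reason the separator choice would have to stay under the user's control.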


4/ SOLUTION:
Well, LO Calc already has what is needed, but it needs to be brought into the "text import dialog".
Attached screenshots are:
1/ the text import dialog as it is in the current 25.8.1,
2/ the text import dialog with a proposed rough mock-up modification for a future release.

In this mock dialog, I propose to add new separator fields based on regExp, with a pull-down list offering choices such as numeric, dates, etc.
This is similar to what already exists in Calc when one changes the format of cells (CTRL+1) and chooses the "Numbers" tab, where you can choose for instance dates as YYYY-MM-DD (ISO 8601) or whatever.


Additionally,
I propose to show an indication of the consistency of the output by adding, as a first field, the number of cols LO Calc detects for each row.
So in the case of
NVIDIA;4,249.99;5.44;Info;Tech;;
MICROSOFT;CORP;3,577.70;4.58;Info;Tech;
APPLE;3,467.20;4.44;Info;Tech;;
AMAZON.COM;2,188.03;2.80;Cons;Discr;;
META;PLATFORMS;A;1,603.83;2.05;Comm;Srvcs
BROADCOM;1,328.83;1.70;Info;Tech;;
ALPHABET;A;1,239.14;1.59;Comm;Srvcs;
ALPHABET;C;1,049.09;1.34;Comm;Srvcs;
TESLA;967.84;1.24;Cons;Discr;;
JPMORGAN;CHASE;&;CO;837.67;1.07;Financials

The output would be:
4;NVIDIA;4249,99;5,44;Info;Tech;;
5;MICROSOFT;CORP;3577,7;4,58;Info;Tech;
4;APPLE;3467,2;4,44;Info;Tech;;
4;AMAZON.COM;2188,03;2,8;Cons;Discr;;
6;META;PLATFORMS;A;1603,83;2,05;Comm;Srvcs
4;BROADCOM;1328,83;1,7;Info;Tech;;
5;ALPHABET;A;1239,14;1,59;Comm;Srvcs;
5;ALPHABET;C;1049,09;1,34;Comm;Srvcs;
4;TESLA;967,84;1,24;Cons;Discr;;
7;JPMORGAN;CHASE;&;CO;837,67;1,07;Financials


Thank you
Comment 1 Phil 2025-10-01 11:15:36 UTC
Created attachment 203077 [details]
Text Import dialog LO Calc 258
Comment 2 Phil 2025-10-01 11:19:32 UTC
Created attachment 203078 [details]
Text Import example the table you wish to copy past into Calc

Example table you wish to copy and paste into Calc
Comment 3 Phil 2025-10-01 11:33:41 UTC
Oops, I made a mistake in the output that adds the number of cols in the row as a first field; here's the corrected version:

5;NVIDIA;4249,99;5,44;Info;Tech;;
6;MICROSOFT;CORP;3577,7;4,58;Info;Tech;
5;APPLE;3467,2;4,44;Info;Tech;;
5;AMAZON.COM;2188,03;2,8;Cons;Discr;;
7;META;PLATFORMS;A;1603,83;2,05;Comm;Srvcs
5;BROADCOM;1328,83;1,7;Info;Tech;;
6;ALPHABET;A;1239,14;1,59;Comm;Srvcs;
6;ALPHABET;C;1049,09;1,34;Comm;Srvcs;
5;TESLA;967,84;1,24;Cons;Discr;;
7;JPMORGAN;CHASE;&;CO;837,67;1,07;Financials
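
The column-count indicator can be sketched in a few lines of Python. This assumes that only non-empty fields are counted (which matches the corrected numbers above) and it leaves the row itself untouched, so the locale-style number conversion shown in the output is not reproduced. The names prefix_field_count and is_consistent are hypothetical.

```python
def prefix_field_count(row, sep=";"):
    """Prepend the number of non-empty fields to a separated row."""
    count = sum(1 for field in row.split(sep) if field != "")
    return f"{count}{sep}{row}"

def is_consistent(rows, sep=";"):
    """True when every row has the same number of non-empty fields."""
    counts = {sum(1 for field in row.split(sep) if field != "") for row in rows}
    return len(counts) <= 1

rows = [
    "NVIDIA;4,249.99;5.44;Info;Tech;;",
    "JPMORGAN;CHASE;&;CO;837.67;1.07;Financials",
]
for row in rows:
    print(prefix_field_count(row))
print(is_consistent(rows))
```

A mixed set of counts, as in this example, would tell the user at a glance that the chosen separators do not produce a regular table.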
Comment 4 V Stuart Foote 2025-10-01 14:54:22 UTC
Doing a \d digit match at word bounds \b (with or without decimal and thousands separator) as a column separator might be feasible, but IMHO dates are probably too complicated in general to be able to parse into columns. Although simple ISO 8601 formats might also be feasible.
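
As a rough illustration of that idea (not a proposal for the actual implementation), the sketch below uses Python's re module to split on numbers at word bounds and on simple ISO 8601 calendar dates. The SEPARATOR pattern and the function name split_keep_matches are assumptions made for this example. The date alternative is listed first so that "2025-10-01" is not consumed piecewise as three numbers.

```python
import re

# Assumed pattern: an ISO date (YYYY-MM-DD), a number with thousands groups,
# or a plain number with optional decimals, each at word bounds.
SEPARATOR = re.compile(
    r"\s*\b(\d{4}-\d{2}-\d{2}|\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)\b\s*"
)

def split_keep_matches(line):
    """Split on the pattern; matched values are kept as columns of their own."""
    # re.split with a capture group returns the matched text between the pieces.
    return [part for part in SEPARATOR.split(line) if part]

print(split_keep_matches("TESLA 967.84 1.24 Cons Discr"))
# ['TESLA', '967.84', '1.24', 'Cons Discr']
print(split_keep_matches("Invoice 2025-10-01 paid"))
# ['Invoice', '2025-10-01', 'paid']
```

Other date notations (month names, two-digit years, locale-specific orders) are not covered, which is the complication noted above.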

But isn't this rather a niche user request?  It seems sed and awk, or Perl, or Python provide the means to stream-convert into a meaningful CSV format, external to LibreOffice, for import into Calc or a Writer table. And most sources will provide export choices to delimit both fields and text strings, suitable for ingestion without additional formatting.

Other than convenience, no real justification to shoehorn this into the Text Import dialog as a means to describe complex Field Delimiter(s).

IMHO interesting if a dev has interest to take on the refactoring of our Text Import dialog to provide delimiter for numbers between word bounds, and maybe also ISO dates.

Otherwise => WF
Comment 5 Phil 2025-10-01 20:43:10 UTC
Quickly looking at some bugs from the metabug 109239, I think: 

https://bugs.documentfoundation.org/show_bug.cgi?id=103597
"I would rephrase the idea to allow regular expressions as delimiters. Doing so adds a lot of flexibility while the approach is "well known"."
This proposed solution appears to also solve this.

https://bugs.documentfoundation.org/show_bug.cgi?id=122422
Add regular expression filter option to Text Import window
This proposed solution appears to also solve this.

https://bugs.documentfoundation.org/show_bug.cgi?id=114199
This proposed solution appears to also solve this.

and I believe many other bugs from metabug 109239 could be crossed off as well.
Wouldn't you like to make the work of developers and the community much easier?