Bug Hunting Session
Bug 114200 - FILEOPEN: "Text Import" dialog: option to trim whitespace
Summary: FILEOPEN: "Text Import" dialog: option to trim whitespace
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
5.4.3.2 release
Hardware: All All
: medium enhancement
Assignee: Manuj Vashist
URL:
Whiteboard: target:6.1.0
Keywords: difficultyBeginner, easyHack, needsDevEval, skillCpp
Depends on:
Blocks:
 
Reported: 2017-12-01 22:23 UTC by clemty
Modified: 2018-03-31 11:23 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description clemty 2017-12-01 22:23:59 UTC
Description:
Pasting data with spaces adjacent to the field separator always imports the whitespace; this may not be desired in all cases

Steps to Reproduce:
1. Paste data in format "this is column 1 : this is column 2" into calc
2. In the 'Text import' dialog, select select ":" as separator
3. Click ok

Actual Results:  
column 1 contains "this is column 1 " (space at the end), column 2 contains " this is column 2" (space at the beginning)

Expected Results:
Have an option to trim whitespace from the resulting columns, to get "this is column 1" and "this is column 2"


Reproducible: Always


User Profile Reset: No



Additional Info:


User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Comment 1 Buovjaga 2017-12-03 16:59:10 UTC
It is always good to ask UX before introducing new options, so I'm doing it.
One might argue that the input data should be trimmed before LibreOffice enters the picture.

Note for testers: the dialog is the same, if you open a CSV file. If you want to test via paste from Writer, paste as Unformatted text to Calc.
Comment 2 Heiko Tietze 2017-12-04 09:42:10 UTC
I'm split between voting against this enhancement as we provide plenty of trim functions afterwards and to agree with the common use case. If such an option is introduced the checkbox 'Trim whitespaces' could be added to 'other options'. 

Would say if a dev wants to do the work  it's a nice easyhack.
Comment 3 Manuj Vashist 2017-12-05 23:51:22 UTC
Hello,
I am new to Libre office can you please provide code pointers and how
can I proceed on this bug.
thank you.
Comment 4 Heiko Tietze 2017-12-06 09:34:09 UTC
(In reply to Manuj Vashist from comment #3)
> ...please provide code pointers and how can I proceed on this bug.

You could start at sc/uiconfig/scalc/ui/textimportcsv.ui and sc/source/ui/dbgui/scuiasciiopt.cxx but I have no idea where the actual import is done. Maybe you find a good keyword for use at https://opengrok.libreoffice.org/search?project=core (or just ask the devs on IRC (https://wiki.documentfoundation.org/Development for more info).
Comment 5 Aron Budea 2017-12-07 09:42:53 UTC
I don't know much about this, and don't know how to repro the paste part myself, but if it's the same as CSV handling, then the parsing code is here:

https://opengrok.libreoffice.org/xref/core/sc/source/ui/docshell/impex.cxx#367
bool ScImportExport::ImportStream(...)

...and even further in ExtText2Doc(...) called from there.
The options from the dialog are available in pExtOptions in that class (which is of type ScAsciiOptions). So I'd assume the UI has to be extended with a checkbox, then the option has to be captured into a newly added bool field of ScAsciiOptions through the class Heiko mentioned, and finally taken into account in ExtText2Doc(...) or somewhere around there.

I hope that helps, good luck!
Comment 6 Manuj Vashist 2017-12-10 06:23:16 UTC
I have done as what you said but I am missing something and can't figure it out why it is not showing expected output.
can you please review my code and tell whats wrong.
Comment 7 Manuj Vashist 2017-12-10 06:24:07 UTC
change 46177 on gerrit.
Comment 8 Eike Rathke 2017-12-11 11:43:15 UTC
We should clarify when exactly leading and trailing blanks are to be trimmed. i.e. *not* in quoted field content because then by definition the blanks are part of the field content. Trimmed should be only blanks that are between a field separator and content.
Comment 9 Manuj Vashist 2017-12-12 17:45:24 UTC
Can anybody please confirm these : 
copying Heiko Tietze's comment from gerrit change 46177

>Patch Set 4: Code-Review-1

>#1 (reason for -1) Works well except what Eike commented on BZ about quoted
>fields (" 1 " is trimmed even when 'Format quoted fields as text' is checked). 

basically this means when user checked "Format quoted field as text" disable "Trim spaces". if yes, then it will be updated in next patch else please explain what that means.

>#2 (question) I wonder if user expects trim working on tab as well (the >procedure doesn't).

I think we all ready provide enough features to user in text import, tabs are not used commonly so no need to provide another checkbox.

>#3 (nice to have) And since all options have an effect on the preview it would >be nice to get a visual feedback in the import dialog.

I have to pass boolean of "trim spaces" as parameter to ScCsvGrid::ImplSetTextLineSep(...) in csvgrid.cxx and subsequent functions in between from IMPL_LINK_NOARG(...) in scuiasciiopt.cxx and set the default value=false (so that any other using these functions will not break).
Comment 10 Heiko Tietze 2017-12-13 07:58:59 UTC
(In reply to Manuj Vashist from comment #9)
> >#1 (reason for -1) Works well except what Eike commented on BZ about quoted
> >fields (" 1 " is trimmed even when 'Format quoted fields as text' is checked). 
> 
> basically this means when user checked "Format quoted field as text" disable
> "Trim spaces". if yes, then it will be updated in next patch else please
> explain what that means.

Sounds like a good solution but doesn't work here as you may have mixed content. For example on the input A, B,"C "," D " the B could still be trimmed.
 
> >#2 (question) I wonder if user expects trim working on tab as well (the >procedure doesn't).
> 
> I think we all ready provide enough features to user in text import, tabs
> are not used commonly so no need to provide another checkbox.

Accepted (and agreed). Just wanted to bring this on the table.
Comment 11 Manuj Vashist 2017-12-14 20:13:40 UTC
(In reply to Heiko Tietze from comment #10)
> Sounds like a good solution but doesn't work here as you may have mixed
> content. For example on the input A, B,"C "," D " the B could still be
> trimmed.

I am a bit confused can you please specify the case when :
i) non-quoted content doesn't get trimmed.
ii) quoted content gets trimmed.
thank you.
Comment 12 Heiko Tietze 2017-12-14 21:26:44 UTC
(In reply to Manuj Vashist from comment #11)
> (In reply to Heiko Tietze from comment #10)
> > Sounds like a good solution but doesn't work here as you may have mixed
> > content. For example on the input A, B,"C "," D " the B could still be
> > trimmed.
> 
> I am a bit confused can you please specify the case when :
> i) non-quoted content doesn't get trimmed.
> ii) quoted content gets trimmed.
> thank you.

Made my example for this ;-). A,B are trimmed when your checkbox is ticked regardless the quotation checkbox. C and D is trimmed when the quotation is not checked, meaning D - or better " 1 " will be taken as the actual number 1 and therefore trim makes sense.
Comment 13 Eike Rathke 2017-12-15 14:23:48 UTC
(In reply to Heiko Tietze from comment #12)
> Made my example for this ;-). A,B are trimmed when your checkbox is ticked
> regardless the quotation checkbox. C and D is trimmed when the quotation is
> not checked, meaning D - or better " 1 " will be taken as the actual number
> 1 and therefore trim makes sense.
That's debatable. In "C "," D " the actual field content is quoted, it's not just spaces that happen to be there. My take on this is to not trim quoted content. The number " 1 " is recognized as numeric anyway, unless "Format quoted field as text" says otherwise.
Comment 14 Heiko Tietze 2017-12-15 15:38:54 UTC
(In reply to Eike Rathke from comment #13)
> That's debatable. In "C "," D " the actual field content is quoted, it's not
> just spaces that happen to be there. My take on this is to not trim quoted
> content. The number " 1 " is recognized as numeric anyway, unless "Format
> quoted field as text" says otherwise.

Even when 'Quoted field as text' is checked? If users understand this option as "don't touch my input" the trimming is not expected, if the options is read as "Interpret numbers (and date/time) as text" the trimming is okay. Advantage of here is that the two options are not mutually related.

Trim off / Quoted off
' 1 ' = '1'
' A ' = ' A '
Trim on / Quoted off
' 1 ' = '1'
' A ' = 'A'
Trim off / Quoted on
'" 1 "' = '" 1 "'
'" A "' = '" A "'

And the challenged situation

Option a)
Trim on / Quoted on
'" 1 "' = '"1"'
'" A "' = '"A"'

That's my take to clearly distinguish the two options.

Option b)
Trim on / Quoted on
'" 1 "' = '" 1 "'
'" A "' = '" A "'

is Eike's opinion because of the higher weighting to the quotation. Up to you Manuj whose argumentation you follow :-). (We can ask more people of course)
Comment 15 Manuj Vashist 2017-12-15 18:02:04 UTC
I will go with eike, as the quoted numbers can be intended by the user in that way to contain space.
So, i support for no trimming in quoted fields.(also up for discussion on this)
Comment 16 Xisco Faulí 2018-01-15 03:22:40 UTC Comment hidden (obsolete)
Comment 17 Heiko Tietze 2018-01-15 09:19:19 UTC
(In reply to Xisco Faulí from comment #16)
> A polite ping, still working on this bug?

Code review pending on https://gerrit.libreoffice.org/#/c/46177/
Comment 18 Commit Notification 2018-01-26 11:05:44 UTC
manujvashist committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d999b55e28d0445df870f69b3a269964e3635def

tdf#114200  : added 'Trim space' feature in 'Text Import'

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Manuj Vashist 2018-01-26 15:53:42 UTC
This is fixed now please someone verify and close this as fixed.
thank you everyone for helping me in this.
Comment 20 Commit Notification 2018-03-31 11:23:58 UTC
Gabor Kelemen committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/help/commit/?id=29f6ad0004e6f49481ae32c577b394ec709b43f0

tdf#114200 Document new Trim space feature in Text Import