Bug 160257 - SF_Calc.ImportFromCSVFile does not work when empty column
Summary: SF_Calc.ImportFromCSVFile does not work when empty column
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: BASIC (show other bugs)
Version:
(earliest affected)
7.6.5.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: ScriptForge
  Show dependency treegraph
 
Reported: 2024-03-18 10:18 UTC by Eric Champy
Modified: 2024-03-21 10:58 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
A selection of rows from the CSV file to be imported (133.57 KB, image/jpeg)
2024-03-18 10:18 UTC, Eric Champy
Details
The same rows in the Sheet after Import (119.97 KB, image/jpeg)
2024-03-18 10:19 UTC, Eric Champy
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Champy 2024-03-18 10:18:18 UTC
Created attachment 193176 [details]
A selection of rows from the CSV file to be imported

I have a CSV file that contains an empty column in the midle of columns. I want to import the columns in libreOffice exactly in the same format, including the empty column. 

I use SF_Calc.ImportFromCSVFile ...

Doing that, the empty column  has disappeared during the import and some raw data are switched to the fist left empty cell ... 

See the 2 attached documents:
1) Original file to be imported
2) Sheet after the import
Comment 1 Eric Champy 2024-03-18 10:19:19 UTC
Created attachment 193177 [details]
The same rows in the Sheet after Import
Comment 2 m_a_riosv 2024-03-18 12:22:38 UTC
Images for this matter are not practical, please attach a sample csv file, to let someone test the issue.
Comment 3 Rafael Lima 2024-03-18 17:55:58 UTC
I can confirm the issue. To test I used a "dummy.csv" file that contains:

A,,1
B,,2
C,,3

Hence, there are 3 columns and the column in the middle is empty. Then i opened a blank Calc file and used the code below:

Sub ImportCSV
    GlobalScope.BasicLibraries.LoadLibrary("ScriptForge")
    oDoc = CreateScriptService("Calc", ThisComponent)
    oDoc.ImportFromCSVFile("/home/rafael/Documents/dummy.csv", "A1")
End Sub

The data was imported into 2 columns instead of 3.

This should be possible, because the CSV import dialog has an option "Merge Delimiters", so you could create an import filter that adds this option.

FilterOptions := "9/59/44,34,76,1,,0,false,false,true,false,false,0,false,false,true"

This will work... but I agree that this options string is crazy and it should be simpler. Maybe each option in the CSV import dialog should have its own argument in the ImportFromCSVFile method, since this string is super cryptic... to figure out how it works, one needs to look into the LibreOffice source code.

@JPL what's your opinion about this?

System info

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: 420(Build:2)
CPU threads: 16; OS: Linux 6.5; UI render: default; VCL: kf5 (cairo+wayland)
Locale: pt-BR (pt_BR.UTF-8); UI: en-US
Ubuntu package version: 4:24.2.1~rc2-0ubuntu0.23.10.1~lo1
Calc: threaded
Comment 4 Rafael Lima 2024-03-18 17:57:40 UTC
Code pointer is:
https://opengrok.libreoffice.org/xref/core/sc/source/ui/inc/asciiopt.hxx

The attribute you're looking for is bSkipEmptyCells.
Comment 5 Jean-Pierre Ledure 2024-03-19 09:43:00 UTC
Easier than the code is next reference:
https://wiki.documentfoundation.org/Documentation/DevGuide/Spreadsheet_Documents#Filter_Options_for_the_CSV_Filter

I read there:
"To treat several consecutive separators as one, the four letters /MRG have to be appended to the (first) token.". Like in "9/44/59/MRG". This is indeed the default behavior for ScriptForge.

The number of variants is huge. Some tokens are chosen in long numeric tables: encoding, language, while in most UNO interfaces user-friendly text values are admitted: "UTF-8" or "en". Can the correspondence be established by code ?

Is it worth to transcode the variants into individual arguments ?

I am doubting for all the above reasons.

Anyway the Calc help file can be completed. I suggest, after "ImportFromCSVFile",
"The field separator is a comma, a semi-colon or a Tab character."
to add something like
"Consecutive field separators are treated as one."
Comment 6 Rafael Lima 2024-03-20 14:16:18 UTC
(In reply to Jean-Pierre Ledure from comment #5)
> Easier than the code is next reference:
> https://wiki.documentfoundation.org/Documentation/DevGuide/
> Spreadsheet_Documents#Filter_Options_for_the_CSV_Filter

Nice... I knew this existed somewhere.

But I guess the help page is now more complete and up-to-date:
https://help.libreoffice.org/latest/en-US/text/shared/guide/csv_params.html

The problem is that neither explains which token refers to the "Merge Delimiters" option.

TBH I could not determine which of them corresponds to "Merge Delimiters"... I inverted all tokens and nothing changed. Weird.

Also, the string I posted above (which I intercepted from within LO code after the OK button is pressed) has 15 tokens (as in the help), instead of 14 (as in the Wiki). So maybe the Wiki needs to be updated.

(In reply to Jean-Pierre Ledure from comment #5)
> Is it worth to transcode the variants into individual arguments ?

Yeah, the effort would be huge, with not much benefit.
Comment 7 Jean-Pierre Ledure 2024-03-20 15:35:48 UTC
(In reply to Rafael Lima from comment #6)
> TBH I could not determine which of them corresponds to "Merge Delimiters"...
> I inverted all tokens and nothing changed. Weird.

The answer was already in my comment#5:
"To treat several consecutive separators as one, the four letters /MRG have to be appended to the (first) token.". Like in "9/44/59/MRG". This is indeed the default behavior for ScriptForge."

I didn't know the help page about the CSV options. Even better than the wiki ! Except for the /MRG.

As I understand, there is no bug.
The default filteroptions in SF are: "9/44/59/MRG,34,76,1,,1033,true,true"
Removing /MRG and passing the string as argument should do the job (but I didn't test it ...).

Eric, could you confirm ?
Comment 8 Eric Champy 2024-03-21 10:19:16 UTC
YES, I confirm it works ... Anyhow, I suggest to improve the libreoffice help ...

The help about "filteroptions" provides information about the default behavior, but does not provide any information on how to modify these default options...

At least the sample provided above by Jean-Pierre, could be very useful as example ... 

Or at least a link should be provided to any other page that describes in detail this.

Thanks for your support ...
Comment 9 Eric Champy 2024-03-21 10:21:55 UTC
(In reply to Eric Champy from comment #8)
> YES, I confirm it works ... Anyhow, I suggest to improve the libreoffice
> help ...
> 
> The help about "filteroptions" provides information about the default
> behavior, but does not provide any information on how to modify these
> default options...
> 
> At least the sample provided above by Jean-Pierre, could be very useful as
> example ... 
> 
> Or at least a link should be provided to any other page that describes in
> detail this.
> 
> Thanks for your support ...

Another point to be documented:

When the format of the empty col is Text, the column stays empty, but when the format is "Number", the column is set with "-1" on every row ...
Comment 10 m_a_riosv 2024-03-21 10:49:23 UTC
CSV has no format, between separators there is nothing or not.

Please report the suggested enhancement on the help, as a new bug.
Comment 11 Eric Champy 2024-03-21 10:58:11 UTC
(In reply to m_a_riosv from comment #10)
> CSV has no format, between separators there is nothing or not.
> 
> Please report the suggested enhancement on the help, as a new bug.

Sorry my bad, you are right, "-1" has been added by the Java program before the Import ... 

Will do that, Thanks