Bug 162839 - Base needs to inform users when opening "malformed" input text files
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base
Version: 24.8.0.3 release (earliest affected)
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Import
Reported: 2024-09-07 16:26 UTC by rehierl
Modified: 2024-12-24 05:42 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Screenshot of import options (105.92 KB, image/png)
2024-09-07 22:54 UTC, m_a_riosv
Database properties when connecting to csv files (59.62 KB, image/png)
2024-09-08 06:39 UTC, Robert Großkopf

Description rehierl 2024-09-07 16:26:42 UTC
Description:
When using Base to connect to a filesystem folder with text-based CSV data files, what you will see in a table view does not necessarily correspond with what is in the data files.

steps to reproduce:
- create a new "test" folder in your file system
- create a "test.csv" file in that test folder and save it with the contents shown below
- start libreoffice desktop
- click on "Base Database" to open libreoffice base
- in the "Select database" dialog select "Connect to an existing database"
- select "Text/CSV" and click the "Next" button
- enter the path to the test folder in the "path to text files" text box
- set the remaining options to match the input (.csv) file, e.g. ";" as the field separator
- click the "Finish" button to open the database
- save as "test.odb" in some location
- open the database file and switch to the tables
- double-click on the "test" table to open it

test.csv input file contents:
a;b;c
1;2;3
;2
;;;4

to point out the details:
- there is a heading row with 3 columns
- the 1st row has 3 columns
- the 2nd row has 2 columns - no 3rd column
- the 3rd row has 4 columns - one additional cell
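The column counts listed above can be checked independently of Base. The following is a minimal sketch using Python's standard csv module; the helper name field_counts is made up for illustration and has nothing to do with how Base reads the file.

```python
import csv
import io

# Contents of test.csv exactly as given in this report.
TEST_CSV = "a;b;c\n1;2;3\n;2\n;;;4\n"

def field_counts(text, delimiter=";"):
    """Return the number of fields on each line, heading row first.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]

print(field_counts(TEST_CSV))  # -> [3, 3, 2, 4]
```

The heading row and the 1st row have 3 fields each, the 2nd row has 2, and the 3rd row has 4.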

what will be displayed in the table view is:
a;b;c
1;2;3
;2;2
;;

the issue with row 2:
- the last non-null value in a row gets duplicated to all subsequent cells with no value, until the end of that row.
- ERROR - this is an error because these duplicated values are not part of the input file.

the issue with row 3:
- that row appears to be empty, even though there is actual data in that row.
- ISSUE - no feedback is given to the user that points out the "malformed" input file.

the issue with this:
- users do not get any feedback about problematic input files.
- since there is no feedback, users are prone to assume that the data displayed is as contained within the input file.
- with input files that may hold hundreds of rows, that is problematic.

possible changes:
- when opening a problematic input file show a popup to the user that points out that what is visible in the table view is a mere approximation.
- maybe include the number of affected rows in the dialog.
- maybe add an additional column to the table view, which would allow filtering for the affected rows.
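To illustrate what such a check could report, here is a rough sketch in Python. It is not based on the Base source code; the function names find_malformed_rows and warning_message and the wording of the message are assumptions made up for this example.

```python
import csv
import io

def find_malformed_rows(text, delimiter=";"):
    """Return (data_row_number, found, expected) for every data row whose
    field count differs from the heading row. Data rows are numbered from 1.

    Hypothetical helper, not part of LibreOffice.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])
    return [
        (number, len(row), expected)
        for number, row in enumerate(rows[1:], start=1)
        if len(row) != expected
    ]

def warning_message(malformed, total_rows):
    """Build the text a popup might show, or None if no row is affected."""
    if not malformed:
        return None
    numbers = ", ".join(str(number) for number, _, _ in malformed)
    return (f"{len(malformed)} of {total_rows} data rows do not match "
            f"the header (rows: {numbers}).")

malformed = find_malformed_rows("a;b;c\n1;2;3\n;2\n;;;4\n")
print(malformed)                      # -> [(2, 2, 3), (3, 4, 3)]
print(warning_message(malformed, 3))
```

For the test.csv from this report, rows 2 and 3 would be reported. The same list of row numbers could feed the additional "affected" column mentioned above.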

on a side note:
- there is no such issue when opening the input file in libreoffice calc.


Steps to Reproduce:
As described above.

Actual Results:
- ERROR - Values may be silently duplicated.
- ISSUE - Values may get silently ignored.

Expected Results:
At the very least, users should get informed that the format of the corresponding input file is not as expected.


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.8.0.3 (X86_64) / LibreOffice Community
Build ID: 0bdf1299c94fe897b119f97f3c613e9dca6be583
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: de-DE (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 1 m_a_riosv 2024-09-07 22:54:15 UTC
Created attachment 196304 [details]
Screenshot of import options

With the options in the attached screenshot, it works fine, I think.
Version: 24.8.1.1 (X86_64) / LibreOffice Community
Build ID: ef51c4a0cd35185debf25ad9d0db6a1c14bed5a0
CPU threads: 16; OS: Windows 11 X86_64 (10.0 build 22631); UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded
Comment 2 Robert Großkopf 2024-09-08 06:39:34 UTC
Created attachment 196307 [details]
Database properties when connecting to csv files

There will be a difference between the file imported into Calc and a file read by the database. The database will only see 3 columns and put the content in the three columns.

a b c
1 2 3

will be right.

a b c
1 2 3
  2 2

There should be data for 3 columns, but there aren't: Base copies the value of column 2 to column 3 → wrong values will be shown.

Row 4 won't show any value, because the table will only show 3 columns and the value from the original csv-file contains 4 columns in row 4.

Base should never show values in a table that aren't there. So row 3 is buggy behavior.
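
As a sketch of the alternative (not taken from the Base code, and with a made-up function name normalize_row): a short row could be padded with empty (NULL) cells instead of repeating the last value, and every row that had to be padded or cut could be flagged. The naive split on ";" below ignores quoting and is only meant for the small test file.

```python
def normalize_row(fields, width):
    """Fit one parsed row to `width` columns.

    Returns (cells, ok). Missing cells become None (NULL) instead of a
    copy of the previous value; surplus cells are cut off. `ok` is False
    whenever the row had to be padded or cut, so it can be reported.
    Hypothetical helper for illustration only.
    """
    ok = len(fields) == width
    cells = list(fields[:width])
    cells += [None] * (width - len(cells))
    return cells, ok

for line in ["1;2;3", ";2", ";;;4"]:
    print(normalize_row(line.split(";"), 3))
# -> (['1', '2', '3'], True)
# -> (['', '2', None], False)
# -> (['', '', ''], False)
```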
Comment 3 Robert Großkopf 2024-09-08 06:40:08 UTC
All tested with
Version: 24.8.1.1 (X86_64) / LibreOffice Community
Build ID: ef51c4a0cd35185debf25ad9d0db6a1c14bed5a0
CPU threads: 6; OS: Linux 6.4; UI render: default; VCL: kf5 (cairo+xcb)
Locale: de-DE (de_DE.UTF-8); UI: en-US
Calc: threaded