Bug 160983 - RecalcOptimalRowHeightMode should be effective for CSV imports also
Status: NEEDINFO
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 24.8.0.0 alpha0+
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL: https://www.reddit.com/r/libreoffice/...
Whiteboard:
Keywords:
Depends on:
Blocks: CSV-Dialog
Reported: 2024-05-07 23:06 UTC by Craig Ruff
Modified: 2024-06-06 06:54 UTC

See Also:
Crash report or crash signature:


Attachments
sample CSV with a lot of text in one cell (6.04 KB, text/csv)
2024-05-23 01:12 UTC, Stéphane Guillou (stragu)
Example CSV that exhibits the behavior when imported (727.44 KB, text/csv)
2024-05-23 14:47 UTC, Craig Ruff
Sampled file after import (64.25 KB, application/vnd.oasis.opendocument.spreadsheet)
2024-05-23 16:35 UTC, m_a_riosv
Screenshot import dialog. (71.98 KB, image/png)
2024-05-23 16:36 UTC, m_a_riosv

Description Craig Ruff 2024-05-07 23:06:20 UTC
Description:
Please extend the use of RecalcOptimalRowHeightMode to work for CSV imports.

Steps to Reproduce:
1. Import a CSV file

Actual Results:
Row height adjustment takes a very long time for large CSV files.

Expected Results:
Row height adjustment is user-configurable, just as it is when opening ODF and XLS files.


Reproducible: Always


User Profile Reset: No

Additional Info:
N/A
Comment 1 m_a_riosv 2024-05-08 21:23:45 UTC
How would that work? CSV is imported as unformatted text, which is what it is.
Comment 2 Craig Ruff 2024-05-08 23:28:41 UTC
Not sure why you are asking how to. When I import some CSV files with wide text columns, or text columns that include newline characters, Calc first imports the CSV file and then spends time "adjusting the row heights", according to the status message at the bottom of the window. Some rows of the CSV file then end up marked as 39.37" high, and scrolling in Calc becomes unusable: you get large jumps and can't review those rows at all. Since there is no inherent height information in the CSV file, Calc is the only actor that can be performing this behavior. It seems readily analogous to what is done when reading an ODF or XLS file.
Comment 3 m_a_riosv 2024-05-09 20:51:21 UTC
But these adjustments are not driven by formatting, only by text length and newlines, which LO knows when importing.
There is no formatting in CSV text, such as bold.
The only option is to apply the style you need to the data, by modifying an existing style or creating a new one. You can even modify the default style.

Maybe it is better to create a template just for that: first create a new file from that template, then use
 Menu/Sheet
 - Insert sheet
 - Insert sheet from file
 - External links
For the first two you can set it up as a link, so an update is possible without redoing the import.
Comment 4 Stéphane Guillou (stragu) 2024-05-23 01:11:03 UTC
This has also been requested here: https://www.reddit.com/r/libreoffice/comments/nj3oll/can_i_have_preset_for_row_height_and_column_width/

I couldn't reproduce the issue with ridiculous row heights when importing a CSV. Do you have an example file?
Comment 5 Stéphane Guillou (stragu) 2024-05-23 01:12:06 UTC
Created attachment 194285 [details]
sample CSV with a lot of text in one cell

Not reproduced with this file and:

Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 101b08fe1ec77ffe8c1a9b2b8f9f20884269a1ed
CPU threads: 8; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: CL threaded
Comment 6 Craig Ruff 2024-05-23 03:40:56 UTC
I'll have to sanitize and pare one down to reasonable dimensions, they contain proprietary data.
Comment 7 Craig Ruff 2024-05-23 14:47:45 UTC
Created attachment 194309 [details]
Example CSV that exhibits the behavior when imported

CSV import options:
   From row: 1
   Separated by: comma
   String delimiter: double quote mark
   Format quoted field as text: yes

> scalc --version
LibreOffice 24.2.3.2 433d9c2ded56988e8a90e6b2e771ee4e6a5ab2ba
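
For readers without the attachment at hand: with these options, a double-quoted field may contain line breaks and still ends up in a single cell. The following is a minimal sketch using only Python's standard `csv` module, not LibreOffice code; the helper name `rows_with_multiline_cells` and the sample data are made up for illustration. It lists the rows whose cells contain embedded line breaks, which are the rows one would expect to grow in height on import.

```python
import csv
import io


def rows_with_multiline_cells(text, delimiter=",", quotechar='"'):
    """Return (row_number, max_lines_in_any_cell) for rows holding multi-line cells.

    Row numbers are 1-based, like "From row: 1" in the import dialog.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter=delimiter, quotechar=quotechar
    )
    result = []
    for row_number, row in enumerate(reader, start=1):
        # A quoted field keeps its embedded line breaks as part of the cell text.
        # An empty cell still counts as one line.
        max_lines = max((len(cell.splitlines()) or 1 for cell in row), default=1)
        if max_lines > 1:
            result.append((row_number, max_lines))
    return result


# Hypothetical sample: the second record has a three-line quoted field.
sample = 'id,message\n1,"line one\nline two\nline three"\n2,short\n'
print(rows_with_multiline_cells(sample))  # [(2, 3)]
```

Running something like this over a large file before import gives a rough idea of how many rows are affected, without saying anything about how Calc itself computes heights.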
Comment 8 m_a_riosv 2024-05-23 16:35:37 UTC
Created attachment 194312 [details]
Sampled file after import

No issue for me with:
Version: 24.2.3.2 (X86_64) / LibreOffice Community
Build ID: 433d9c2ded56988e8a90e6b2e771ee4e6a5ab2ba
CPU threads: 16; OS: Windows 10.0 Build 22631; UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded
Comment 9 m_a_riosv 2024-05-23 16:36:09 UTC
Created attachment 194313 [details]
Screenshot import dialog.
Comment 10 Stéphane Guillou (stragu) 2024-05-24 04:48:04 UTC
I see the "calculating" message and the high-height cells (e.g. at row 54) when importing the sample file with only comma as a delimiter.

Version: 24.2.3.2 (X86_64) / LibreOffice Community
Build ID: 433d9c2ded56988e8a90e6b2e771ee4e6a5ab2ba
CPU threads: 8; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: CL threaded

I can see how it can be frustrating for users wanting to edit simple formats like CSV/TSV, expecting LO to handle it as simply as possible.

Ideally, you'd expect a simple on/off setting in the import dialog, along the lines of "Adapt row height to fit contents"? Or something different?

Justin and Balazs, you recently worked on RecalcOptimalRowHeightMode, what do you think? 

Copying the UX/Design team in for opinion.
Comment 11 Heiko Tietze 2024-05-24 08:43:19 UTC
(In reply to Stéphane Guillou (stragu) from comment #10)
> I see the "calculating" message and the high-height cells (e.g. at row 54)
> when importing the sample file with only comma as a delimiter.
Me don't, although it takes some milliseconds to process. Line 1197 aka log data point #2097 is more extreme, and I struggle with the use case of putting a novel into a single cell. The example encloses a long string with quotation marks and RecalcOptimalRowHeightMode does its job as expected.

> I can see how it can be frustrating for users wanting to edit simple formats
> like CSV/TSV, expecting LO to handle it as simply as possible.
Sure, if CSV is kept simple. But the example is some kind of log file with an (X)ML format.
Comment 12 ady 2024-05-24 10:44:17 UTC
(In reply to Stéphane Guillou (stragu) from comment #10)
(e.g. at row 54)

Cell B54 includes line-breaking characters. They should not be modified by the import process.

Either neither rows nor columns get any modification (they would not adapt at all, leaving both axes at their default values, no matter the contents), or they get some kind of adaptation to their contents.

Some users won't like the first alternative, whereas others won't like the second possibility (and both groups will complain). I am not sure that simply changing the behavior would be considered as an improvement. I would agree to let users modify the respective sizes manually after importing, but other users won’t like that, especially newbies.

Providing choice (about one behavior or the other) for users might be seen as improvement, but that means adding some checkbox somewhere.
Comment 13 ady 2024-05-24 10:47:25 UTC
(In reply to Heiko Tietze from comment #11)

> Me don't, although it takes some milliseconds to process.

Same here, but maybe the attachment is just a simplified case(?).
Comment 14 Justin L 2024-05-24 12:28:56 UTC
You definitely want to have row height automatically calculated for a CSV import, since CSV cannot possibly specify a height for any row. Otherwise any multiline data (like row 54) would be "hidden" - looking like single line data (like row 53).

CSV is always a transitional format. You load it once, then format it according to your liking and then export it into an appropriate format.

Other than avoiding a performance penalty, I can't imagine any reason not to want to have the row height be calculated at load time. The proper response to being irritated about the speed is to optimize row height calculation.

[That said, I did say in my XLSX commit "I can't think of a reason why this shouldn't apply to all formats". As a code pointer, see https://gerrit.libreoffice.org/c/core/+/164721 ]
Comment 15 Justin L 2024-05-24 15:42:14 UTC
(In reply to ady from comment #12) 
> Some users won't like the first alternative, whereas others won't like the
> second possibility (and both groups will complain). I am not sure that
> simply changing the behavior would be considered as an improvement.
OP is not asking for a change in the default behaviour.

RecalcOptimalRowHeightMode is an advanced configuration option that affects whether rows that do not mandate a specific height should have that height calculated at import time.
  -yes (default - no change in behaviour)
  -no (useful for ODS which I believe knows the last-used-height at import)
  -ask
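
To make the three modes concrete, here is a rough Python sketch of the decision as described above. It is not the LibreOffice implementation: the names `RowHeightRecalcMode`, `should_recalc_row_heights` and the `ask_user` callback are hypothetical, and the enum deliberately uses the yes/no/ask labels rather than any numeric setting values.

```python
from enum import Enum


class RowHeightRecalcMode(Enum):
    # Hypothetical labels mirroring the three modes listed above.
    YES = "yes"  # always recalculate (the default, no change in behaviour)
    NO = "no"    # never recalculate; keep heights as loaded
    ASK = "ask"  # let the user decide at load time


def should_recalc_row_heights(mode, row_has_fixed_height, ask_user=lambda: True):
    """Decide whether one row's height is recalculated at import time.

    `ask_user` is a hypothetical stand-in for a prompt shown to the user.
    """
    if row_has_fixed_height:
        # Rows that mandate a specific height are not affected by the option.
        return False
    if mode is RowHeightRecalcMode.YES:
        return True
    if mode is RowHeightRecalcMode.NO:
        return False
    return bool(ask_user())


# A CSV row never mandates a height, so only the mode matters for it.
print(should_recalc_row_heights(RowHeightRecalcMode.NO, row_has_fixed_height=False))  # False
```

Under this reading, extending the option to CSV would mean every CSV row goes through the mode check, since no row carries a stored height.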

P.S. I just checked ScDocShell::ConvertTo SC_TEXT_CSV_FILTER_NAME and it does NOT set bSetRowHeights to true. So doing what OP asks would not be as trivial as I thought it might be.

Still a WONTFIX from me. Only useful for formats that already know an appropriate height.
Comment 16 Eyal Rozenberg 2024-06-05 18:29:22 UTC
(In reply to Justin L from comment #14)
> You definitely want to have row height automatically calculated for a CSV
> import, since CSV cannot possibly specify a height for any row.

The conclusion does not follow from the premise. That is, when importing a CSV, I often want all rows to have the same height, regardless of the contents, even if some cells have multiple line breaks. It's often more convenient to browse the contents that way (especially when the columns you're interested in are not the multiline ones, and not even just then).

This is similar to how developers read commit logs and look at only the first line of the commit comment even if it's a multi-line comment.
Comment 17 Heiko Tietze 2024-06-06 06:54:12 UTC
We discussed the topic in the design meeting.

Apparently the newly introduced option RecalcOptimalRowHeightMode provides everything. "Apparently" because, whether set to 0, 1, or 2, the imported example CSV was always recalculated (Windows, build from master). Balazs, can you please test this?

Missing documentation is tracked in bug 160179.

If the "ask" option was not introduced to cover this use case, opinions differ on whether an option in the UI is needed. Some believe this is a not-so-rare use case; others believe it would have a negative impact on usability for most users.