Bug 115012 - Chart in XLSX has gaps instead of/around zero values
Summary: Chart in XLSX has gaps instead of/around zero values
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.3.0
Keywords: filter:xlsx
Depends on:
Blocks: OOXML-Chart
  Show dependency treegraph
 
Reported: 2018-01-15 02:48 UTC by Aron Budea
Modified: 2021-08-28 13:31 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
Sample XLSX (13.41 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2018-01-15 02:48 UTC, Aron Budea
Details
Screenshot in Excel (54.50 KB, image/png)
2018-01-15 02:50 UTC, Aron Budea
Details
sal_debug_calc_cell_content (128.80 KB, image/png)
2019-02-19 08:35 UTC, Adam Kovacs
Details
sal_debug_calc_cell_content_2 (160.92 KB, image/png)
2019-02-19 14:38 UTC, Adam Kovacs
Details
line chart vs stacked line chart in MSO (226.30 KB, image/png)
2019-02-22 10:45 UTC, Adam Kovacs
Details
line chart vs stacked line chart in MSO (234.23 KB, image/png)
2019-02-22 10:59 UTC, Adam Kovacs
Details
line chart empty cells and nan cells in excel vs calc (232.37 KB, image/png)
2019-02-22 11:06 UTC, Adam Kovacs
Details
empty formula line chart in excel vs calc (204.63 KB, image/png)
2019-03-01 08:09 UTC, Adam Kovacs
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Aron Budea 2018-01-15 02:48:46 UTC
Created attachment 139098 [details]
Sample XLSX

Open the attached XLSX, which contains a simple chart of type 'Line with Markers'. The chart is set to show empty cells as gaps.
In the data column there are cells that appear empty, but aren't, their content is: =""

Thus in Excel the chart is continuous, but in Calc there are gaps instead of those "empty" values.

Observed using LO 6.0.0.1 & 4.0.0.3 / Windows 7.
Comment 1 Aron Budea 2018-01-15 02:50:00 UTC
Created attachment 139099 [details]
Screenshot in Excel
Comment 2 Xisco Faulí 2018-01-15 10:22:44 UTC
Reproduced in

Version: 6.1.0.0.alpha0+
Build ID: 0ef0740298b45379bbf8d00d50beffee7a2f812a
CPU threads: 4; OS: Linux 4.10; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group threaded

Version: 5.2.0.0.alpha1+
Build ID: 5b168b3fa568e48e795234dc5fa454bf24c9805e
CPU Threads: 4; OS Version: Linux 4.10; UI Render: default; 
Locale: ca-ES (ca_ES.UTF-8)

Version: 4.3.0.0.alpha1+
Build ID: c15927f20d4727c3b8de68497b6949e72f9e6e9e

Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)

LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 3 Aron Budea 2018-01-17 01:05:06 UTC
The problem here is that the same setting (leave gap for missing values) is interpreted slightly differently in Excel and in Calc: the mentioned cell content (="") is considered a missing value in Calc, but not in Excel.

Interestingly otherwise it isn't considered as a value even in Excel, but the chart plots it as if it was zero.

The respective OOXML tag is: <c:dispBlanksAs val="gap"/>
Comment 4 Aron Budea 2018-01-17 18:35:00 UTC
Interesting difference in wording as well.
Excel:
 - Show empty cells as: Gaps / Zero / Connect data points with line.
Calc:
 - Plot missing values: Leave gap / Assume zero / Continue line.

The bug occurs with non-numeric entries as well, eg. with an 'a'.
It's true that the cell is not empty (so behavior in Excel is correct), but it's also a missing value (so behavior in Calc is also correct).

And sure enough, the specifications also have this difference:
- OOXML: "Specifies how blank cells are plotted on the chart."
- ODF: "The chart:treat-empty-cells attribute specifies how missing and invalid values are plotted in a chart. "
Comment 5 Aron Budea 2018-01-18 04:36:48 UTC
An idea: add a GAP_MSO variant to the internal representation of the attribute values (the name is just for identification in this comment).

Gap value in ODS chart would be imported as GAP, but in XLSX chart it would be imported as GAP_MSO.
In the chart the invalid values would be shown as zero/not shown according to the respective specification.
Either value would be exported as the respective specification's GAP attribute value.
UI wouldn't change, there would be no mention of this difference, and changing to the gap value in the data series properties would set the regular gap.
Comment 6 Aron Budea 2018-02-19 03:56:19 UTC
Some comments from Markus on IRC:

<@moggi> the behavior, it is not the chart side but the calc side that forces us to interpret ="" as empty
<@moggi> implemented in ScChart2DataSequence::BuildDataCache
<bearon> this will return false, right? pFCell->IsValue()
<@moggi> yes
<@moggi> well, it would require to pass the missing value treatment to the sc chart data provider as this requires to treat cache items with Item::mbIsValue == false and Item::maString.isEmpty() == true as 0
<@moggi> currently it is NaN
<bearon> apart from this problem, is extending the mentioned MissingValueTreatment.idl with an "OOXML-specific" value an acceptable change?
<@moggi> I would need to ask sberg what the rules are about published constants
<@moggi> if extending them is considered API breaking this would be a much bigger task as that would require to replace the chart1 API constants with a new chart2 API constant group
Comment 7 Adam Kovacs 2019-02-19 08:35:32 UTC
Created attachment 149406 [details]
sal_debug_calc_cell_content

Refreshing the topic with some links.

The file you were talking about with pFCell->IsValue()
https://opengrok.libreoffice.org/xref/core/sc/source/ui/unoobj/chart2uno.cxx?r=b4f38d8b#2485

And the OOXML import:
https://opengrok.libreoffice.org/xref/core/oox/source/drawingml/chart/datasourcecontext.cxx?r=fada813d#51
Comment 8 Adam Kovacs 2019-02-19 10:28:48 UTC
This can be also related:
https://bugs.documentfoundation.org/show_bug.cgi?id=102621
Comment 10 Adam Kovacs 2019-02-19 13:25:31 UTC
In the member function ScChart2DataSequence::getNumericalData():
https://opengrok.libreoffice.org/xref/core/sc/source/ui/unoobj/chart2uno.cxx?r=b4f38d8b#2923
There is a line:
*pArr = rItem.mbIsValue ? rItem.mfValue : fNAN;
Comment 11 Adam Kovacs 2019-02-19 14:38:11 UTC
Created attachment 149422 [details]
sal_debug_calc_cell_content_2

So if we write this...
*pArr = rItem.mbIsValue ? rItem.mfValue : 0.0;
...instead of this...
*pArr = rItem.mbIsValue ? rItem.mfValue : fNAN;
...here...
https://opengrok.libreoffice.org/xref/core/sc/source/ui/unoobj/chart2uno.cxx?r=b4f38d8b#2923
...then we get the result on the screenshot.
Comment 12 Adam Kovacs 2019-02-22 10:45:13 UTC
Created attachment 149520 [details]
line chart vs stacked line chart in MSO

By the way, by default, MSO line charts also use gaps for empty or NaN values. But they have a chart type called stacked line chart.
Comment 13 Adam Kovacs 2019-02-22 10:59:49 UTC
Created attachment 149521 [details]
line chart vs stacked line chart in MSO

another comparsion in Excel 2016
Comment 14 Adam Kovacs 2019-02-22 11:06:50 UTC
Created attachment 149522 [details]
line chart empty cells and nan cells in excel vs calc

this screenshot shows what is the difference between excel and calc due to this bug
Comment 15 Adam Kovacs 2019-03-01 08:09:54 UTC
Created attachment 149671 [details]
empty formula line chart in excel vs calc

I was wrong when I thought that this bug was about LibreOffice didnt handle strings as cell values in line charts.

This bug is about empty formulas: =""

This screenshot demostrates the problem of this bug.
Comment 16 Adam Kovacs 2019-03-01 08:25:15 UTC
How to reproduce it:

1. Open Excel/Calc
2. Write some values in a column, one of them should be ="" (an empty formula), for example: 1, ="", void cell, 3
3. Select the area of column
4. Insert a line chart
5. Compare it in Excel vs Calc
Comment 17 Adam Kovacs 2019-03-01 09:29:53 UTC
Some related stuff in Excel:
https://www.extendoffice.com/documents/excel/4374-excel-skip-blanks-in-chart.html
Comment 21 Adam Kovacs 2019-03-06 12:45:51 UTC
20.64 chart:treat-empty-cells
The chart:treat-empty-cells attribute specifies how missing and invalid values are plotted
in a chart.
The defined values for the chart:treat-empty-cells attribute are:
- ignore: nothing is plotted for a point with missing values and the plot continues.
- leave-gap: nothing is plotted for a point with missing values and a continuous plot stops and
restarts at the next valid point.
- use-zero: missing values are replaced with zero.
The default behavior is ignore for scatter charts (chart:class is chart:scatter). For all
other chart types the default behavior is leave-gap.
This attribute is evaluated for chart styles that are applied to a <chart:plot-area> element.
The chart:treat-empty-cells attribute is usable with the following element:
<style:chart-properties> 17.22.
Comment 23 Commit Notification 2019-03-12 12:17:13 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/f684c074d5f66c8b1546a626766bc045c04cebc3%5E%21

tdf#115012 XLSX chart import: workaround for no gap

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 24 László Németh 2019-03-12 12:21:00 UTC
Fixed with a workaround, using use-zero chart:treat-empty-cells setting, if it's possible. Special cases, when the data source contains also empty cells, not only no-gap empty values, need other solution.