Bug 76310 - ODF incompliance: Cannot use newline in formula
Summary: ODF incompliance: Cannot use newline in formula
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:7.3.0 inReleaseNotes
Keywords:
: 120354 131552 (view as bug list)
Depends on:
Blocks: ODF-import
  Show dependency treegraph
 
Reported: 2014-03-18 09:19 UTC by Mike Kaganski
Modified: 2022-01-01 11:00 UTC (History)
9 users (show)

See Also:
Crash report or crash signature:


Attachments
Valid file with line breaks (1.95 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-03-19 02:24 UTC, Mike Kaganski
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2014-03-18 09:19:46 UTC
OASIS OpenFormula specification allows using SPACE, TAB, LINE FEED and CARRIAGE RETURN as space characters. It also specifies that embedded newline be represented by single LINE FEED, and requires implementations to "retain whitespace entered by the original formula creator and use it when saving or presenting the formula, and should not add additional whitespace unless directed to do so during the process of editing a formula" (see http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1017970_715980110).

Currently, Calc does allow entering newline when editing cell content (using Ctrl+Enter), but as soon as cell editing is finished, all newlines in formula are converted to spaces.

This is incompliant with the specification. This limitation is inherited from OOo 3.3.
Comment 1 m_a_riosv 2014-03-19 00:52:00 UTC
Hike Mike,

I'm not sure about your interpretation of OASIS specification.

Normal text in a cell, not formula, retains line feed, and you can locate it from functions or use directly through a reference in other cell.

About to use it in text inside formulas:
"Evaluators shall treat SPACE (U+0020), CHARACTER TABULATION (U+0009), LINE FEED (U+000A), and CARRIAGE RETURN (U+000D) as whitespace characters."

I think you are talking about use a line feed inside a formula, if entered in text between quotes with keyboard Ctrl+Enter it is transformed in a white space, but you can use CHAR() function to get it, e.g.
A1: =LEFT("123"&CHAR(10)&"456";7)
shows two lines.
B1: =A1
shows two lines also.
Comment 2 Mike Kaganski 2014-03-19 01:10:55 UTC
(In reply to comment #1)
Hi m.a.riosv,
Thank you for considerations.

Please note that clause 5 (as vell as the whole OpenFormula spec) covers formulas, not ordinary text entered to cells.

With that in mind, please note the following excerpt from 5.15 "Whitespace":

>An embedded line break shall be represented by a single LINE FEED character
>(U+000A), not by a carriage return-linefeed pair. When embedded in an XML
>attribute the linefeed character is represented as “
”.

In my opinion, this is the specific gideline how to implement NEWLINES in formula body, not a character matching pattern used as a function's input paramener. Treating this as a function meaningful parameter is illegal, because the whole clause 5.14 is devoted to lexical element "Whitespace" that's usage is defined in 5.2, and that is used (according to 5.2) OUTSIDE any parameters. The mention of string literals in 5.14 is somewhat misleading, and I believe that this mention serves the only purpose to emphasize that the same characters, when, inside such literals, must NOT be interpreted as Whitespace lexical unit.

Thus, as the Whitespace is required to stay as user entered it, and as embedded line break is defined in Whitespace, I believe that not allowing this in formula is non-compliance.

Specifically, using newlines inside complex formulas may allow for more readable formula (nested IFs starting on new lines with indent, for instance).
Comment 3 m_a_riosv 2014-03-19 01:52:29 UTC
In any case how it must be treated in a formula body, I think doesn't imply that it must be accepted in a mandatory way to use in the formula.

Maybe would be better change the importance to enhancement.

And perhaps can mean some problem in calc performance.
Comment 4 Mike Kaganski 2014-03-19 02:24:29 UTC
Created attachment 96023 [details]
Valid file with line breaks

(In reply to comment #3)
Well, as to enhancement,

This attached file is standard-conformant. It is produced by a real-life ODF-compliant software: MSO2013. It cannot be properly read by LO; so it is a bug. 

Considering that this problem affects compatibility with MSO (see also Bug 56036 for a real-life problem that already has arisen; there will be more, if ODF gains some popularity - remember GB govt initialive), I suppose that this may be seen this way:

1. Fix Bug 56036 in an easy way (just replace input newlines with spaces on open/import, to match current LO policy) - I am not sure if this step is good one, or if it will be easier than doing right thing from the start. However, this will allow for correct calculations in open/imported documents. But it will remove newlines.

2. Fix internal inability to work with newlines (that prevents from reading this attachment). This is an importans step, that I prefer over step 1. Besides formal standard compliance, it allows for retaining the existing newlines (even if it's impossible to add new), and subsequently correct round-trip of documents, thus removing another reason to consider this suite a second-class citizen.

3. Allow entering this in UI. Probably only this part could be made enhancement. However, without previous steps (that are fixing real bugs), this enhancement is useless.

This division is only meaningful if developer sees it appropriate. If this could be better fixed in one step, then this problem, imo, cannot be considered an enhancement as a whole.
Comment 5 Owen Genat (retired) 2014-07-16 08:06:18 UTC
(In reply to comment #0)
> OASIS OpenFormula specification allows using SPACE, TAB, LINE FEED and
> CARRIAGE RETURN as space characters. ... (see
> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.
> html#__RefHeading__1017970_715980110).

It would be worth clarifying whether ODF v1.2, Part 2, §15.4 is an oversight from the revision between these versions: 

- ODF v1.1, Part 1, §1.6 White-Space Processing and EOL Handling 
- ODF v1.2, Part 1, §3.18 White Space Processing and EOL Handling

ODF v1.1, Part 1, §1.6 explicitly mentioned the same four characters. This was determined to be inconsistent with that described in the referenced XML v1.0 4th Ed. / 2006 specification and was subsequently amended to remove all reference to the four characters. The OASIS metabug is:

https://issues.oasis-open.org/browse/OFFICE-1521

... with issues 1211 and 1539 offering the most detail on what was decided.

The situation with ODF v1.2, Part 2 (OpenFormula) may need to undergo a similar revision (it also references the same XML spec), however this needs to be raised with OASIS (and probably should be for clarity in this case anyway). I am setting the status to NEEDINFO until someone from OASIS can assist with this matter.
Comment 6 Owen Genat (retired) 2014-07-16 08:33:29 UTC
(In reply to comment #5)
> It would be worth clarifying whether ODF v1.2, Part 2, §15.4 is an oversight
> ... 
> I am setting the status to NEEDINFO until someone from OASIS can
> assist with this matter.

Scrap that idea. It appears this has been discussed in this OASIS issue:

https://issues.oasis-open.org/browse/OFFICE-701

The clarifying comments in the change document (linked at end) clearly indicate the reason for including these characters (to allow for clearer representation of complex formula). Given the attachment provided in comment 4 I can confirm the behaviour under Debian 7 x86_64 v4.3.0.2 Build ID: 14ed55896fdfcb93ff437b85c4f3e1923d2b1409. Status set to NEW. Platform to All/All.
Comment 7 m_a_riosv 2015-02-12 21:51:24 UTC
*** Bug 89350 has been marked as a duplicate of this bug. ***
Comment 8 QA Administrators 2016-02-21 08:38:23 UTC Comment hidden (obsolete)
Comment 9 Mike Kaganski 2016-02-22 09:03:57 UTC
Still reproducible with 5.1.0.3
Comment 10 QA Administrators 2017-03-06 16:15:13 UTC Comment hidden (obsolete)
Comment 11 Roman Kuznetsov 2018-09-20 07:47:41 UTC
still present in LO 6.1.1.2
Comment 12 Mike Kaganski 2018-09-20 13:37:17 UTC
A code pointer:

ScCompiler::NextSymbol() returns number of spaces (it simply counts space characters and discards specific values). It should return space string instead.

ScCompiler::NextNewToken(), in turn, creates a new ocSpaces token, which only has length data. So, we should change it to hold the space string (and so keep original spacing characters (spaces, tabs, newlines)).
Comment 13 Mike Kaganski 2018-10-05 21:20:36 UTC
*** Bug 120354 has been marked as a duplicate of this bug. ***
Comment 14 Commit Notification 2021-07-28 16:57:15 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/516318113f0bd2b3c658aba9b285165e63a280e2

Resolves: tdf#76310 Preserve whitespace TAB, CR, LF in formula expressions

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Commit Notification 2021-07-28 18:09:57 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/56cebfdbce89e7400a89678d14b847243e88f971

Keep ScParameterClassification sorted, tdf#76310 follow-up

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 BogdanB 2021-07-30 17:32:58 UTC
Verified with the document form comment 4. Working.

Verified with
Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 5aa74aa1e6fac571f99146ebcb6adc9feb1459ad
CPU threads: 4; OS: Linux 5.8; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-07-28_19:35:14
Calc: threaded
Comment 17 Eike Rathke 2021-08-10 09:50:08 UTC
*** Bug 131552 has been marked as a duplicate of this bug. ***
Comment 18 Commit Notification 2021-08-11 22:21:09 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/3c0c1b79b689775f09b3bff75bb01118b380f41f

tdf#76310: subsequent_filters: Add unittest

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Stéphane Guillou (stragu) 2022-01-01 11:00:57 UTC
Reviewing 7.3 release notes.

Verified fix with:

Version: 7.3.0.1 / LibreOffice Community
Build ID: 840fe2f57ae5ad80d62bfa6e25550cb10ddabd1d
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

...using attachment 96023 [details]