Download it now!
Bug 76310 - ODF incompliance: Cannot use newline in formula
Summary: ODF incompliance: Cannot use newline in formula
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
: 120354 (view as bug list)
Depends on:
Blocks: ODF-import
  Show dependency treegraph
 
Reported: 2014-03-18 09:19 UTC by Mike Kaganski
Modified: 2019-11-13 12:28 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Valid file with line breaks (1.95 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-03-19 02:24 UTC, Mike Kaganski
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2014-03-18 09:19:46 UTC
OASIS OpenFormula specification allows using SPACE, TAB, LINE FEED and CARRIAGE RETURN as space characters. It also specifies that embedded newline be represented by single LINE FEED, and requires implementations to "retain whitespace entered by the original formula creator and use it when saving or presenting the formula, and should not add additional whitespace unless directed to do so during the process of editing a formula" (see http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1017970_715980110).

Currently, Calc does allow entering newline when editing cell content (using Ctrl+Enter), but as soon as cell editing is finished, all newlines in formula are converted to spaces.

This is incompliant with the specification. This limitation is inherited from OOo 3.3.
Comment 1 m.a.riosv 2014-03-19 00:52:00 UTC
Hike Mike,

I'm not sure about your interpretation of OASIS specification.

Normal text in a cell, not formula, retains line feed, and you can locate it from functions or use directly through a reference in other cell.

About to use it in text inside formulas:
"Evaluators shall treat SPACE (U+0020), CHARACTER TABULATION (U+0009), LINE FEED (U+000A), and CARRIAGE RETURN (U+000D) as whitespace characters."

I think you are talking about use a line feed inside a formula, if entered in text between quotes with keyboard Ctrl+Enter it is transformed in a white space, but you can use CHAR() function to get it, e.g.
A1: =LEFT("123"&CHAR(10)&"456";7)
shows two lines.
B1: =A1
shows two lines also.
Comment 2 Mike Kaganski 2014-03-19 01:10:55 UTC
(In reply to comment #1)
Hi m.a.riosv,
Thank you for considerations.

Please note that clause 5 (as vell as the whole OpenFormula spec) covers formulas, not ordinary text entered to cells.

With that in mind, please note the following excerpt from 5.15 "Whitespace":

>An embedded line break shall be represented by a single LINE FEED character
>(U+000A), not by a carriage return-linefeed pair. When embedded in an XML
>attribute the linefeed character is represented as “
”.

In my opinion, this is the specific gideline how to implement NEWLINES in formula body, not a character matching pattern used as a function's input paramener. Treating this as a function meaningful parameter is illegal, because the whole clause 5.14 is devoted to lexical element "Whitespace" that's usage is defined in 5.2, and that is used (according to 5.2) OUTSIDE any parameters. The mention of string literals in 5.14 is somewhat misleading, and I believe that this mention serves the only purpose to emphasize that the same characters, when, inside such literals, must NOT be interpreted as Whitespace lexical unit.

Thus, as the Whitespace is required to stay as user entered it, and as embedded line break is defined in Whitespace, I believe that not allowing this in formula is non-compliance.

Specifically, using newlines inside complex formulas may allow for more readable formula (nested IFs starting on new lines with indent, for instance).
Comment 3 m.a.riosv 2014-03-19 01:52:29 UTC
In any case how it must be treated in a formula body, I think doesn't imply that it must be accepted in a mandatory way to use in the formula.

Maybe would be better change the importance to enhancement.

And perhaps can mean some problem in calc performance.
Comment 4 Mike Kaganski 2014-03-19 02:24:29 UTC
Created attachment 96023 [details]
Valid file with line breaks

(In reply to comment #3)
Well, as to enhancement,

This attached file is standard-conformant. It is produced by a real-life ODF-compliant software: MSO2013. It cannot be properly read by LO; so it is a bug. 

Considering that this problem affects compatibility with MSO (see also Bug 56036 for a real-life problem that already has arisen; there will be more, if ODF gains some popularity - remember GB govt initialive), I suppose that this may be seen this way:

1. Fix Bug 56036 in an easy way (just replace input newlines with spaces on open/import, to match current LO policy) - I am not sure if this step is good one, or if it will be easier than doing right thing from the start. However, this will allow for correct calculations in open/imported documents. But it will remove newlines.

2. Fix internal inability to work with newlines (that prevents from reading this attachment). This is an importans step, that I prefer over step 1. Besides formal standard compliance, it allows for retaining the existing newlines (even if it's impossible to add new), and subsequently correct round-trip of documents, thus removing another reason to consider this suite a second-class citizen.

3. Allow entering this in UI. Probably only this part could be made enhancement. However, without previous steps (that are fixing real bugs), this enhancement is useless.

This division is only meaningful if developer sees it appropriate. If this could be better fixed in one step, then this problem, imo, cannot be considered an enhancement as a whole.
Comment 5 Owen Genat (retired) 2014-07-16 08:06:18 UTC
(In reply to comment #0)
> OASIS OpenFormula specification allows using SPACE, TAB, LINE FEED and
> CARRIAGE RETURN as space characters. ... (see
> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.
> html#__RefHeading__1017970_715980110).

It would be worth clarifying whether ODF v1.2, Part 2, §15.4 is an oversight from the revision between these versions: 

- ODF v1.1, Part 1, §1.6 White-Space Processing and EOL Handling 
- ODF v1.2, Part 1, §3.18 White Space Processing and EOL Handling

ODF v1.1, Part 1, §1.6 explicitly mentioned the same four characters. This was determined to be inconsistent with that described in the referenced XML v1.0 4th Ed. / 2006 specification and was subsequently amended to remove all reference to the four characters. The OASIS metabug is:

https://issues.oasis-open.org/browse/OFFICE-1521

... with issues 1211 and 1539 offering the most detail on what was decided.

The situation with ODF v1.2, Part 2 (OpenFormula) may need to undergo a similar revision (it also references the same XML spec), however this needs to be raised with OASIS (and probably should be for clarity in this case anyway). I am setting the status to NEEDINFO until someone from OASIS can assist with this matter.
Comment 6 Owen Genat (retired) 2014-07-16 08:33:29 UTC
(In reply to comment #5)
> It would be worth clarifying whether ODF v1.2, Part 2, §15.4 is an oversight
> ... 
> I am setting the status to NEEDINFO until someone from OASIS can
> assist with this matter.

Scrap that idea. It appears this has been discussed in this OASIS issue:

https://issues.oasis-open.org/browse/OFFICE-701

The clarifying comments in the change document (linked at end) clearly indicate the reason for including these characters (to allow for clearer representation of complex formula). Given the attachment provided in comment 4 I can confirm the behaviour under Debian 7 x86_64 v4.3.0.2 Build ID: 14ed55896fdfcb93ff437b85c4f3e1923d2b1409. Status set to NEW. Platform to All/All.
Comment 7 m.a.riosv 2015-02-12 21:51:24 UTC
*** Bug 89350 has been marked as a duplicate of this bug. ***
Comment 8 QA Administrators 2016-02-21 08:38:23 UTC Comment hidden (obsolete)
Comment 9 Mike Kaganski 2016-02-22 09:03:57 UTC
Still reproducible with 5.1.0.3
Comment 10 QA Administrators 2017-03-06 16:15:13 UTC Comment hidden (obsolete)
Comment 11 Roman Kuznetsov 2018-09-20 07:47:41 UTC
still present in LO 6.1.1.2
Comment 12 Mike Kaganski 2018-09-20 13:37:17 UTC
A code pointer:

ScCompiler::NextSymbol() returns number of spaces (it simply counts space characters and discards specific values). It should return space string instead.

ScCompiler::NextNewToken(), in turn, creates a new ocSpaces token, which only has length data. So, we should change it to hold the space string (and so keep original spacing characters (spaces, tabs, newlines)).
Comment 13 Mike Kaganski 2018-10-05 21:20:36 UTC
*** Bug 120354 has been marked as a duplicate of this bug. ***