Bug 113726 - Trailing space in cell text not encoded as <text:s/>
Summary: Trailing space in cell text not encoded as <text:s/>
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.3.3.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: odf
Keywords: implementationError
Depends on:
Blocks: Calc-Cells
Reported: 2017-11-09 01:22 UTC by Andrew Church
Modified: 2023-08-19 19:37 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:


Description Andrew Church 2017-11-09 01:22:53 UTC
If a cell contains a trailing space, the space is copied literally to the content of the <text:p> tag for the cell rather than being encoded as a <text:s/> tag:

<office:document-content ...>
   ...
     <table:table-cell office:value-type="string" calcext:value-type="string">
      <text:p><text:s/>spaces </text:p>
     </table:table-cell>
   ...
</office:document-content>

I'm not familiar with the ODS spec so I don't know whether a literal space is permitted in this context, but at least some versions of Excel seem to discard the space.  (LibreOffice itself preserves the space without problems.)

To reproduce, create a new Calc document, enter the text " spaces " in the first cell, and save as ODS.
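
To check a saved file for this, a rough sketch along the following lines can list the <text:p> elements whose character data starts or ends with a literal space. It assumes the ODS is an ordinary ZIP package with a content.xml member; the function name literal_edge_spaces is made up for illustration, and text nested inside child elements such as <text:span> is not inspected at the edges.

```python
import zipfile
import xml.etree.ElementTree as ET

# Qualified name of <text:p> in the ODF text namespace.
TEXT_P = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}p"


def literal_edge_spaces(ods_file):
    """Return (text, leading, trailing) for each <text:p> whose own
    character data begins or ends with a literal U+0020 space.

    ods_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(ods_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))

    hits = []
    for p in root.iter(TEXT_P):
        # Text before the first child element (or all text if no children).
        leading = (p.text or "").startswith(" ")
        if len(p):
            # Text after the last child element, e.g. after a <text:s/>.
            trailing = (p[-1].tail or "").endswith(" ")
        else:
            trailing = (p.text or "").endswith(" ")
        if leading or trailing:
            hits.append(("".join(p.itertext()), leading, trailing))
    return hits
```

For the file produced by the steps above, this would be expected to report the cell with trailing=True and leading=False, since the leading space is already written as <text:s/>.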
Comment 1 Kevin Suo 2017-11-09 10:00:26 UTC
I confirm the behaviour described by the bug reporter. I also confirm that when this ODS file is opened in MSO 2010, the trailing space is not shown.

But I am not sure whether this is a bug in LibreOffice or not.

Adding keyword implementationError, needsDevEval.

The ODF specification says:
(http://docs.oasis-open.org/office/v1.2/cs01/OpenDocument-v1.2-cs01-part1.html#element-text_s)
> The <text:s> element is used to represent the [UNICODE] character “ “ (U+0020, SPACE).
>
> This element shall be used to represent the second and all following “ “ (U+0020, SPACE) characters in a sequence of “ “ (U+0020, SPACE) characters.
>
> Note: It is not an error if the character preceding the element is not a white space character, but it is good practice to use this element only for the second and all following “ “ (U+0020, SPACE) characters in a sequence.
Comment 2 Xisco Faulí 2018-01-15 11:16:22 UTC
@Regina, do you have any insight here?
Comment 3 Regina Henschel 2018-01-15 13:31:12 UTC
I have no strong opinion yet. Whitespace handling is still an open issue in the TC; see OFFICE-3706 and OFFICE-3828.
Comment 4 Regina Henschel 2020-09-13 21:59:14 UTC
The description of <text:s> says, "This element shall be used to represent the second and all following “ “ (U+0020, SPACE) characters in a sequence of “ “ (U+0020, SPACE) characters."

In Version: 7.0.0.2 (x64)
Build ID: c01aa64b6c3d89ebe5fe69c28c7adb24eb85249c
CPU threads: 8; OS: Windows 10.0 Build 18362; UI render: Skia/Raster; VCL: win
Locale: de-DE (en_US); UI: en-US
Calc: CL
I see the <text:s> element when there are two spaces.

I therefore think it is correct that no <text:s> element is used for a single space.
Comment 5 Andrew Church 2020-09-13 23:02:28 UTC
> The description of <text:s> says, "This element shall be used to represent the second and all following “ “ (U+0020, SPACE) characters in a sequence of “ “ (U+0020, SPACE) characters."

As an initial issue, "shall be used to represent the second" does not imply "shall not be used to represent the first", so I don't think this is enough to resolve the question.

Looking at the ODF 1.3 specification (https://www.oasis-open.org/committees/download.php/67566/OpenDocument-v1.3-part3-schema.odt):

"""
6.1.2. White Space Characters

Consumers shall collapse white space that occur in
 * a <text:p> or <text:h> element [...]

Collapsing white space characters is defined by the following algorithm:
[...]
5) Leading " " (U+0020, SPACE) characters at the start of the resulting text [...] are removed.
"""

I hope I haven't omitted any relevant points by accident, but by my reading of this algorithm, "<text:p> space</text:p>" should have the initial space removed and be parsed as the text "space" (5 characters), suggesting that Excel has the right of it and <text:s/> is required to correctly encode the leading space.
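
To make that reading concrete, here is a simplified sketch of what a collapsing consumer would end up with for a single <text:p>. It is not the full algorithm from the specification: it only replaces tab, CR and LF with spaces, collapses runs of spaces, and strips leading and trailing spaces, while spaces coming from <text:s> (including its text:c count attribute) are protected with a private-use placeholder. The names collapse and consumer_text are invented for this example, and how a literal space directly adjacent to a <text:s/> should be treated is not modeled carefully.

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
S_TAG = f"{{{TEXT_NS}}}s"
C_ATTR = f"{{{TEXT_NS}}}c"
HARD_SPACE = "\ue000"  # placeholder for spaces that came from <text:s>


def collapse(text: str) -> str:
    """Simplified white space collapsing of literal character data."""
    text = re.sub(r"[\t\r\n]", " ", text)  # tab, CR, LF become spaces
    text = re.sub(r" +", " ", text)        # runs of spaces become one space
    return text.strip(" ")                 # leading/trailing spaces removed


def consumer_text(paragraph_xml: str) -> str:
    """Return the text a collapsing consumer would see for one <text:p>."""
    root = ET.fromstring(paragraph_xml)
    parts = []

    def walk(elem):
        if elem.tag == S_TAG:
            # <text:s text:c="n"/> stands for n spaces (default 1).
            parts.append(HARD_SPACE * int(elem.get(C_ATTR, "1")))
        else:
            parts.append(elem.text or "")
            for child in elem:
                walk(child)
                parts.append(child.tail or "")

    walk(root)
    return collapse("".join(parts)).replace(HARD_SPACE, " ")


P = '<text:p xmlns:text="%s">%%s</text:p>' % TEXT_NS
print(repr(consumer_text(P % " space")))                    # 'space'
print(repr(consumer_text(P % "<text:s/>spaces ")))          # ' spaces'
print(repr(consumer_text(P % "<text:s/>spaces<text:s/>")))  # ' spaces '
```

Under these assumptions, the markup LibreOffice currently writes loses its trailing space, which matches what Excel shows, while the variant with a trailing <text:s/> keeps both spaces.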

Apologies for not doing the legwork on this in the first place, but reopening.
Comment 6 Andrew Church 2020-09-13 23:06:20 UTC
(I belatedly realize I filed the bug over _trailing_ rather than _leading_ spaces, but the same logic applies to trailing spaces.)
Comment 7 Regina Henschel 2020-09-13 23:21:17 UTC
@Michael: Can you please look at this? The problem is step 5) of the algorithm in section 6.1.2 (ODF 1.3 part 3) compared with the description of <text:s> in section 6.1.3 (ODF 1.3 part 3).
Comment 8 Michael Stahl (allotropia) 2022-02-04 11:15:48 UTC
i think it's a bug in LO.

the space before the end of the element must be written as text:s because:

5) Leading “ “ (U+0020, SPACE) characters at the start of the resulting text and trailing SPACE characters at the end of the resulting text are removed.
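
As an illustration of the export rule this implies, the sketch below encodes a cell string so that leading spaces, trailing spaces, and the second and following spaces of an inner run are all written as <text:s>. This is not LibreOffice's exporter code; encode_paragraph and its helper are hypothetical names, and tabs and line breaks (which would need their own elements) are not handled.

```python
import re
from xml.sax.saxutils import escape


def _space_element(count: int) -> str:
    """Markup for `count` spaces as a single <text:s> element."""
    return "<text:s/>" if count == 1 else f'<text:s text:c="{count}"/>'


def encode_paragraph(text: str) -> str:
    """Encode `text` as <text:p> content that survives white space collapsing."""
    out = []
    pos = 0
    for match in re.finditer(r" +", text):
        out.append(escape(text[pos:match.start()]))
        count = len(match.group())
        at_edge = match.start() == 0 or match.end() == len(text)
        if at_edge:
            # Leading or trailing run: every space must be a <text:s>,
            # otherwise step 5) of the collapsing algorithm removes it.
            out.append(_space_element(count))
        else:
            # Inner run: first space stays literal, the rest become <text:s>.
            out.append(" ")
            if count > 1:
                out.append(_space_element(count - 1))
        pos = match.end()
    out.append(escape(text[pos:]))
    return f"<text:p>{''.join(out)}</text:p>"


print(encode_paragraph(" spaces "))  # <text:p><text:s/>spaces<text:s/></text:p>
print(encode_paragraph("a  b"))      # <text:p>a <text:s/>b</text:p>
```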