Bug 113726 - Trailing space in cell text not encoded as <text:s/>
Summary: Trailing space in cell text not encoded as <text:s/>
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.3.3.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: implementationError, needsDevAdvice
Depends on:
Blocks:
 
Reported: 2017-11-09 01:22 UTC by Andrew Church
Modified: 2018-01-15 13:32 UTC

See Also:
Crash report or crash signature:


Description Andrew Church 2017-11-09 01:22:53 UTC
If a cell contains a trailing space, the space is copied literally to the content of the <text:p> tag for the cell rather than being encoded as a <text:s/> tag:

<office:document-content ...>
   ...
     <table:table-cell office:value-type="string" calcext:value-type="string">
      <text:p><text:s/>spaces </text:p>
     </table:table-cell>
   ...
</office:document-content>

I'm not familiar with the ODS spec so I don't know whether a literal space is permitted in this context, but at least some versions of Excel seem to discard the space.  (LibreOffice itself preserves the space without problems.)

To reproduce, create a new Calc document, enter the text " spaces " in the first cell, and save as ODS.
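
To check a saved file for this, something like the following Python sketch may help. It assumes the cell text lives in content.xml inside the ODS zip package and that the standard ODF text namespace is used; the function names are made up for illustration and are not part of any LibreOffice tooling.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace for text:p and text:s.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs_with_trailing_space(content_xml):
    """Return the final literal text run of every <text:p> that ends in a plain space."""
    root = ET.fromstring(content_xml)
    hits = []
    for p in root.iter(f"{{{TEXT_NS}}}p"):
        children = list(p)
        # The last literal text is the tail of the last child element,
        # or the paragraph's own text when it has no child elements.
        last_text = children[-1].tail if children else p.text
        if last_text and last_text.endswith(" "):
            hits.append(last_text)
    return hits


def check_ods(path):
    """Hypothetical helper: scan content.xml of an ODS file."""
    with zipfile.ZipFile(path) as zf:
        return paragraphs_with_trailing_space(zf.read("content.xml"))
```

For the file produced by the steps above, check_ods should report the run "spaces " if the trailing space was written literally.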
Comment 1 Kevin Suo 2017-11-09 10:00:26 UTC
I confirm the behaviour described by the bug reporter. I also confirm that when this ODS file is opened in MSO 2010, the trailing space is not shown.

But I am not sure whether this is a bug in LibreOffice or not.

Adding keywords implementationError, needsDevEval.

The ODF specification says:
(http://docs.oasis-open.org/office/v1.2/cs01/OpenDocument-v1.2-cs01-part1.html#element-text_s)
> The <text:s> element is used to represent the [UNICODE] character “ “ (U+0020, SPACE).
>
> This element shall be used to represent the second and all following “ “ (U+0020, SPACE) characters in a sequence of “ “ (U+0020, SPACE) characters.
>
> Note: It is not an error if the character preceding the element is not a white space character, but it is good practice to use this element only for the second and all following “ “ (U+0020, SPACE) characters in a sequence.
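
To make the quoted rule concrete, here is a rough Python sketch of how a writer might encode spaces. It is only an illustration, not LibreOffice's actual export code. The encode_edges flag is an assumption that models what the reporter expected (a leading or trailing space also written as <text:s/>); the quoted passage itself only requires the element for the second and following spaces in a run.

```python
from xml.sax.saxutils import escape


def encode_spaces(text, encode_edges=False):
    """Return an XML fragment for the content of a <text:p> element.

    The second and all following spaces in a run become <text:s/>.
    With encode_edges=True (an assumption, not taken from the spec text
    quoted above), a space in the first or last position is encoded too.
    """
    out = []
    last = len(text) - 1
    prev_space = False
    for i, ch in enumerate(text):
        if ch == " ":
            at_edge = encode_edges and (i == 0 or i == last)
            out.append("<text:s/>" if (prev_space or at_edge) else " ")
            prev_space = True
        else:
            # Escape &, < and > so the fragment stays well-formed.
            out.append(escape(ch))
            prev_space = False
    return "".join(out)


print(encode_spaces("a  b"))
print(encode_spaces(" spaces ", encode_edges=True))
```

Under the quoted rule alone, the single trailing space in " spaces " would stay literal, which matches what the file in the description contains for the trailing position.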
Comment 2 Xisco Faulí 2018-01-15 11:16:22 UTC
@Regina, do you have any insight here?
Comment 3 Regina Henschel 2018-01-15 13:31:12 UTC
I have no strong opinion yet. Whitespace handling is still an open issue in the TC; see OFFICE-3706 and OFFICE-3828.