Bug 64038 - too many spaces that follow <text:s/> are collapsed
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): Inherited From OOo
Hardware: Other All
Importance: medium normal
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: odf target:5.3.0
Keywords: filter:odf
Depends on:
Blocks:
 
Reported: 2013-04-29 09:45 UTC by Jos van den Oever
Modified: 2016-10-14 21:53 UTC

See Also:
Crash report or crash signature:


Attachments
test document for space collapsing (3.75 KB, application/vnd.oasis.opendocument.text)
2013-04-29 09:45 UTC, Jos van den Oever
rendering of test document in calligra (18.95 KB, application/pdf)
2013-04-29 09:49 UTC, Jos van den Oever
rendering of test document in libreoffice (48.81 KB, application/pdf)
2013-04-29 09:49 UTC, Jos van den Oever
rendering of test document in ms office (8.23 KB, image/png)
2013-04-29 09:50 UTC, Jos van den Oever
same issue but now in odp file (11.25 KB, application/vnd.oasis.opendocument.presentation)
2013-05-01 10:08 UTC, Jos van den Oever
test files as rendered by LibreOffice 5.0.0.5 (56.17 KB, application/pdf)
2015-08-26 12:27 UTC, Jos van den Oever

Description Jos van den Oever 2013-04-29 09:45:51 UTC
Created attachment 78591 [details]
test document for space collapsing

In ODF, consecutive spaces are collapsed. E.g.
  <text:p><text:span>a </text:span> b</text:p>
should have 1 space between 'a' and 'b'. <text:s/> is used for non-collapsing spaces.
LibreOffice collapses normal spaces after <text:s/>, for example:
  <span>a </span><s/> <span>b</span>
should have 3 spaces between 'a' and 'b'. LibreOffice only shows 2.
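
The expected counts above can be reproduced with a small model of the reading argued in this report: ordinary spaces collapse, while <text:s/> contributes a space that is never collapsed and that ends the current run of ordinary spaces. This is only a sketch under that assumption. The function name render_paragraph is hypothetical, and the code is neither the ODF specification text nor LibreOffice's importer.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
S_TAG = f"{{{TEXT_NS}}}s"   # <text:s/>
C_ATTR = f"{{{TEXT_NS}}}c"  # text:c, the repeat count on <text:s/>


def render_paragraph(xml_fragment: str) -> str:
    """Hypothetical helper: return the text of one <text:p> after collapsing."""
    root = ET.fromstring(xml_fragment)
    out = []                 # list of (char, is_hard_space)
    prev_soft_space = True   # True at paragraph start, so leading spaces are dropped

    def feed_text(data):
        nonlocal prev_soft_space
        for ch in data or "":
            if ch in "\t\r\n":
                ch = " "     # tab, CR and LF are treated like a space
            if ch == " ":
                if prev_soft_space:
                    continue  # collapse: skip a space that follows a space
                prev_soft_space = True
            else:
                prev_soft_space = False
            out.append((ch, False))

    def walk(elem):
        nonlocal prev_soft_space
        if elem.tag == S_TAG:
            count = int(elem.get(C_ATTR, "1"))
            out.extend([(" ", True)] * count)
            # Assumption of this model: <text:s/> ends the run of ordinary spaces.
            prev_soft_space = False
            return
        feed_text(elem.text)
        for child in elem:
            walk(child)
            feed_text(child.tail)

    walk(root)
    # Drop a trailing ordinary space, but keep spaces that came from <text:s/>.
    while out and out[-1] == (" ", False):
        out.pop()
    return "".join(ch for ch, _ in out)


P = '<text:p xmlns:text="' + TEXT_NS + '">{}</text:p>'
print(repr(render_paragraph(P.format("<text:span>a </text:span> b"))))
# 'a b'
print(repr(render_paragraph(
    P.format("<text:span>a </text:span><text:s/> <text:span>b</text:span>"))))
# 'a   b'
```

Under this model the first fragment yields one space and the second yields three, which are the counts this report expects.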
Comment 1 Jos van den Oever 2013-04-29 09:49:04 UTC
Created attachment 78592 [details]
rendering of test document in calligra
Comment 2 Jos van den Oever 2013-04-29 09:49:31 UTC
Created attachment 78593 [details]
rendering of test document in libreoffice
Comment 3 Jos van den Oever 2013-04-29 09:50:30 UTC
Created attachment 78594 [details]
rendering of test document in ms office
Comment 4 Joel Madero 2013-05-01 03:08:17 UTC
Michael - is this expected behavior or indeed a bug?
Comment 5 Jos van den Oever 2013-05-01 10:08:49 UTC
Created attachment 78708 [details]
same issue but now in odp file
Comment 6 Robinson Tryon (qubit) 2014-02-04 15:56:22 UTC
(In reply to comment #4)
> Michael - is this expected behavior or indeed a bug?

Michael - ping!

Thanks,
Comment 7 tommy27 2015-05-22 02:14:58 UTC
Please give an update on the bug status with the current LibO 4.4.3.2 release.
Comment 8 Jean-Baptiste Faure 2015-08-01 07:31:27 UTC
comment #4 and comment #7 -> NEEDINFO.
Comment 9 Jos van den Oever 2015-08-26 12:27:55 UTC
Created attachment 118197 [details]
test files as rendered by LibreOffice 5.0.0.5

LO still handles the spaces incorrectly on some lines.
Comment 10 Robinson Tryon (qubit) 2015-12-03 11:23:42 UTC
Converting Whiteboard tags to Keywords: filter:odf
Comment 11 Xisco Faulí 2016-09-11 21:13:04 UTC Comment hidden (obsolete)
Comment 12 Jean-Baptiste Faure 2016-09-12 04:44:37 UTC
The bug submitter answered the question in comment #7.
Now we are waiting for an answer to comment #4.

Best regards. JBF
Comment 13 Cor Nouws 2016-10-13 12:19:07 UTC
@jos: obviously you can point us to the specs right away, so we can read them and set this to NEW?

or alternatively:

@mstahl:

(In reply to Jean-Baptiste Faure from comment #12)
> Now, we are waiting for answer to comment #4.

maybe you can confirm right away?
Comment 14 Jos van den Oever 2016-10-13 13:36:55 UTC
The description of space collapsing is given in
  http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#a6_1_2White_Space_Characters

Following steps 1, 2, and 3 in the section for the ODF fragment:
  <span>a </span><s/> <span>b</span>
should yield 3 spaces between 'a' and 'b'.

There is no rule that says that spaces after <s/> may be ignored.
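
To make the difference between 3 and 2 spaces concrete, here is a toy model in which the only variable is whether <s/> ends the current run of spaces. It is a hypothetical illustration of how the two counts could arise. It is not taken from LibreOffice's source, and the names S, render and s_resets_run are invented for this sketch. Trimming of trailing spaces is left out for brevity.

```python
S = object()  # marker standing for one <s/> element


def render(tokens, s_resets_run=True):
    """tokens: character-data strings and S markers, in document order."""
    out = []
    prev_space = True  # leading spaces are dropped
    for tok in tokens:
        if tok is S:
            out.append(" ")
            # The choice being compared: does <s/> end the run of spaces?
            prev_space = not s_resets_run
            continue
        for ch in tok:
            if ch == " " and prev_space:
                continue  # collapse a space that follows a space
            out.append(ch)
            prev_space = (ch == " ")
    return "".join(out)


# <span>a </span><s/> <span>b</span>
fragment = ["a ", S, " ", "b"]
print(render(fragment, s_resets_run=True).count(" "))   # 3
print(render(fragment, s_resets_run=False).count(" "))  # 2
```

With s_resets_run=True the space after <s/> survives, giving the 3 spaces argued for here. With s_resets_run=False it is swallowed, giving the 2 spaces seen in the LibreOffice rendering.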
Comment 15 Michael Stahl (allotropia) 2016-10-13 13:59:04 UTC
thanks for reminding me of this.

apparently nothing changed here between OOo 3.3 and current master.

this is all rather confusing so i'm not quite sure i understand it.

is whitespace directly following <text:s> allowed at all?

6.1.3 <text:s>
The <text:s> element is used to represent the [UNICODE] character “ “ (U+0020, SPACE).
This element shall be used to represent the second and all following “ “ (U+0020, SPACE) characters in a sequence of “ “ (U+0020, SPACE) characters.


doesn't a space character following the <text:s> violate the "shall" wording above, and shouldn't it be something like this instead?

  <span>a </span><s c="2"/><span>b</span>
  <span>a </span><s/><span><s/>b</span>

or does "sequence" imply there are no opening or closing XML tags between the spaces?
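
for illustration, a writer that follows the "shall" wording literally could be sketched like this: the first space of a run stays a literal space, and the second and all following ones become one <text:s> with a text:c count. the function name encode_spaces is made up for this sketch, and spaces at the very start or end of a paragraph are not handled.

```python
import re
from xml.sax.saxutils import escape


def encode_spaces(text: str) -> str:
    """Hypothetical helper: serialize runs of spaces using <text:s>."""

    def repl(match):
        extra = len(match.group(0)) - 1  # spaces beyond the first one
        if extra == 1:
            return " <text:s/>"
        return f' <text:s text:c="{extra}"/>'

    # Escape &, < and > first; escaping does not change spaces.
    return re.sub(r" {2,}", repl, escape(text))


print(encode_spaces("a b"))    # a b
print(encode_spaces("a  b"))   # a <text:s/>b
print(encode_spaces("a   b"))  # a <text:s text:c="2"/>b
```

output written this way never has a literal space directly after <text:s>, so the ambiguous case does not come up.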

some relevant issues:
https://issues.oasis-open.org/browse/OFFICE-3828
https://issues.oasis-open.org/browse/OFFICE-3706
Comment 16 Jos van den Oever 2016-10-13 15:13:29 UTC
  <span>a </span><s c="2"/><span>b</span>

is the right way to create this document. And one might even claim that 

  <span>a </span><s/> <span>b</span>

is invalid.
But it would still be nice if applications would interpret it the same way.
Comment 17 Michael Stahl (allotropia) 2016-10-14 21:52:53 UTC
fixed on master
Comment 18 Commit Notification 2016-10-14 21:53:29 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b5b57c677fd7fe9b2594e428c556862df88fca9d

tdf#64038 ODF import: fix handling of space following <text:s>

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.