Bug 108995 - DOCX IMPORT: extra whitespace in some parts of text in a specific DOCX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
Whiteboard: target:6.0.0
Keywords: filter:docx
Reported: 2017-07-07 08:52 UTC by Mike Kaganski
Modified: 2017-07-07 11:40 UTC (History)

A test document with runs with and without xml:space (1.28 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-07-07 08:52 UTC, Mike Kaganski

Description Mike Kaganski 2017-07-07 08:52:26 UTC
Created attachment 134524
A test document with runs with and without xml:space

When opened in Word, the attached document has leading and trailing whitespace (tab and space characters) only in the first line. The first line also has tabs between words. The second line has neither leading/trailing whitespace nor tabs.

In LibreOffice, both lines have leading/trailing whitespace and tabs between words.

The cause is that the xml:space attribute is not handled on import (see section 2.10 of the XML 1.0 specification, and ECMA-376-1:2016).
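
To illustrate the expected behaviour, here is a minimal Python sketch. It is not the LibreOffice patch; it only assumes that text of a w:t element without xml:space="preserve" loses its leading and trailing whitespace, while interior whitespace is left untouched. The helper names run_text and paragraph_text and the SAMPLE markup are hypothetical and are not taken from the attachment.

```python
import xml.etree.ElementTree as ET

# Namespaces: WordprocessingML main namespace and the predefined xml: namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def run_text(t_elem):
    """Return the text of a w:t element, honouring xml:space (hypothetical helper)."""
    text = t_elem.text or ""
    if t_elem.get(XML_SPACE) == "preserve":
        return text  # whitespace is significant, keep it verbatim
    # Assumption: without "preserve", leading/trailing whitespace is dropped.
    return text.strip(" \t\r\n")


def paragraph_text(p_elem):
    """Concatenate the text of all w:t elements in a paragraph (hypothetical helper)."""
    return "".join(run_text(t) for t in p_elem.iter("{%s}t" % W_NS))


# Illustrative markup only: first paragraph uses xml:space="preserve", second does not.
SAMPLE = (
    '<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:p><w:r><w:t xml:space="preserve"> \tfoo\t</w:t></w:r>'
    '<w:r><w:t xml:space="preserve">bar </w:t></w:r></w:p>'
    '<w:p><w:r><w:t> \tfoo\t</w:t></w:r>'
    '<w:r><w:t>bar </w:t></w:r></w:p>'
    '</w:body>'
)

root = ET.fromstring(SAMPLE)
lines = [paragraph_text(p) for p in root.iter("{%s}p" % W_NS)]
for line in lines:
    print(repr(line))
```

In this sketch the first paragraph keeps its spaces and tabs, and the second comes out with none, which matches what Word shows for the two lines of the attachment.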
Comment 1 Mike Kaganski 2017-07-07 08:57:29 UTC
A patch is under review: https://gerrit.libreoffice.org/39682
Comment 2 Commit Notification 2017-07-07 10:52:53 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":


tdf#108995: take xml:space attribute into account

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:

Affected users are encouraged to test the fix and report feedback.