Bug 120301 - xml parser preserves whitespace in pretty-formatted vmlDrawing1.vml
Summary: xml parser preserves whitespace in pretty-formatted vmlDrawing1.vml
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: target:6.2.0 target:6.1.3
Keywords: bibisected, bisected, filter:ooxml, regression
Depends on:
Blocks:
 
Reported: 2018-10-04 10:12 UTC by Justin L
Modified: 2019-10-29 10:41 UTC

See Also:
Crash report or crash signature:


Attachments
xmlParsing.xlsx: imported labels contain extra spaces from "pretty formatted" xml (11.25 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2018-10-04 10:12 UTC, Justin L
Description Justin L 2018-10-04 10:12:44 UTC
Created attachment 145368
xmlParsing.xlsx: imported labels contain extra spaces from "pretty formatted" xml

The label of a VML radio button is imported incorrectly. All of the indentation whitespace from the pretty-formatted XML file is added to the label text, instead of each run of whitespace being treated as a single space.

  <v:textbox style='mso-direction-alt:auto' o:singleclick="f">
   <div style='text-align:left'><font face="Segoe UI" size="160" color="auto">Check
   Box 1</font></div>
  </v:textbox>
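
The expected normalization (each run of whitespace treated as a single space) can be sketched as follows. This is only an illustration of the intended result, not LibreOffice code: the helper name collapseWhitespace is hypothetical, and the set of characters treated as whitespace (space, tab, CR, LF) is an assumption.

```cpp
#include <string>

// Hypothetical helper, not part of LibreOffice: replaces every run of
// whitespace (space, tab, CR, LF) with a single space character.
std::string collapseWhitespace(const std::string& text)
{
    std::string result;
    result.reserve(text.size());
    bool inRun = false; // true while we are inside a run of whitespace
    for (char c : text)
    {
        const bool isSpace = (c == ' ') || (c == '\t') || (c == '\n') || (c == '\r');
        if (isSpace)
        {
            // Emit one space for the first character of the run only.
            if (!inRun)
                result.push_back(' ');
            inRun = true;
        }
        else
        {
            result.push_back(c);
            inRun = false;
        }
    }
    return result;
}
```

Applied to the text content above ("Check", a line break, three spaces of indentation, "Box 1"), this sketch yields "Check Box 1". Leading and trailing runs are reduced to one space rather than trimmed.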

Steps to reproduce:
1.) Open xmlParsing.xlsx.
2.) The button names should be "Check Box 1" and "Option Button 2".

Instead, the names have three spaces added: "Check     Box 1". This has been the case since the earliest builds available (tested with bibisect43all).

The earliest point I reached while debugging was FastSaxParserImpl::sendPendingCharacters() in sax/source/fastparser/fastparser.cxx, where the entire string, including the extra spaces, can already be seen.
Comment 1 Justin L 2018-10-04 14:12:41 UTC
The faulty commit, from 2011:
commit 7a5084f1acacb0858588d4d0c82651e47ca9914f
Author: Daniel Rentz 
Date:   Mon Feb 7 17:18:11 2011 +0100

    dr78: rework of stream handling, improve handling of very large streams (prevent loading entire stream into array or string, esp. dumper and VML import), full support of XComponentContext

diff --git a/oox/source/vml/vmlinputstream.cxx b/oox/source/vml/vmlinputstream.cxx
--- a/oox/source/vml/vmlinputstream.cxx
+++ b/oox/source/vml/vmlinputstream.cxx
@@ -56,5 +58,5 @@
 inline bool lclIsWhiteSpace( sal_Char cChar )
 {
-    return (cChar == ' ') || (cChar == '\t') || (cChar == '\n') || (cChar == '\r');
+    return cChar < 32;
 }

Hmm, the space character *is* 32, so the check should be cChar <= 32.
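
To make the off-by-one concrete, here is a minimal side-by-side sketch of the two comparisons. The function names are hypothetical and chosen for illustration; the second variant follows the suggestion in this comment and is not necessarily identical to the patch that was committed.

```cpp
// Comparison introduced by the 2011 change quoted above: only characters
// with a code below 32 count, so the space character (code 32) does not.
inline bool isWhiteSpaceBefore(char c)
{
    return c < 32;
}

// Variant suggested in this comment: also accept code 32 (the space).
inline bool isWhiteSpaceSuggested(char c)
{
    return c <= 32;
}
```

With these definitions, '\n' and '\t' are whitespace under both variants, while ' ' is whitespace only under the second. One caveat: where char is a signed type, bytes with values of 0x80 and above are negative and therefore satisfy both comparisons as well.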
Comment 2 Commit Notification 2018-10-04 19:30:22 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=2cae2ecfef47d8dd10647c10f9577392c1887d3a

tdf#120301 oox: lclIsWhiteSpace should return true for a space

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 3 Commit Notification 2018-10-05 08:32:13 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-6-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8ef25505303dcd744d20abf7e328ce1f0eda4dbf&h=libreoffice-6-1

tdf#120301 oox: lclIsWhiteSpace should return true for a space

It will be available in 6.1.3.
