Created attachment 145368
xmlParsing.xlsx: imported labels contain extra spaces from "pretty formatted" xml

The label of a VML radio button is imported incorrectly. All of the indent spacing of the xml file is added to the label name, instead of sequential whitespace being treated as a single space.

<v:textbox style='mso-direction-alt:auto' o:singleclick="f">
  <div style='text-align:left'><font face="Segoe UI" size="160" color="auto">Check Box 1</font></div>
</v:textbox>

Steps to reproduce:
1.) Open xmlParsing.xlsx.
2.) The button names should be "Check Box 1" and "Option Button 2". Instead, the names have three spaces added. "Check Box 1".

This has been the case since the earliest builds tested (with bibisect43all).

The earliest point I reached while debugging was FastSaxParserImpl::sendPendingCharacters() in sax/source/fastparser/fastparser.cxx, where the entire string, including the extra spaces, can be seen.
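For illustration, the expected behaviour described above (a run of whitespace counts as one space) could be sketched as follows. This is a minimal standalone sketch, not LibreOffice code: the names isXmlWhitespace and collapseWhitespace are hypothetical, and it assumes that only space, tab, CR and LF count as whitespace.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a LibreOffice function): assumes that only
// space, tab, line feed and carriage return count as whitespace.
inline bool isXmlWhitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Hypothetical helper: replace every run of whitespace with one space.
// Leading and trailing runs are collapsed too, but not removed.
inline std::string collapseWhitespace(std::string_view text)
{
    std::string result;
    result.reserve(text.size());
    bool inRun = false; // true while we are inside a whitespace run
    for (char c : text)
    {
        if (isXmlWhitespace(c))
        {
            if (!inRun)
                result.push_back(' '); // emit one space per run
            inRun = true;
        }
        else
        {
            result.push_back(c);
            inRun = false;
        }
    }
    return result;
}
```

With this sketch, collapseWhitespace("Check \n   Box 1") returns "Check Box 1".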
The faulty commit, from 2011:

commit 7a5084f1acacb0858588d4d0c82651e47ca9914f
Author: Daniel Rentz
Date: Mon Feb 7 17:18:11 2011 +0100

    dr78: rework of stream handling, improve handling of very large streams (prevent loading entire stream into array or string, esp. dumper and VML import), full support of XComponentContext

diff --git a/oox/source/vml/vmlinputstream.cxx b/oox/source/vml/vmlinputstream.cxx
--- a/oox/source/vml/vmlinputstream.cxx
+++ b/oox/source/vml/vmlinputstream.cxx
@@ -56,5 +58,5 @@
 inline bool lclIsWhiteSpace( sal_Char cChar )
 {
-    return (cChar == ' ') || (cChar == '\t') || (cChar == '\n') || (cChar == '\r');
+    return cChar < 32;
 }

Hmm, the space character *is* 32, so the condition should be cChar <= 32.
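The off-by-one can be checked in isolation. The sketch below copies the three variants of the predicate into standalone functions; the function names are hypothetical, plain char is assumed in place of sal_Char, and the third variant is only the condition suggested in this comment, not necessarily the code that was finally committed.

```cpp
// Standalone copies for comparison only; these names are hypothetical
// and do not exist in LibreOffice. Plain char stands in for sal_Char.

// Before the 2011 commit: explicit list of four whitespace characters.
inline bool isWhiteSpaceOriginal(char c)
{
    return (c == ' ') || (c == '\t') || (c == '\n') || (c == '\r');
}

// After the 2011 commit: ' ' has the value 32, so it is NOT matched.
inline bool isWhiteSpaceRegressed(char c)
{
    return c < 32;
}

// Condition suggested in this comment: includes the space character.
inline bool isWhiteSpaceProposed(char c)
{
    return c <= 32;
}
```

isWhiteSpaceRegressed(' ') is false while isWhiteSpaceProposed(' ') is true, which matches the observed symptom: spaces are never recognised as whitespace. Note that both range comparisons also accept other control characters and, on platforms where char is signed, negative values, which the explicit four-character test does not.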
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=2cae2ecfef47d8dd10647c10f9577392c1887d3a

tdf#120301 oox: lclIsWhiteSpace should return true for a space

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-6-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8ef25505303dcd744d20abf7e328ce1f0eda4dbf&h=libreoffice-6-1

tdf#120301 oox: lclIsWhiteSpace should return true for a space

It will be available in 6.1.3.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.