Bug 150137 - Crash when parsing an XML with undeclared namespace
Summary: Crash when parsing an XML with undeclared namespace
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: BASIC (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Aron Budea
URL:
Whiteboard: target:24.2.0 target:7.6.1
Keywords: difficultyBeginner, easyHack, skillCpp
Depends on:
Blocks: Crash
  Show dependency treegraph
 
Reported: 2022-07-25 11:32 UTC by Mike Kaganski
Modified: 2023-08-08 18:31 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2022-07-25 11:32:04 UTC
Given this sample XML data:

<?xml version="1.0" encoding="UTF-8"?>
<a:xml/>

Save it to a local file; now run this BASIC code:

Sub Parse
  source = CreateUnoStruct("com.sun.star.xml.sax.InputSource")
  source.aInputStream = CreateUnoService("com.sun.star.ucb.SimpleFileAccess").openFileRead(ConvertToURL("path/to/file.xml"))
  parser = CreateUnoService("com.sun.star.xml.sax.FastParser")
  parser.ParseStream(source)
End Sub

LibreOffice crashes.
This happens because the namespace 'a' is undeclared in the XML. This is a regression from commit 3aa52d36824d11b8774de15708fdfcbb93cd9dc3
  Author Mohammed Abdul Azeem <azeemmysore@gmail.com>
  Date   Mon Jul 18 13:17:19 2016 +0530
    GSOC - Handling namespace declaration missing case:

The code like

>   if ( !m_bIgnoreMissingNSDecl || URI != nullptr )
>       sNamespace = OUString( XML_CAST( URI ), strlen( XML_CAST( URI )), RTL_TEXTENCODING_UTF8 );

dereferences URI equal to nullptr (given m_bIgnoreMissingNSDecl is false).

The easyhack should have a unit test.
Comment 1 Rafael Lima 2022-07-25 14:29:58 UTC
Repro with

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 21a31eefab1401d288dbb8220f3df3365be9efaf
CPU threads: 16; OS: Linux 5.15; UI render: default; VCL: kf5 (cairo+xcb)
Locale: pt-BR (pt_BR.UTF-8); UI: en-US
Calc: threaded

Indeed LibreOffice crashes with the XML sample.

If the XML is changed to the following, the crash no longer happens.

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:a="http://www.w3.org/TR/html4/">
<a:xml/>
Comment 2 Mike Kaganski 2022-07-25 14:34:33 UTC
(In reply to Rafael Lima from comment #1)
> <?xml version="1.0" encoding="UTF-8"?>
> <root xmlns:a="http://www.w3.org/TR/html4/">
> <a:xml/>

Or rather, to this (which is both valid, and equivalent to the original one):

<?xml version="1.0" encoding="UTF-8"?>
<a:xml xmlns:a="http://www.w3.org/TR/html4/"/>

Just to avoid confusion, the bug is about the crash, that must not happen regardless of the (potentially invalid or ambiguous or otherwise questionable) user input.
Comment 3 Amar 2022-08-31 00:24:51 UTC
So what's the expected behavior? 
I've created a test case and it fails with the fast parser and provided xml. 
If I pass the IgnoreMissingNSDecl arg (which was implemented in the mentioned commit), then it's not failing anymore. Should m_bIgnoreMissingNSDecl in fastparser.cxx by default be set to true instead of false? 
I am missing the big picture here, as I just started with the project.

The exception raising part is this:
> if( !nNamespace && !m_bIgnoreMissingNSDecl )
>     throw SAXException("No namespace defined for " + OUString(XML_CAST(pPrefix),
>                   nPrefixLen, RTL_TEXTENCODING_UTF8), Reference< XInterface >(), Any());

and it actually never gets into:
>   if ( !m_bIgnoreMissingNSDecl || URI != nullptr )
>       sNamespace = OUString( XML_CAST( URI ), strlen( XML_CAST( URI )), RTL_TEXTENCODING_UTF8 )
or the else case
Comment 4 Mike Kaganski 2022-08-31 04:56:31 UTC
(In reply to Amar from comment #3)
> So what's the expected behavior? 

(In reply to Mike Kaganski from comment #2)
> Just to avoid confusion, the bug is about the crash, that must not happen
> regardless of the (potentially invalid or ambiguous or otherwise
> questionable) user input.

The expected behavior is: there is no crash. Since the input is not valid, missing the namespace declaration, the macro should give a runtime error.
Comment 5 Commit Notification 2023-08-07 15:39:02 UTC
Aron Budea committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c9e863801509fb37b125a8fb07358fb1b235496d

tdf#150137 fastparser: don't crash on undeclared namespace

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2023-08-08 18:31:24 UTC
Aron Budea committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/8e00558e6a569995905e1d18643605c6e4946391

tdf#150137 fastparser: don't crash on undeclared namespace

It will be available in 7.6.1.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.