Bug 150137 - Crash when parsing an XML with undeclared namespace
Summary: Crash when parsing an XML with undeclared namespace
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: BASIC (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyBeginner, easyHack, skillCpp
Depends on:
Blocks: Crash
  Show dependency treegraph
 
Reported: 2022-07-25 11:32 UTC by Mike Kaganski
Modified: 2022-10-17 17:27 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2022-07-25 11:32:04 UTC
Given this sample XML data:

<?xml version="1.0" encoding="UTF-8"?>
<a:xml/>

Save it to a local file; now run this BASIC code:

Sub Parse
  source = CreateUnoStruct("com.sun.star.xml.sax.InputSource")
  source.aInputStream = CreateUnoService("com.sun.star.ucb.SimpleFileAccess").openFileRead(ConvertToURL("path/to/file.xml"))
  parser = CreateUnoService("com.sun.star.xml.sax.FastParser")
  parser.ParseStream(source)
End Sub

LibreOffice crashes.
This happens because the namespace 'a' is undeclared in the XML. This is a regression from commit 3aa52d36824d11b8774de15708fdfcbb93cd9dc3
  Author Mohammed Abdul Azeem <azeemmysore@gmail.com>
  Date   Mon Jul 18 13:17:19 2016 +0530
    GSOC - Handling namespace declaration missing case:

The code like

>   if ( !m_bIgnoreMissingNSDecl || URI != nullptr )
>       sNamespace = OUString( XML_CAST( URI ), strlen( XML_CAST( URI )), RTL_TEXTENCODING_UTF8 );

dereferences URI equal to nullptr (given m_bIgnoreMissingNSDecl is false).

The easyhack should have a unit test.
Comment 1 Rafael Lima 2022-07-25 14:29:58 UTC
Repro with

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 21a31eefab1401d288dbb8220f3df3365be9efaf
CPU threads: 16; OS: Linux 5.15; UI render: default; VCL: kf5 (cairo+xcb)
Locale: pt-BR (pt_BR.UTF-8); UI: en-US
Calc: threaded

Indeed LibreOffice crashes with the XML sample.

If the XML is changed to the following, the crash no longer happens.

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:a="http://www.w3.org/TR/html4/">
<a:xml/>
Comment 2 Mike Kaganski 2022-07-25 14:34:33 UTC
(In reply to Rafael Lima from comment #1)
> <?xml version="1.0" encoding="UTF-8"?>
> <root xmlns:a="http://www.w3.org/TR/html4/">
> <a:xml/>

Or rather, to this (which is both valid, and equivalent to the original one):

<?xml version="1.0" encoding="UTF-8"?>
<a:xml xmlns:a="http://www.w3.org/TR/html4/"/>

Just to avoid confusion, the bug is about the crash, that must not happen regardless of the (potentially invalid or ambiguous or otherwise questionable) user input.
Comment 3 Amar 2022-08-31 00:24:51 UTC
So what's the expected behavior? 
I've created a test case and it fails with the fast parser and provided xml. 
If I pass the IgnoreMissingNSDecl arg (which was implemented in the mentioned commit), then it's not failing anymore. Should m_bIgnoreMissingNSDecl in fastparser.cxx by default be set to true instead of false? 
I am missing the big picture here, as I just started with the project.

The exception raising part is this:
> if( !nNamespace && !m_bIgnoreMissingNSDecl )
>     throw SAXException("No namespace defined for " + OUString(XML_CAST(pPrefix),
>                   nPrefixLen, RTL_TEXTENCODING_UTF8), Reference< XInterface >(), Any());

and it actually never gets into:
>   if ( !m_bIgnoreMissingNSDecl || URI != nullptr )
>       sNamespace = OUString( XML_CAST( URI ), strlen( XML_CAST( URI )), RTL_TEXTENCODING_UTF8 )
or the else case
Comment 4 Mike Kaganski 2022-08-31 04:56:31 UTC
(In reply to Amar from comment #3)
> So what's the expected behavior? 

(In reply to Mike Kaganski from comment #2)
> Just to avoid confusion, the bug is about the crash, that must not happen
> regardless of the (potentially invalid or ambiguous or otherwise
> questionable) user input.

The expected behavior is: there is no crash. Since the input is not valid, missing the namespace declaration, the macro should give a runtime error.