Bug 84237 - Method setErrorHandler in DOM::CDocumentBuilder not working
Summary: Method setErrorHandler in DOM::CDocumentBuilder not working
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: Other All
: medium normal
Assignee: Fyodor
URL:
Whiteboard: target:6.0.0
Keywords: difficultyBeginner, easyHack, skillCpp
Depends on:
Blocks: 39625
  Show dependency treegraph
 
Reported: 2014-09-23 10:00 UTC by Tobias Madl
Modified: 2018-12-20 09:40 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tobias Madl 2014-09-23 10:00:05 UTC
===Method setErrorHandler not working===

Background:
To make some CppUnit Tests working (Bug 39625), it was one task to fix the unoxml/test/domtest.cxx. But this test depends on the possibility to overwrite the Errorhandler of the parser (unoxml/source/dom/documentbuilder.cxx: method: setErrorHandler). In order to make this test working again, this method has to be fixed. At the moment it looks like the passed errorhandler is overwritten or even not used at all. For reproduction you can uncomment the method errorInputTest() in unoxml/test/domtest.cxx and run the test, and it will throw exeptions, because the wrong errorhandler is used.

More Information:
The DocumentBuilder wrapps the SAX parser and the SAX parser wrapps the external expat library.

The domtest.cxx is for testing the parser with different xml snippets. It was designed for checking valid input, warnings, errors and fatal errors in the xml.
There are still some more tests in it, but for now they are commented out.
Comment 1 Matúš Kukan 2014-09-25 23:04:55 UTC
Hmm, so I had a look and unoxml is wrapper around libxml2 library, not expat.
From sax, it uses only FastAttributeList.

It should be easy to fix this, have a look at warning_func in
unoxml/source/dom/documentbuilder.cxx
It has pContext->_private = this
so one could do some casting magic and use m_xErrorHandler in warning_func / error_func. Not sure if there is also fatalerror in xmlParserCtxt::sax - quite possibly there is and could be added too.

OTOH if it's not used already, it does not make sense to add this only for a unit test.
Comment 2 raal 2016-05-02 09:42:11 UTC
Tobias, Matúš,
should this bug's status be new?
Comment 3 Michael Stahl (allotropia) 2016-05-06 14:11:05 UTC
true the CDocumentBuilder::m_xErrorHandler is set in that function and then never actually used => NEW
Comment 4 Michael Stahl (allotropia) 2016-05-06 14:18:41 UTC
this should be quite easy to fix, in unoxml/source/dom/documentbuilder.cxx
there are already error callbacks set before invoking libxml2,
namely warning_func() and error_func(), and i guess the
 pContext->_private = this;
will cause the void* parameter of these *_func() to refer to the
CDocumentBuilder instance.

so this basically means to modify warning_func() and error_func()
to invoke the corresponding m_xErrorHandler methods
from com.sun.star.xml.sax.XErrorHandler interface.

care must be taken to catch any exceptions thrown,
because they must not pass through libxml2 C code.
Comment 5 Fyodor 2017-08-11 08:07:28 UTC
In brief, this bug relates to UnitTests for unoxml module which are failing with unhanded exception. In unittests we want to override this behavior by calling setErrorHandler, which really doesn't work, as it not implemented. In UnitTests there is custom struct ErrorHandler defined, which has warning, error and fatalerror member functions. It's expected that these function called in place of exception and if so, unittests will complete correctly.

But I think that such behaiviuor (calling error handler's functions in plase of throwing exception) is not correct. Code in CDocumentBuilder class relates to exceptions in it methods. For example, this is part from CDocumentBuilder::parse method (starting from line 337 in file unoxml/source/dom/documentbuilder.cxx)

// ------------------- C++ code starts ----------------------

xmlDocPtr const pDoc = xmlCtxtReadIO(pContext.get(), xmlIO_read_func, xmlIO_close_func, &c, nullptr, nullptr, 0);
// Previous line calls xmlCtxtReadIO from libxml2 to parse xml
// If there is error in parsed xml, pDoc == nullptr - this means some error

if (pDoc == nullptr) {
// In case of error we throw exception and PASS CONTROL to first catch statement
throwEx(pContext.get());
}

// If there is no error - we init XDocument and return it
Reference< XDocument > const xRet(CDocument::CreateCDocument(pDoc).get());
return xRet;

// ------------------- C++ code ends ----------------------

As seen, if we will not throw an exception, we should at least modify CDocumentBuilder::parse to return nullptr and not try to init XDocument. Also this can lead to modification of other code which is call CDocumentBuilder::parse AND code which calls throwEx as it not throw exception if custom error handler is set. As a result may be we'll should modify many lines of code, and produce even more errors...

I suggest to remove setErrorHandler and catch exceptions in UnitTests. As a result, unittests should complete as desired.

I'm novice in LO dev, so my suggestion can be wrong. I'll appreciate any advice.
Comment 6 Michael Stahl (allotropia) 2017-08-14 10:36:48 UTC
> In brief, this bug relates to UnitTests for unoxml module which are failing with unhanded exception.

i don't see where unhandled exceptions are causing test failures?

> It's expected that these function called in place of exception and if so, unittests will complete correctly.

no, the idea is that the functions are called *in addition to* any exceptions that may be thrown at a later point (such as the location you cited)
in case the XML document cannot be parsed.

i just wanted to point out that the callback functions
error_func() and warning_func() *themselves* must not throw exceptions,
because they are called by libxml's C code which cannot deal with exceptions.
Comment 7 Fyodor 2017-08-15 00:16:44 UTC
Michael, thank you for your comment
After digging to code I found out the same things as you wrote. Exceptions thrown only in case of error in parsed xml file. In case of warning exception is not thrown (and CDocument instance created), but xml2cmp calls  warning handler function. 
Now I think it's better to catch exception in unit test and also use m_xErrorHandler member functions to count warning/error count during parse.
(I'm not sure about distinguishing between error and fatal error, so may be I should remove separate fatal error unit test)
Comment 8 Fyodor 2017-08-21 06:26:31 UTC
I've submitted fix for this bug on gerrit https://gerrit.libreoffice.org/#/c/41376/
Comment 9 Commit Notification 2017-08-25 10:46:30 UTC
Fyodor Yemelyanenko committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d181d8acbf49e2fe87c8cf53a9431e503ccced55

tdf#84237 use XErrorHandler in CDocumentBuilder

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Michael Stahl (allotropia) 2017-08-25 11:35:02 UTC
thanks Fyodor for fixing this
Comment 11 Jack 2018-12-20 09:40:12 UTC Comment hidden (spam)