59642 – Tools -> Import Formula does not work if a MathML file is selected using "All files (*.*)" file filter

Bug 59642 - Tools -> Import Formula does not work if a MathML file is selected using "All files (*.*)" file filter

Summary: Tools -> Import Formula does not work if a MathML file is selected using "All...

Status:	VERIFIED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Formula Editor (show other bugs)
Version: (earliest affected)	3.6.4.3 release
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:	target:4.2.0 target:4.1.0.1
Keywords:

Depends on:
Blocks:

Reported:	2013-01-21 00:42 UTC by Mike Kaganski
Modified:	2013-07-09 13:58 UTC (History)
CC List:	1 user (show)

See Also:
Crash report or crash signature:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mike Kaganski 2013-01-21 00:42:44 UTC

Tools -> Import Formula -> leave the file filter set to "All files (*.*)" -> select a .mml file -> Open.

Expected: the file should be imported.
Actual result: nothing happens, formula is not imported.

If the filter is explicitly set to "MathML 1.01 (*.mml)", then the export completes successfully.

Comment 1 Marcos Souza 2013-05-23 12:12:20 UTC

Reproducible in 4.1.0 alpha version in Linux too.

So, chaging to platform to All

Comment 2 Commit Notification 2013-06-17 14:09:53 UTC

Marcos Paulo de Souza committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=ef4872f6c6a49878ea2fc6b3a4ea3cd9ad52de33

Fixes fdo#59642



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Comment 3 Marcos Souza 2013-06-17 14:10:43 UTC

It's fixed in master!

Thanks for the report!

Comment 4 Commit Notification 2013-06-17 16:11:57 UTC

Marcos Paulo de Souza committed a patch related to this issue.
It has been pushed to "libreoffice-4-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=923b05ac37355886e7a20fdcb1ad9dcb198f7570&h=libreoffice-4-1

Fixes fdo#59642


It will be available in LibreOffice 4.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Comment 5 Mike Kaganski 2013-06-29 15:10:34 UTC

Hello Marcos,
Thank you for looking into this.

Sorry to say this, but I tried the 4.1.0.1 and the problem is still there... :( Maybe the fox will be in a second RC?

Comment 6 Marcos Souza 2013-06-29 17:10:03 UTC

Hi Mike,

Is strange because the fix was applied to rc1, as described here: https://wiki.documentfoundation.org/Releases/4.1.0/RC1

But, can you test the rc2 please? (when it's available)

Comment 7 Mike Kaganski 2013-06-30 00:20:52 UTC

I looked into the commits, and it seems that they just cannot fix this issue:

The fix covers yet another file marker: "<math:math ". But the problem is already present with files having "old" markers, like, say, Attachment 73352 [details] from Bug #59641. That file contains plain old "<math ".

Comment 8 Mike Kaganski 2013-06-30 07:06:06 UTC

By the way: the signatures there could be followed by any whitespace character, not only SPACE: that could also be TAB, LF or CR (see http://www.w3.org/TR/2008/REC-xml-20081126/#NT-S)

Comment 9 Marcos Souza 2013-07-02 11:15:52 UTC

Hi Mike!

How, you're right =D

There is one more option to fix the auto detection.

I'll fix this, thanks a lot for looking at this!

Comment 10 Marcos Souza 2013-07-03 18:35:16 UTC

Hi guys!

I looked at that mml file. This file was a little different from the others: it have the BOM marker: www.w3.org/International/questions/qa-byte-order-mark

Do you know if the LO has code to avoid this marker, or maybe handle this?

Comment 11 Mike Kaganski 2013-07-03 21:39:05 UTC

Hi marcos,

unfortunately, I don't know this,
but LO imports it correctly if the type is explicitly selected.

I would vote for local solution (checking for the two possible BOMs before searching for markers), and explaining this in comments, so that it would be possible to search for string "BOM" or "Byte Order Mark", and a person doing unification could benefit from this.

Comment 12 Mike Kaganski 2013-07-05 03:39:02 UTC

(In reply to comment #10)
> Do you know if the LO has code to avoid this marker, or maybe handle this?

Here is what I found:

http://docs.libreoffice.org/tools/html/classSvStream.html#ab1d78e3df7058ca99859cb43d3b227a6

sal_Bool SvStream::StartReadingUnicodeText(rtl_TextEncoding eReadBomCharSet)

If eReadBomCharSet==RTL_TEXTENCODING_DONTKNOW: read 16bit, if 0xfeff do nothing (UTF-16), if 0xfffe switch endian swapping (UTF-16), if 0xefbb or 0xbbef read another byte and check for UTF-8. 

If no UTF-* BOM was detected put all read bytes back. This means that if 2 bytes were read it was an UTF-16 BOM, if 3 bytes were read it was an UTF-8 BOM. There is no UTF-7, UTF-32 or UTF-EBCDIC BOM detection!
 
If eReadBomCharSet!=RTL_TEXTENCODING_DONTKNOW: only read a BOM of that encoding and switch endian swapping if UTF-16 and 0xfffe.

===
The pStrm is SvStream*, so everything seems as easy as adding this just before
line 316 of smdetect.cxx http://cgit.freedesktop.org/libreoffice/core/tree/starmath/source/smdetect.cxx#n316 :
pStrm->StartReadingUnicodeText(RTL_TEXTENCODING_DONTKNOW);
sal_uLong nBytesRead = pStrm->Read( aBuffer, nSize );

Comment 13 Mike Kaganski 2013-07-05 04:07:16 UTC

And by the way,

the testcase I mentioned in Comment 7 (Attachment 73352 [details]) have no <?xml?> in the beginning, so this will make checks in line 319 to fail:

if (0 == strncmp( "<?xml",aBuffer,nSize))

According to [1], the XML declaration is optional, so the testcase is well-formed. So I suppose, that the checks should be relaxed. However, this is questionable, because it may give false positives. What do you think?

[1] http://www.w3.org/TR/2008/REC-xml-20081126/#sec-prolog-dtd

Comment 14 Marcos Souza 2013-07-05 15:10:46 UTC

Hi Mike!

Thanks a lot for the comments!

I added a commit in gerrit: https://gerrit.libreoffice.org/#/c/4742/

Now it check if we have the <?xml, and if not, it's good too =D

And I avoided the BOM character, so I beleive it

Comment 15 Marcos Souza 2013-07-05 15:11:14 UTC

(In reply to comment #14)
> Hi Mike!
> 
> Thanks a lot for the comments!
> 
> I added a commit in gerrit: https://gerrit.libreoffice.org/#/c/4742/
> 
> Now it check if we have the <?xml, and if not, it's good too =D
> 
> And I avoided the BOM character, so I beleive it

it's fine now :)

Thanks a lot for looking at this!

Comment 16 Commit Notification 2013-07-05 22:04:40 UTC

Marcos Paulo de Souza committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=05530423d3cff1391769192a62ae470500978ee6

Solve one more issue of fdo#59642



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Comment 17 Mike Kaganski 2013-07-06 00:19:27 UTC

Thank you for working on it! Will test as soon as it is available.

Comment 18 Mike Kaganski 2013-07-09 05:11:26 UTC

Verified fixed in Version: 4.2.0.0.alpha0+
Build ID: 4374e5c80525cd1a9d9ab04714ccbf2543a912ce

Thank you!

Comment 19 Marcos Souza 2013-07-09 13:58:09 UTC

Thanks for verifying Mike!

If you find more MathML files that Math can't handle, please open this bug again, or send me an email!