Bug 59642 - Tools -> Import Formula does not work if a MathML file is selected using "All files (*.*)" file filter
Summary: Tools -> Import Formula does not work if a MathML file is selected using "All...
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Formula Editor (show other bugs)
Version:
(earliest affected)
3.6.4.3 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:4.2.0 target:4.1.0.1
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-21 00:42 UTC by Mike Kaganski
Modified: 2013-07-09 13:58 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2013-01-21 00:42:44 UTC
Tools -> Import Formula -> leave the file filter set to "All files (*.*)" -> select a .mml file -> Open.

Expected: the file should be imported.
Actual result: nothing happens, formula is not imported.

If the filter is explicitly set to "MathML 1.01 (*.mml)", then the export completes successfully.
Comment 1 Marcos Souza 2013-05-23 12:12:20 UTC
Reproducible in 4.1.0 alpha version in Linux too.

So, chaging to platform to All
Comment 2 Commit Notification 2013-06-17 14:09:53 UTC
Marcos Paulo de Souza committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=ef4872f6c6a49878ea2fc6b3a4ea3cd9ad52de33

Fixes fdo#59642



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 3 Marcos Souza 2013-06-17 14:10:43 UTC
It's fixed in master!

Thanks for the report!
Comment 4 Commit Notification 2013-06-17 16:11:57 UTC
Marcos Paulo de Souza committed a patch related to this issue.
It has been pushed to "libreoffice-4-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=923b05ac37355886e7a20fdcb1ad9dcb198f7570&h=libreoffice-4-1

Fixes fdo#59642


It will be available in LibreOffice 4.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 5 Mike Kaganski 2013-06-29 15:10:34 UTC
Hello Marcos,
Thank you for looking into this.

Sorry to say this, but I tried the 4.1.0.1 and the problem is still there... :( Maybe the fox will be in a second RC?
Comment 6 Marcos Souza 2013-06-29 17:10:03 UTC
Hi Mike,

Is strange because the fix was applied to rc1, as described here: https://wiki.documentfoundation.org/Releases/4.1.0/RC1

But, can you test the rc2 please? (when it's available)
Comment 7 Mike Kaganski 2013-06-30 00:20:52 UTC
I looked into the commits, and it seems that they just cannot fix this issue:

The fix covers yet another file marker: "<math:math ". But the problem is already present with files having "old" markers, like, say, Attachment 73352 [details] from Bug #59641. That file contains plain old "<math ".
Comment 8 Mike Kaganski 2013-06-30 07:06:06 UTC
By the way: the signatures there could be followed by any whitespace character, not only SPACE: that could also be TAB, LF or CR (see http://www.w3.org/TR/2008/REC-xml-20081126/#NT-S)
Comment 9 Marcos Souza 2013-07-02 11:15:52 UTC
Hi Mike!

How, you're right =D

There is one more option to fix the auto detection.

I'll fix this, thanks a lot for looking at this!
Comment 10 Marcos Souza 2013-07-03 18:35:16 UTC
Hi guys!

I looked at that mml file. This file was a little different from the others: it have the BOM marker: www.w3.org/International/questions/qa-byte-order-mark

Do you know if the LO has code to avoid this marker, or maybe handle this?
Comment 11 Mike Kaganski 2013-07-03 21:39:05 UTC
Hi marcos,

unfortunately, I don't know this,
but LO imports it correctly if the type is explicitly selected.

I would vote for local solution (checking for the two possible BOMs before searching for markers), and explaining this in comments, so that it would be possible to search for string "BOM" or "Byte Order Mark", and a person doing unification could benefit from this.
Comment 12 Mike Kaganski 2013-07-05 03:39:02 UTC
(In reply to comment #10)
> Do you know if the LO has code to avoid this marker, or maybe handle this?

Here is what I found:

http://docs.libreoffice.org/tools/html/classSvStream.html#ab1d78e3df7058ca99859cb43d3b227a6

sal_Bool SvStream::StartReadingUnicodeText(rtl_TextEncoding eReadBomCharSet)

If eReadBomCharSet==RTL_TEXTENCODING_DONTKNOW: read 16bit, if 0xfeff do nothing (UTF-16), if 0xfffe switch endian swapping (UTF-16), if 0xefbb or 0xbbef read another byte and check for UTF-8. 

If no UTF-* BOM was detected put all read bytes back. This means that if 2 bytes were read it was an UTF-16 BOM, if 3 bytes were read it was an UTF-8 BOM. There is no UTF-7, UTF-32 or UTF-EBCDIC BOM detection!
 
If eReadBomCharSet!=RTL_TEXTENCODING_DONTKNOW: only read a BOM of that encoding and switch endian swapping if UTF-16 and 0xfffe.

===
The pStrm is SvStream*, so everything seems as easy as adding this just before
line 316 of smdetect.cxx http://cgit.freedesktop.org/libreoffice/core/tree/starmath/source/smdetect.cxx#n316 :
pStrm->StartReadingUnicodeText(RTL_TEXTENCODING_DONTKNOW);
sal_uLong nBytesRead = pStrm->Read( aBuffer, nSize );
Comment 13 Mike Kaganski 2013-07-05 04:07:16 UTC
And by the way,

the testcase I mentioned in Comment 7 (Attachment 73352 [details]) have no <?xml?> in the beginning, so this will make checks in line 319 to fail:

if (0 == strncmp( "<?xml",aBuffer,nSize))

According to [1], the XML declaration is optional, so the testcase is well-formed. So I suppose, that the checks should be relaxed. However, this is questionable, because it may give false positives. What do you think?

[1] http://www.w3.org/TR/2008/REC-xml-20081126/#sec-prolog-dtd
Comment 14 Marcos Souza 2013-07-05 15:10:46 UTC
Hi Mike!

Thanks a lot for the comments!

I added a commit in gerrit: https://gerrit.libreoffice.org/#/c/4742/

Now it check if we have the <?xml, and if not, it's good too =D

And I avoided the BOM character, so I beleive it
Comment 15 Marcos Souza 2013-07-05 15:11:14 UTC
(In reply to comment #14)
> Hi Mike!
> 
> Thanks a lot for the comments!
> 
> I added a commit in gerrit: https://gerrit.libreoffice.org/#/c/4742/
> 
> Now it check if we have the <?xml, and if not, it's good too =D
> 
> And I avoided the BOM character, so I beleive it

it's fine now :)

Thanks a lot for looking at this!
Comment 16 Commit Notification 2013-07-05 22:04:40 UTC
Marcos Paulo de Souza committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=05530423d3cff1391769192a62ae470500978ee6

Solve one more issue of fdo#59642



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 17 Mike Kaganski 2013-07-06 00:19:27 UTC
Thank you for working on it! Will test as soon as it is available.
Comment 18 Mike Kaganski 2013-07-09 05:11:26 UTC
Verified fixed in Version: 4.2.0.0.alpha0+
Build ID: 4374e5c80525cd1a9d9ab04714ccbf2543a912ce

Thank you!
Comment 19 Marcos Souza 2013-07-09 13:58:09 UTC
Thanks for verifying Mike!

If you find more MathML files that Math can't handle, please open this bug again, or send me an email!