Bug 66279 - MathML export: use the operator dictionary
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Formula Editor
Version (earliest affected): 4.2.0.0.alpha0+ Master
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 66088 MathML
Reported: 2013-06-27 21:29 UTC by Frédéric Wang
Modified: 2019-07-15 02:47 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Patch (41.85 KB, patch)
2013-07-04 17:08 UTC, Frédéric Wang

Description Frédéric Wang 2013-06-27 21:29:10 UTC
Currently, if you export something like "x+y+z" to MathML, LibreOffice will essentially attach a stretchy="false" attribute to each <mo>+</mo>. The default stretchiness of operators is given by the MathML operator dictionary:

http://www.w3.org/TR/MathML3/appendixc.html

and in most cases an explicit stretchy attribute is not necessary. Hence LibreOffice should not attach such an attribute in those cases. See also the comment in SmXMLExport::ExportNodes.
Comment 1 Jorendc 2013-06-27 21:36:22 UTC
NEW :)
Comment 2 Frédéric Wang 2013-06-27 21:55:31 UTC
Mass changes to assign bugs to myself.
Comment 3 Frédéric Wang 2013-06-30 16:14:02 UTC
So after analysis of the code, I think what is really needed by the MathML export is:

1) a way to know whether a character is an operator: this is necessary for the NSPECIAL %xxxx commands. See bug 66088 comment 5

2) a way to know whether an operator is stretchy: this is indicated in SmXMLExport::ExportNodes, case NMATH. Currently the code always adds a stretchy="false" attribute if no explicit attribute is specified yet.

Operators have three forms (prefix, postfix, infix). In most cases the stretchiness is the same for all forms, but in some cases it differs. I propose adding a (hash) table for the operator dictionary with boolean values indicating whether a given operator has at least one "stretchy" form.

For 1) the NSPECIAL will generate <mo> elements if the operator is in the dictionary and <mi> elements otherwise.

For 2) the code will add a stretchy="false" attribute if no explicit attribute is specified yet and if the operator has a stretchy form. For example "+" is never stretchy and we don't need to specify stretchy="false" explicitly. Other operators that may be stretchy but for which Math didn't ask for stretchiness will have stretchy="false" to prevent them from stretching.
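
A minimal sketch of the table and the two decisions described above. All names here (Stretchy, kOpDict, exportElement) are hypothetical and not taken from the starmath source, the table holds only a hand-picked handful of entries rather than the full operator dictionary, and points 1) and 2) are merged into one function for brevity:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch; none of these names exist in starmath.
enum class Stretchy { Unspecified, True, False };

// Maps a code point to "does at least one of its forms
// (prefix, infix, postfix) stretch by default?".
static const std::unordered_map<char32_t, bool> kOpDict = {
    {U'+', false},      // never stretchy
    {U'(', true},       // stretchy by default
    {U')', true},       // stretchy by default
    {U'\u2209', false}, // NOT AN ELEMENT OF, never stretchy
};

// Returns the opening tag that would be written for character c.
std::string exportElement(char32_t c, Stretchy explicitAttr)
{
    // An explicit request (e.g. from "left( ... right)") always wins.
    if (explicitAttr == Stretchy::True)
        return "<mo stretchy=\"true\">";
    if (explicitAttr == Stretchy::False)
        return "<mo stretchy=\"false\">";

    auto it = kOpDict.find(c);
    if (it == kOpDict.end())
        return "<mi>"; // point 1): not in the dictionary, so not an operator

    // point 2): only suppress stretching when the dictionary says the
    // operator could stretch; otherwise the attribute is redundant.
    return it->second ? "<mo stretchy=\"false\">" : "<mo>";
}
```

With this sketch, exportElement(U'+', Stretchy::Unspecified) yields a bare <mo>, while exportElement(U'(', Stretchy::Unspecified) yields <mo stretchy="false">. A single boolean per operator keeps the table small, at the cost of occasionally emitting a redundant stretchy="false" for an operator whose current form is not the stretchy one.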
Comment 4 Frédéric Wang 2013-07-02 08:58:18 UTC
I've submitted a patch for review:

https://gerrit.libreoffice.org/#/c/4671/

Testcases:

1) "widehat xxxxxxx" and "left( x right)"

   should still produce an <mo stretchy="true"> as the attribute is explicitly set. (I assume bug 66282 is fixed)

2) "( x )" should still produce an <mo stretchy="false"> as the attribute is explicitly set.

3) "uoper %alpha x" should still produce an <mo> element (NGLYPH_SPECIAL uoper defines a unary mo).

4) \( %alpha + %beta + %gamma \) %noelement %SIGMA

   The '(', ')', '+', and '∉' are operators and should be <mo> elements. '(' and ')' are defined as stretchy in the operator dictionary, so a stretchy="false" should be attached to prevent them from stretching. The other operators don't need this attribute as they are not stretchy.

   The Greek letters are not operators; they should be <mi> elements.

5) More subtle: use tools => catalog to define your own SPECIAL %mycommand. You can use an operator from the dictionary (stretchy or not) or another character that is not an operator. A <mi>, <mo> or <mo stretchy="false"> should be produced accordingly.
Comment 5 Frédéric Wang 2013-07-02 09:15:17 UTC
BTW: the current code does not seem to handle non-BMP characters. See bug 66333 for parsing issues.
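
A dictionary keyed by code point needs whole code points, and in a UTF-16 string a non-BMP character occupies two code units (a surrogate pair). The following is a sketch only, under the assumption that the input is UTF-16; nextCodePoint is a hypothetical helper, not a starmath function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of starmath: read one Unicode code point
// from a UTF-16 string starting at index i, and advance i past it.
// Precondition: i < s.size().
char32_t nextCodePoint(const std::u16string& s, std::size_t& i)
{
    char16_t hi = s[i++];
    // A high surrogate followed by a low surrogate encodes one
    // non-BMP character.
    if (hi >= 0xD800 && hi <= 0xDBFF && i < s.size())
    {
        char16_t lo = s[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF)
        {
            ++i;
            return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                           + (char32_t(lo) - 0xDC00);
        }
    }
    // BMP character (or an unpaired surrogate, passed through as-is).
    return hi;
}
```

For example, U+1D465 (MATHEMATICAL ITALIC SMALL X) is stored as the pair 0xD835 0xDC65 and is returned as the single value 0x1D465, with the index advanced by two.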
Comment 6 Frédéric Wang 2013-07-03 06:56:29 UTC
I was not really happy with the way this generated file is handled and I agree that it should be better integrated with the rest of the build system. One reason why I hesitated to do so is that I didn't want to break anything because of missing dependencies. It would help if you could recommend what I should use:

1) To download the unicode.xml file. I've seen that the build system already downloads some files. I've used "wget".

2) To extract the data from the XML file. I've used "xsltproc" but I guess any XSLT processor would work. unicode.xml is big (~5 MB), so I need an efficient way (fast, not memory-consuming).

3) To format the output file. I've used classical UNIX tools like sed, grep, uniq and diff. Will that work with e.g. the Windows build system? Otherwise, I can write a small Perl or Python script to do that.
Comment 7 Khaled Hosny 2013-07-04 15:08:48 UTC
1) I think you should use the existing mechanism for downloading third party sources, see Makefile.fetch and download.lst.

2) I don’t know much about XSLT, but check how solenv/bin/createcomponent.xslt is used.

3) We use cygwin for the Windows build, so those tools are available; you will probably need to use the autoconf variables set for those tools to avoid path issues.
Comment 8 Frédéric Wang 2013-07-04 17:08:34 UTC
Created attachment 82035 [details]
Patch

(In reply to comment #7)
> 1) I think you should use the existing mechanism for downloading third party
> sources, see Makefile.fetch and download.lst.

Thanks, that seems easy to do (wget is used too BTW)

> 
> 2) I don’t know much about XSLT, but check how
> solenv/bin/createcomponent.xslt is used.

Cool, xsltproc is used too... no need for additional dependencies :-)

> 
> 2) We use cygwin for Windows build, so those tools are available, you
> probably need to use autoconf variables set for those tools to avoid path
> issues.

Great, that will make things much easier.

--

I attach a WIP patch that moves the operator dictionary into a separate C++ file, so that should be cleaner. Now the hard part is to understand the build system :-)
Comment 9 Khaled Hosny 2013-07-05 08:51:43 UTC
Comment on attachment 82035 [details]
Patch

>--- a/Makefile.fetch
>+++ b/Makefile.fetch
>@@ -192,6 +192,7 @@ $(WORKDIR)/download: $(BUILDDIR)/config_host.mk $(SRCDIR)/download.lst $(SRCDIR)
> 		$(call fetch_Optional,MOZ,$(MOZ_ZIP_LIB)) \
> 		$(call fetch_Optional,MOZ,$(MOZ_ZIP_RUNTIME)) \
> 	,$(call fetch_Download_item,http://dev-www.libreoffice.org/mozilla,$(item),no-sum))
>+	$(call fetch_Download_item,http://www.w3.org/2003/entities/2007xml,$(W3C_UNICODE),no-sum)

I think the file (being a build dependency) should be uploaded to LibreOffice servers to make sure it is always available (until it is uploaded, you can just copy it to src for your local build to work).

>--- a/download.lst
>+++ b/download.lst
>@@ -94,3 +94,5 @@ export ZLIB_TARBALL := 2ab442d169156f34c379c968f3f482dd-zlib-1.2.7.tar.bz2
> export MOZ_ZIP_INC := $(OS)$(COM)$(CPU)inc.zip
> export MOZ_ZIP_LIB := $(OS)$(COM)$(CPU)lib.zip
> export MOZ_ZIP_RUNTIME := $(OS)$(COM)$(CPU)runtime.zip
>+
>+export W3C_UNICODE := unicode.xml
>diff --git a/starmath/Library_sm.mk b/starmath/Library_sm.mk

It would be better to add a checksum to the file as well, so that we can update it in the future easily.
Comment 10 Xisco Faulí 2017-09-29 08:53:54 UTC Comment hidden (obsolete)
Comment 11 Regina Henschel 2017-12-22 20:35:18 UTC
The issue is not solved in Version: 6.1.0.0.alpha0+ (x64)
Build ID: d73857e7d7f6a5bf38c6a2f396832faabaef65e2
CPU threads: 8; OS: Windows 10.0; UI render: GL; 
TinderBox: Win-x86_64@62-TDF, Branch:master, Time: 2017-12-12_17:37:14
Locale: de-DE (de_DE); Calc: CL
Comment 12 QA Administrators 2019-07-15 02:47:46 UTC
Dear Frédéric Wang,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug