Bug 66279 - MathML export: use the operator dictionary
Summary: MathML export: use the operator dictionary
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Formula Editor
Version (earliest affected): 4.2.0.0.alpha0+ Master
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 66088 MathML
Reported: 2013-06-27 21:29 UTC by Frédéric Wang
Modified: 2018-07-14 18:17 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Patch (41.85 KB, patch)
2013-07-04 17:08 UTC, Frédéric Wang

Description Frédéric Wang 2013-06-27 21:29:10 UTC
Currently, if you export something like "x+y+z" to MathML, LibreOffice attaches a stretchy="false" attribute to each <mo>+</mo>. The default stretchiness of operators is given by the MathML operator dictionary:

http://www.w3.org/TR/MathML3/appendixc.html

and in most cases an explicit stretchy attribute is not necessary. Hence LibreOffice should not attach such an attribute in those cases. See also the comment in SmXMLExport::ExportNodes.
Comment 1 Jorendc 2013-06-27 21:36:22 UTC
NEW :)
Comment 2 Frédéric Wang 2013-06-27 21:55:31 UTC
Mass changes to assign bugs to myself.
Comment 3 Frédéric Wang 2013-06-30 16:14:02 UTC
So after analyzing the code, I think what the MathML export really needs is:

1) a way to know whether a character is an operator: this is necessary for the NSPECIAL %xxxx commands. See bug 66088 comment 5.

2) a way to know whether an operator is stretchy: this is indicated in SmXMLExport::ExportNodes, case NMATH. Currently the code always adds a stretchy="false" attribute if no explicit attribute is specified yet.

Operators have three forms (prefix, postfix, infix). In most cases their stretchiness is the same across forms, but in some cases it differs. I propose to add a (hash) table for the operator dictionary whose boolean values indicate whether at least one of a given operator's forms is stretchy.
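A minimal sketch of the proposed table, assuming the dictionary data has already been reduced to (character, form, stretchy) triples. The names Form, Entry, kExcerpt and buildStretchyTable are hypothetical, and the excerpt lists only a few characters from MathML 3 Appendix C for illustration; a real table would be generated from the full dictionary.

```cpp
#include <unordered_map>

// The three forms an operator can take in the MathML operator dictionary.
enum class Form { Prefix, Infix, Postfix };

// One (character, form) entry; only the stretchy property is kept here.
struct Entry {
    char32_t ch;
    Form form;
    bool stretchy;
};

// Hypothetical excerpt of the operator dictionary, for illustration only.
const Entry kExcerpt[] = {
    {U'(', Form::Prefix, true},
    {U')', Form::Postfix, true},
    {U'+', Form::Infix, false},
    {U'+', Form::Prefix, false},
    {U'\u2209', Form::Infix, false},  // NOT AN ELEMENT OF
};

// Collapse the per-form data into the proposed table:
// key = operator, value = "at least one form is stretchy".
std::unordered_map<char32_t, bool> buildStretchyTable() {
    std::unordered_map<char32_t, bool> table;
    for (const Entry& e : kExcerpt) {
        bool& anyStretchy = table[e.ch];  // value-initialized to false
        anyStretchy = anyStretchy || e.stretchy;
    }
    return table;
}
```

For example, buildStretchyTable().at(U'(') is true because the prefix form of "(" is stretchy, while the entry for "+" stays false since none of its listed forms stretch. Collapsing the forms into one boolean keeps the table small, at the cost of attaching stretchy="false" to an operator whose stretchy form is not the one actually used.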

For 1), NSPECIAL will generate <mo> elements if the character is in the dictionary and <mi> elements otherwise.

For 2), the code will add a stretchy="false" attribute only if no explicit attribute is specified yet and the operator has a stretchy form. For example, "+" is never stretchy, so we don't need to specify stretchy="false" explicitly. Other operators that may be stretchy, but for which Math did not request stretching, will get stretchy="false" to prevent them from stretching.
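The two export rules above can be sketched as one decision function, under the assumption that the exporter knows whether Math explicitly requested a stretchy value. This merges rule 1) (which the comment limits to NSPECIAL) with rule 2) for brevity; decide, MoDecision and kOpDict are hypothetical names, not the actual SmXMLExport code.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical excerpt of the proposed table: operator -> "has a stretchy form".
const std::unordered_map<char32_t, bool> kOpDict = {
    {U'(', true}, {U')', true}, {U'+', false}, {U'\u2209', false},
};

// What the exporter would write for one character.
struct MoDecision {
    std::string element;               // "mo" or "mi"
    std::optional<bool> stretchyAttr;  // empty means: omit the attribute
};

// explicitStretchy holds a value only when Math itself requested one
// (e.g. "left( x right)" or "widehat").
MoDecision decide(char32_t ch, std::optional<bool> explicitStretchy) {
    auto it = kOpDict.find(ch);
    if (it == kOpDict.end())
        return {"mi", std::nullopt};      // rule 1): not an operator -> <mi>
    if (explicitStretchy)
        return {"mo", explicitStretchy};  // keep the explicit value as-is
    if (it->second)
        return {"mo", false};             // rule 2): may stretch, so forbid it
    return {"mo", std::nullopt};          // never stretchy: no attribute needed
}
```

With this sketch, decide(U'+', std::nullopt) yields a plain <mo>, decide(U'(', std::nullopt) yields <mo stretchy="false">, and a character missing from the table yields <mi>.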
Comment 4 Frédéric Wang 2013-07-02 08:58:18 UTC
I've submitted a patch for review:

https://gerrit.libreoffice.org/#/c/4671/

Testcases:

1) "widehat xxxxxxx" and "left( x right)"

   should still produce an <mo stretchy="true"> as the attribute is explicitly set. (I assume bug 66282 is fixed)

2) "( x )" should still produce an <mo stretchy="false"> as the attribute is explicitly set.

3) "uoper %alpha x" should still produce an <mo> element (NGLYPH_SPECIAL uoper defines a unary mo).

4) \( %alpha + %beta + %gamma \) %noelement %SIGMA

   The '(', ')', '+', and '∉' are operators and should be <mo> elements. '(' and ')' are defined as stretchy in the opdict, so a stretchy="false" attribute should be attached to prevent them from stretching. The other operators don't need this attribute, as they are not stretchy.

   The Greek letters are not operators; they should be <mi> elements.

5) More subtle: use Tools > Catalog to define your own SPECIAL %mycommand. You can use an operator from the dictionary (stretchy or not) or another character that is not an operator. An <mi>, <mo> or <mo stretchy="false"> element should be produced accordingly.
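The expected elements in testcase 4 follow mechanically from the table. The sketch below uses the same hypothetical excerpt table and builds the markup expected for a flat row of characters without explicit attributes. The <mrow> wrapper and the helper names expectedRow and toUtf8 are assumptions for illustration; the real exporter produces its own nesting.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical excerpt of the proposed table: operator -> "has a stretchy form".
const std::unordered_map<char32_t, bool> kOpDict = {
    {U'(', true}, {U')', true}, {U'+', false}, {U'\u2209', false},
};

// Encode one Unicode scalar value as UTF-8.
std::string toUtf8(char32_t c) {
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}

// Expected markup for a flat row: <mi> for non-operators, <mo> otherwise,
// with stretchy="false" only on operators that have a stretchy form.
std::string expectedRow(const std::vector<char32_t>& chars) {
    std::string xml = "<mrow>";
    for (char32_t c : chars) {
        auto it = kOpDict.find(c);
        if (it == kOpDict.end())
            xml += "<mi>" + toUtf8(c) + "</mi>";
        else if (it->second)
            xml += "<mo stretchy=\"false\">" + toUtf8(c) + "</mo>";
        else
            xml += "<mo>" + toUtf8(c) + "</mo>";
    }
    return xml + "</mrow>";
}
```

Feeding it the code points of ( α + β + γ ) ∉ Σ gives <mo stretchy="false"> for the parentheses, plain <mo> for "+" and "∉", and <mi> for the Greek letters, matching the expectations of testcase 4.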
Comment 5 Frédéric Wang 2013-07-02 09:15:17 UTC
BTW: the current code does not seem to handle non-BMP characters. See bug 66333 for parsing issues.
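One way to handle non-BMP characters is to decode the UTF-16 string into code points before looking them up, so that a surrogate pair is treated as a single character (for example U+1D400, which is stored as two UTF-16 units). A minimal standard-library sketch follows; toCodePoints is a hypothetical helper. Inside LibreOffice, OUString::iterateCodePoints likely serves the same purpose.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-16 code units into Unicode code points so that characters
// outside the BMP (encoded as surrogate pairs) are handled as a whole.
// Unpaired surrogates are replaced with U+FFFD.
std::vector<char32_t> toCodePoints(const std::u16string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            char32_t hi = u - 0xD800;
            char32_t lo = s[i + 1] - 0xDC00;
            out.push_back(0x10000 + (hi << 10) + lo);
            ++i;  // the low surrogate has been consumed as well
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            out.push_back(U'\uFFFD');  // unpaired surrogate
        } else {
            out.push_back(u);  // BMP character
        }
    }
    return out;
}
```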
Comment 6 Frédéric Wang 2013-07-03 06:56:29 UTC
I was not really happy with the way this generated file is handled, and I agree that it should be better integrated with the rest of the build system. One reason I hesitated to do so is that I didn't want to break anything because of missing dependencies. It would help if you could recommend what I should use:

1) To download the unicode.xml file. I've seen that the build system already downloads some files. I've used "wget".

2) To extract the data from the XML file. I've used "xsltproc", but I guess any XSLT processor would work. unicode.xml is big (~5 MB), so I need an efficient way (fast and not memory-consuming).

3) To format the output file. I've used classic UNIX tools like sed, grep, uniq and diff. Will that work with, e.g., the Windows build system? Otherwise, I can write a small Perl or Python script to do that.
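As an alternative to the sed/grep/uniq step in 3), the formatting could be done by a small program that needs no extra tools (C++ is used here only because it is the project's language). The sketch assumes a hypothetical intermediate format, one entry per line as "<hex code point> <form> <0|1>", that an XSLT pass over unicode.xml might print. The emitted OpDictEntry/aOpDict declarations are placeholders, not names from the patch.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Reads lines such as "0028 prefix 1" (hypothetical format: hex code point,
// form, 1 if that form is stretchy), merges the forms of each operator,
// and emits a sorted, deduplicated C++ table.
std::string generateTable(std::istream& in) {
    std::map<unsigned long, bool> anyStretchy;  // std::map keeps keys sorted
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string hex, form;
        int stretchy = 0;
        if (!(fields >> hex >> form >> stretchy))
            continue;  // skip blank or incomplete lines
        bool& flag = anyStretchy[std::stoul(hex, nullptr, 16)];
        flag = flag || stretchy != 0;
    }
    std::ostringstream out;
    out << "// Generated file, do not edit.\n"
        << "const OpDictEntry aOpDict[] = {\n";
    for (const auto& [cp, stretchy] : anyStretchy)
        out << "    { 0x" << std::hex << cp << ", "
            << (stretchy ? "true" : "false") << " },\n";
    out << "};\n";
    return out.str();
}
```

Because the map merges duplicate keys and iterates in key order, the uniq and sort steps come for free, and the output is stable enough to diff against a previously generated file.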
Comment 7 Khaled Hosny 2013-07-04 15:08:48 UTC
1) I think you should use the existing mechanism for downloading third party sources, see Makefile.fetch and download.lst.

2) I don’t know much about XSLT, but check how solenv/bin/createcomponent.xslt is used.

3) We use Cygwin for the Windows build, so those tools are available; you probably need to use the autoconf variables set for those tools to avoid path issues.
Comment 8 Frédéric Wang 2013-07-04 17:08:34 UTC
Created attachment 82035
Patch

(In reply to comment #7)
> 1) I think you should use the existing mechanism for downloading third party
> sources, see Makefile.fetch and download.lst.

Thanks, that seems easy to do (wget is used there too, BTW).

>
> 2) I don’t know much about XSLT, but check how
> solenv/bin/createcomponent.xslt is used.

Cool, xsltproc is used too... no need for additional dependencies :-)

>
> 3) We use Cygwin for the Windows build, so those tools are available; you
> probably need to use the autoconf variables set for those tools to avoid path
> issues.

Great, that will make things much easier.

--

I am attaching a WIP patch that moves the operator dictionary into a separate C++ file, which should be cleaner. Now the hard part is understanding the build system :-)
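For a separate, generated C++ file, one possible layout (not necessarily the one in the attached patch) is a sorted constant array searched with std::lower_bound instead of a hash table: it needs no construction at startup and is easy to emit from a generator. OpDictEntry, aOpDict and lookupOperator are hypothetical names, and the four entries are only an excerpt.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical shape of a generated dictionary file: entries sorted by
// code point, so lookup is a binary search over constant data.
struct OpDictEntry {
    char32_t cCode;
    bool bHasStretchyForm;
};

constexpr OpDictEntry aOpDict[] = {
    {0x0028, true},   // (
    {0x0029, true},   // )
    {0x002B, false},  // +
    {0x2209, false},  // NOT AN ELEMENT OF
};

// Returns nullptr if cCode is not in the dictionary (i.e. not an operator).
const OpDictEntry* lookupOperator(char32_t cCode) {
    auto it = std::lower_bound(
        std::begin(aOpDict), std::end(aOpDict), cCode,
        [](const OpDictEntry& e, char32_t c) { return e.cCode < c; });
    return (it != std::end(aOpDict) && it->cCode == cCode) ? it : nullptr;
}
```

A null result answers question 1) (not an operator, so <mi>), and bHasStretchyForm answers question 2) from comment 3. The price is that the generator must guarantee the array is sorted.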
Comment 9 Khaled Hosny 2013-07-05 08:51:43 UTC
Comment on attachment 82035
Patch

>--- a/Makefile.fetch
>+++ b/Makefile.fetch
>@@ -192,6 +192,7 @@ $(WORKDIR)/download: $(BUILDDIR)/config_host.mk $(SRCDIR)/download.lst $(SRCDIR)
> 		$(call fetch_Optional,MOZ,$(MOZ_ZIP_LIB)) \
> 		$(call fetch_Optional,MOZ,$(MOZ_ZIP_RUNTIME)) \
> 	,$(call fetch_Download_item,http://dev-www.libreoffice.org/mozilla,$(item),no-sum))
>+	$(call fetch_Download_item,http://www.w3.org/2003/entities/2007xml,$(W3C_UNICODE),no-sum)

I think the file (being a build dependency) should be uploaded to LibreOffice servers to make sure it is always available (until it is uploaded, you can just copy it to src for your local build to work).

>--- a/download.lst
>+++ b/download.lst
>@@ -94,3 +94,5 @@ export ZLIB_TARBALL := 2ab442d169156f34c379c968f3f482dd-zlib-1.2.7.tar.bz2
> export MOZ_ZIP_INC := $(OS)$(COM)$(CPU)inc.zip
> export MOZ_ZIP_LIB := $(OS)$(COM)$(CPU)lib.zip
> export MOZ_ZIP_RUNTIME := $(OS)$(COM)$(CPU)runtime.zip
>+
>+export W3C_UNICODE := unicode.xml
>diff --git a/starmath/Library_sm.mk b/starmath/Library_sm.mk

It would be better to add a checksum to the file as well, so that we can easily update it in the future.
Comment 10 Xisco Faulí 2017-09-29 08:53:54 UTC Comment hidden (obsolete)
Comment 11 Regina Henschel 2017-12-22 20:35:18 UTC
The issue is not solved in Version: 6.1.0.0.alpha0+ (x64)
Build ID: d73857e7d7f6a5bf38c6a2f396832faabaef65e2
CPU threads: 8; OS: Windows 10.0; UI render: GL; 
TinderBox: Win-x86_64@62-TDF, Branch:master, Time: 2017-12-12_17:37:14
Locale: de-DE (de_DE); Calc: CL