Bug 146727 - Import strings that match specific LO Math operators in MSO equations as text
Summary: Import strings that match specific LO Math operators in MSO equations as text
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Formula Editor
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-13 00:30 UTC by Francisco
Modified: 2022-05-11 08:02 UTC

See Also:
Crash report or crash signature:


Attachments
example file with broken text (13.31 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-01-13 00:31 UTC, Francisco
Side by side comparison (34.14 KB, image/png)
2022-01-13 00:34 UTC, Francisco

Description Francisco 2022-01-13 00:30:43 UTC
Description:
There are several cases where strings that are also Math operators are used in an equation as plain text or even as emphasis marks on a variable.
Some examples are:
1) "bar", a common unit of pressure, is also the command for drawing an overline on a variable.
2) "circle" and "square" may appear as plain words inside an equation.
3) "+", "*" and other operators are used for emphasis, for instance to mark a reference value.

In the attached DOCX file, there are several examples that are not imported correctly.

Steps to Reproduce:
1. Open attached file

Actual Results:
Equations 2 to 3 have been lost. If the file is saved by LO, and then opened and saved by MS Word, there will be data loss.

Expected Results:
All equations are correctly imported.


Reproducible: Always


User Profile Reset: No



Additional Info:
I don't know how "circle" and "bar" emphasis are handled within MSO, but I think that "bar", "circle", and other strings that match Math operators should be imported as quoted text.

On the other hand, "+" and "*" should also have been imported as quoted text, although I'm not sure how to solve this case. Maybe, when Math detects an error while importing from Word, it should import everything as quoted text?
Comment 1 Francisco 2022-01-13 00:31:55 UTC
Created attachment 177512 [details]
example file with broken text

Look at equations 2 to 4: all of them are wrong or simply broken.
Comment 2 Francisco 2022-01-13 00:34:38 UTC
Created attachment 177513 [details]
Side by side comparison

Comparison with LO 7.2.5.2. The import from older versions may be a little different for Eq. 4, but it is wrong anyway.
Comment 3 Rafael Lima 2022-05-05 21:02:03 UTC
Hi Francisco, this is a real problem while importing formulas from MS Office files. For example, the circle area formula was imported as:

circle area = π {r} ^ {2}

But it should have been imported as:

italic "circle area" = π {r} ^ {2}

So whenever a formula from MS Office contains a reserved word for Math, that text should be imported as an italic quoted string.
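
A minimal sketch of this quoting rule, written in Python for illustration only; it is not LibreOffice's actual import code. `RESERVED_WORDS` is a small assumed subset built from commands that appear in this report, and `quote_reserved` is a hypothetical helper name. The italic attribute is left out to keep the sketch focused on the quoting itself.

```python
# Illustrative subset only: the real list of Math reserved words is longer.
RESERVED_WORDS = {"bar", "circle", "over", "left", "right"}


def quote_reserved(text_run: str) -> str:
    """Return the run as quoted text if any of its words is reserved.

    `text_run` stands for one run of plain text taken from an MS Office
    equation (an assumption about what the importer receives).
    Matching is exact, which is a simplification.
    """
    words = text_run.split()
    if any(word in RESERVED_WORDS for word in words):
        # Quote the whole run so Math treats it as text, not as commands.
        return '"' + text_run + '"'
    return text_run
```

With this sketch, `quote_reserved("circle area")` gives `"circle area"` including the quotes, while a run such as `vap` is returned unchanged.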

I am setting this to NEW.
Comment 4 Francisco 2022-05-06 14:06:00 UTC
Thank you, Rafael.

Just one comment

(In reply to Rafael Lima from comment #3)

> 
> But it should have been imported as:
> 
> italic "circle area" = π {r} ^ {2}
> 

It should be imported as

"circle area"

without italics; look at the original DOCX file. But this is actually a different bug, bug 146726.
Comment 5 Rafael Lima 2022-05-06 14:10:16 UTC
(In reply to Francisco from comment #4)
> It should be imported as
> 
> "circle area"
> 
> without italics; look at the original DOCX file. But this is actually a
> different bug, bug 146726.

Thanks for the pointer... indeed this should not be italic.
Comment 6 Rafael Lima 2022-05-06 14:13:10 UTC
@Dante you worked with the MathML parser some time ago. Do you have any good code pointer for this one?
Comment 7 dante19031999 2022-05-11 08:02:43 UTC
(In reply to Rafael Lima from comment #6)
> @Dante you worked with the MathML parser some time ago. Do you have any good
> code pointer for this one?

I can't help you much with this. I lack knowledge about how the information is encoded in MS Word.

As for MathML, if you wanted to insert decorative items such as operators, you would add them as <mtext>=</mtext>, which in LO should become "=".
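
To make the <mtext> mapping concrete, here is a rough Python sketch (not the LibreOffice MathML parser). `token_to_math` is a hypothetical name, and the sketch assumes token elements without an XML namespace.

```python
import xml.etree.ElementTree as ET


def token_to_math(element: ET.Element) -> str:
    """Map one MathML token element to Math command text.

    <mtext> content is decorative text, so it is emitted quoted.
    Any other token (for example <mi>) is emitted as-is here.
    """
    text = (element.text or "").strip()
    if element.tag == "mtext":
        return '"' + text + '"'
    return text
```

For example, `token_to_math(ET.fromstring("<mtext>=</mtext>"))` yields `"="` with the quotes, whereas `<mi>x</mi>` yields a bare `x`.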

The bug 1:
Formula:   {v} ^ {+}     =C {left ({y} over {R} right )} ^ {α}
Should be: {v} ^ {{}+{}} =C {left ({y} over {R} right )} ^ {α}
                   |  |
As you can see, the operators have not been escaped. This is a feature of LO I dislike because it forces me to write a lot of useless code.
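
A small Python sketch of the escaping shown for bug 1, for illustration only. The function names and the `OPERATORS` set are assumptions, and the set is just a subset of the operators Math knows about.

```python
# Illustrative subset of operators that need operands on both sides.
OPERATORS = {"+", "-", "*", "/"}


def escape_lone_operator(content: str) -> str:
    """Wrap an operator that stands alone in empty groups: + -> {}+{}."""
    stripped = content.strip()
    if stripped in OPERATORS:
        # The empty groups act as dummy operands for the operator.
        return "{}" + stripped + "{}"
    return content


def superscript(base: str, exponent: str) -> str:
    """Build a superscript expression, escaping a lone operator exponent."""
    return "{" + base + "} ^ {" + escape_lone_operator(exponent) + "}"
```

`superscript("v", "+")` produces `{v} ^ {{}+{}}`, matching the "Should be" line above, while `superscript("r", "2")` stays `{r} ^ {2}`.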

The bug 2:
Formula:   ln {{p} ^ {vap} left[ bar   right] =A- {B} over {t left [°C right ] +C}}
Should be: ln {{p} ^ {vap} left[ "bar" right] =A- {B} over {t left [°C right ] +C}}
                                 |  |
As you can see, a keyword has been treated as a variable name. This one is trickier because it would be necessary to add a command in the style of mi "variable name" if you want to ensure the safety of variable name text.

The bug 3:
Formula:    circle area  = π {r} ^ {2}
Should be: "circle area" = π {r} ^ {2}
            |          |
As you can see, here we have a double bug. First, text information has been treated as variable names. Second, we have fallen victim to bug 2.

I hope this information has been useful to you.

https://developer.mozilla.org/en-US/docs/Web/MathML/Element/mtext
https://developer.mozilla.org/en-US/docs/Web/MathML/Element/mi