Bug 46716 - Some formulas are lost by "DOCX" import filter.
Summary: Some formulas are lost by "DOCX" import filter.
Status: RESOLVED INVALID
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Master old -3.6
Hardware: Other Windows (All)
Importance: high critical
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: mab3.6
Reported: 2012-02-28 03:29 UTC by ape
Modified: 2012-12-10 20:59 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
ZIP-file with the test files (576.84 KB, application/zip)
2012-02-28 03:29 UTC, ape
LibO_Writer-3.3.4 (281.33 KB, application/vnd.oasis.opendocument.text)
2012-03-03 02:55 UTC, ape
MinGW.. does the job correctly (47.40 KB, image/png)
2012-03-06 21:43 UTC, ape
formula as OLE-object "formula" (26.22 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-03-09 21:47 UTC, ape
Word-2010 and Writer-3.5.2 are open the DOCX-file (107.81 KB, image/png)
2012-03-12 01:39 UTC, ape
Freeze Windows DE (331.18 KB, image/png)
2012-08-16 12:03 UTC, ape
ODF instead of Flat XML (322.38 KB, application/vnd.oasis.opendocument.text)
2012-08-17 15:00 UTC, ape
LibO-3.6.2rc2 reopen ODT-file (44.66 KB, image/png)
2012-09-28 21:27 UTC, ape

Description ape 2012-02-28 03:29:27 UTC
Created attachment 57754 [details]
ZIP-file with the test files

DOCX files that contain formulas lose some of those formulas on import, and not all pages of the document are opened (Windows, all versions; Ubuntu 11.04).
 Steps to reproduce:
 1. Open the file "proekt_MU.docx".
 2. Open the file "proekt_MU.odt".
 3. Compare the documents.
 Remarks:
 A. The file "proekt_MU.odt" was produced by exporting "proekt_MU.docx" to ODT format with MS Word 2010.
 B. MS Word 2010 shows these warnings before exporting a DOCX file to ODT format:
 - Formulas will be converted to graphic objects (pictures);
 - Footnotes inside formulas in the DOCX file will be lost;
 - The user cannot edit the formulas in the ODT file.
Comment 1 ape 2012-03-03 02:55:50 UTC
Created attachment 57965 [details]
LibO_Writer-3.3.4

This is a blocking error and a regression: the attached file was opened and saved as "*.odt" with LibreOffice 3.3.4.
Comment 2 Ivan Timofeev (retired) 2012-03-04 00:37:53 UTC
Windows: LibO opens only the text above the first formula, duplicate of Bug 36982 - [FILEOPEN] Writer ignores text after Equation in docx ?
Linux: works fine with my own build of the libreoffice-3-5 branch, Build ID: 8727d28-f9b8d0b-2d9b003, OS: Ubuntu 10.10 x86. I did not test 3-5-1...
Comment 3 ape 2012-03-04 04:40:36 UTC
(In reply to comment #2)
> Windows: LibO opens only the text above the first formula, duplicate of Bug
> 36982 - [FILEOPEN] Writer ignores text after Equation in docx ?
Yes, Bug 46716 is a duplicate of Bug 36982 - [FILEOPEN] Writer ignores text after Equation in docx. The error reproduces from LibO-3.4.x_Win_x86 through master~2012-02-29_04.21.51_LibO-Dev_3.6.0alpha0_Win_x86.
Comment 4 ape 2012-03-04 07:54:37 UTC
On ALT Linux Centaurus 6.0 "Cheiron" (RPM), LibreOffice Writer shows the same bug as on Windows (see the forum post http://forumooo.ru/index.php/topic,2476.msg15576.html#msg15576 ).
 Results for Windows and ALT Linux Centaurus 6.0:
 A. LibO-3.3.4 loses some OLE objects (the long, complex formulas in points 16 and 25) but continues opening the file.
 B. MS_Office-2010 warns that formulas will be converted to images, but it actually stores all of the formulas as OLE objects.
 C. OOo-3.1.1 inserts images in place of the formulas.
 D. AOO-3.4b converts the formulas neither to OLE objects nor to pictures and opens only the text of the document.
 E. LibO-3.5.0 opens the file but stops, as if the work were finished, when it reaches the first formula it has to convert.
 All indications are that LibreOffice-3.5.0 on Windows and AltLinux-6 cannot start LibreOffice Math in the background to convert and insert the formulas.
Comment 5 ape 2012-03-04 22:22:35 UTC
libreoffice-3-5~2012-03-04_15.10.50_LibO-Dev_3.5.2rc0_Win_x86
(/daily/Win-x86@7-MinGW)
This build resolves the error (Windows XP_sp3; XP_64bit_sp2; Seven_32bit_sp1). All formulas are imported correctly.
Wonderful job!
Comment 6 Ivan Timofeev (retired) 2012-03-04 22:48:20 UTC
Yes, the MinGW build works fine in this case, but it will not be published as a release build. So the problem is not solved. And the official TDF Linux build (3.5.1 RC1) really does lose some formulas :( , as opposed to my own build. Let's not close this bug for now.
Comment 7 ape 2012-03-06 08:05:08 UTC
The error has reappeared in the build libreoffice-3-5~2012-03-06_09.36.21_LibO-Dev_3.5.2rc0_Win_x86_install_en-US.msi.
Comment 8 ape 2012-03-06 20:38:01 UTC
LibO-3.5.1rc2_Windows. "Writer" opens only three pages File "proekt_MU.docx" to the first formula. This is a loss of information, so this is a blocking error.
Comment 9 ape 2012-03-06 21:43:24 UTC
Created attachment 58101 [details]
MinGW.. does the job correctly

The error may be in the UI. Writer cannot open a formula if the formula is longer than the width of the page (points 16 and 25 in the file "proekt_MU.docx", attachment 57754 [details]). This build (libreoffice-3-5~2012-03-04_15.10.50_LibO-Dev_3.5.2rc0_Win_x86; daily/Win-x86@7-MinGW) does the job correctly. See the attachment.
Comment 10 Ivan Timofeev (retired) 2012-03-06 22:27:39 UTC
(In reply to comment #8)
> LibO-3.5.1rc2_Windows. "Writer" opens only three pages File "proekt_MU.docx" to
> the first formula. This is a loss of information, so this is a blocking error.

I'd like to see Bug 36982 as a blocker instead, because it is old and well known, and discuss here the Linux problem. Do you agree?
Comment 11 ape 2012-03-07 01:12:20 UTC
(In reply to comment #10)
> I'd like to see Bug 36982 as a blocker instead, because it is old and well
> known, and discuss here the Linux problem. Do you agree?
Yes, of course.
 I'm using "libreoffice-3-5~2012-03-04_15.10.50" (with Russian localization "3.5.1rc2") and ApacheOO-3.4b (Bug 46020 - Loss of footnotes when exporting to DOC-File). They solve my problems, so I can wait. But one question remains: what about other users?
Comment 12 Petr Mladek 2012-03-08 06:00:35 UTC
Uh, I am a bit confused. For example, I have no idea what the error mentioned in comment #1 means. Also I am not sure if there is a hidden bug description in comment #4.

I understand two bugs here. One is that some formulas are not read. It shows a blank square instead. The other bug is that it does not read the rest of the document after a particular formula.

Could you please open a separate bug for the first problem (squares instead of formulas)? Please attach a sample document there and describe where the formula is missing (page number, screenshot), ...

Let's solve the second problem in bug 36982 (lost data after an equation).

I am sorry but I am going to close this bug as too confusing. Please, open separate bugs for other issues that I missed.
Comment 13 ape 2012-03-08 10:34:49 UTC
(In reply to comment #12)
> I understand two bugs here. One is that some formulas are not read. It shows a
> blank square instead. The other bug is that it does not read the rest of the
> document after a particular formula.

Open "ProektMU.docx" (attachment 57754 [details]) with LibO_Writer-3.5.1rc2_Win_x86. You will see a document of three pages.
Open "ProektMU.docx" (attachment 57754 [details]) with LibO_Writer-3-5~2012-03-04_15.10.50_LibO-Dev_3.5.2rc0_Win_x86. You will see a document of 44 pages.
Do I have to explain anything else?

> I am sorry but I am going to close this bug as too confusing.

You have the right to close the bug and rename "LibO_Writer-3.5.1rc2" to "LibO_Writer-3.5.1", but that will not bring the formulas and pages back into the document.
Comment 14 ape 2012-03-08 11:47:34 UTC
libreoffice-3-5~2012-03-04_15.10.50_LibO-Dev_3.5.2rc0_Win_x86 (daily/Win-x86@7-MinGW) opens the file (see att._bug_36982) correctly.
Comment 16 ape 2012-03-09 00:14:24 UTC
The errors can be seen on page 6 (par.16) and on page 9 (par.25) in the Linux version, except in the two builds I pointed out in comments 14 and 15. The error itself persists in these versions. There has been no announcement that Bug_46176 and Bug_36982 are fixed.
Comment 17 Petr Mladek 2012-03-09 03:17:16 UTC
I know that there is a bug that it does not read many pages. It is being solved as the bug #36982, so we do not need to have opened this bug because of it. Or do you think that the pages here are not read because of different reason?

It does not help that MinGW build loads the page correctly. We do not use MinGW for producing the release build. So this information is interesting but it does not say anything about how old the bug is.

Please open another bug for the particular formulas missing on page 6 (par.16) and on page 9 on Linux. We should never discuss several bugs in one bug report. It just brings confusion.

So, tell me, why do we need to keep this messy bug open?
Comment 18 ape 2012-03-09 04:40:39 UTC
(In reply to comment #17)
> I know that there is a bug that it does not read many pages. It is being solved
> as the bug #36982, so we do not need to have opened this bug because of it. Or
> do you think that the pages here are not read because of different reason?

 Yes, I think these are different errors. I think there is an error in the algorithm that opens the file.
 LibreOffice_Writer determines that some formula is longer than the width of the page. Then it begins to open the file, creating a temporary ODT (in effect, though not by name):
- LibO_Writer (Windows OS) stops when it encounters an OLE object of type "formula";
- On Linux (but not AltLinux-6) it ignores the long formula and leaves an empty space.
 This specific behaviour results from differences between the operating systems.

----------------------------------------- 
> It does not help that MinGW build loads the page correctly. We do not use MinGW
> for producing the release build. So this information is interesting but it does
> not say anything about how old the bug is.

 A separate "RC3" (Win_x86 only) needs to be compiled. Only then will Windows users trust the program again. In its current state, LibO-3.5.1RC2_Win_x86 is of no use to Windows users.
Comment 19 ape 2012-03-09 21:47:10 UTC
Created attachment 58261 [details]
formula as OLE-object "formula"

(In reply to comment #12)
> I understand two bugs here.
> Let's solve the second problem in the bug 36982..

You are right. There are two different bugs.
This concerns "*.docx" files with formulas.
Such a file may contain two types of OLE object "formula": a formula inherited from "*.doc", and a formula created with the formula editor of MSO-2010.
1. Word-2010 exports type 1 to the ODT file as an OLE object "image", and type 2 as an OLE object "formula".
2. OOo-3.1.1 imports type 1 as an "image" and type 2 as text.
3. LibO-3.3.4 imports type 1 as an OLE object "image", but cannot process type 2 and leaves an empty space.
4. dev-lo-3.5.2rc2 imports both types correctly, but cannot save the object of the open document after the file is edited.
Bug_36982 is the report about type 1 formulas: "formula as OLE-object IMAGE".
This report is about type 2 formulas: "formula as OLE-object FORMULA".
That's all.
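
Editorial sketch, not part of the original comment: one way to see which of the two formula types a given DOCX contains is to inspect word/document.xml directly. The mapping used here is an assumption: "type 1" is taken to be an embedded OLE object whose ProgID starts with "Equation" (for example the legacy Equation Editor), and "type 2" is taken to be native Office Math markup (m:oMath). The function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside word/document.xml.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
O_NS = "urn:schemas-microsoft-com:office:office"


def count_formula_types(docx):
    """Count the two kinds of formulas in a DOCX (path or file-like object).

    Assumption: legacy formulas appear as o:OLEObject elements with a
    ProgID beginning with "Equation"; formulas created in Word 2007 or
    later appear as m:oMath elements.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    # Embedded OLE objects that identify themselves as equations.
    ole_equations = [
        element
        for element in root.iter(f"{{{O_NS}}}OLEObject")
        if element.get("ProgID", "").startswith("Equation")
    ]
    # Native Office Math formulas.
    native_formulas = list(root.iter(f"{{{M_NS}}}oMath"))

    return {
        "ole_equation": len(ole_equations),
        "native_omml": len(native_formulas),
    }
```

Running this on a test document before filing a report would show whether the file mixes both kinds, which is the situation described for "proekt_MU.docx".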
Comment 20 ape 2012-03-12 01:39:06 UTC
Created attachment 58304 [details]
Word-2010 and Writer-3.5.2 are open the DOCX-file

You have almost solved the problem of opening the OLE-object formula as an OLE Math object (libreoffice-3-5~2012-03-09_13.41.25_LibO-Dev_3.5.2rc0_Win_x86/Win-x86@6-fast; the file "16b.docx"). But one problem remains: a single stray character has to be deleted, an object that is created for some reason when "16b.docx" is opened (see the attachment). The "long formula" will fail if the user does not delete the "delta". But this does not solve the problem with "proekt_MU", which contains both types of formulas: those created by Word-2003 and those created by Word-2010.
Comment 21 Petr Mladek 2012-03-13 07:25:52 UTC
Closing this bug as too confusing. Please, open new bug as suggested at http://lists.freedesktop.org/archives/libreoffice/2012-March/028113.html
Comment 22 ape 2012-03-14 06:48:40 UTC
(In reply to comment #21)
> Closing this bug as too confusing. Please, open new bug as suggested at
> http://lists.freedesktop.org/archives/libreoffice/2012-March/028113.html

My English and the online (Google) translator do not let me explain one simple thing to you: LibreOffice is unable to create, inside the ODT archive, folders with the content for all of the OLE objects that were placed in the DOCX file; it creates these folders only for some objects. I do not understand how objects get skipped: any OLE object, regardless of its type ("image" or "formula"), can be lost. Most often, objects are lost starting from the second one when they are placed consecutively, one after another, in the DOCX file.

 However, using DEV-LibO-3.5.2rc0 (Win-x86@7-MinGW), I found a workaround:
 1. Open the DOCX file.
 2. Save the file in FODT format.
 3. Reopen (reload) the FODT file.
 4. Save the FODT file in ODT format.

P.S. For me the question is settled. What to do next is for the programmers to decide.
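
Editorial sketch, not part of the original comment: the claim about missing object folders can be checked by listing the contents of both packages, since DOCX and ODT files are ZIP archives. This assumes that embedded OLE binaries in a DOCX live under word/embeddings/ and that an embedded ODF sub-document in an ODT appears as a folder containing its own content.xml (for example "Object 1/content.xml"). Formulas stored inline in the DOCX or kept as raw binary streams in the ODT are not counted, so the two lists are a rough indication only. Both function names are hypothetical.

```python
import zipfile


def docx_embeddings(docx):
    """List the embedded binary objects stored in a DOCX package."""
    with zipfile.ZipFile(docx) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith("word/embeddings/") and not name.endswith("/")
        )


def embedded_object_dirs(odt):
    """List the sub-document folders (one per embedded object) in an ODT."""
    with zipfile.ZipFile(odt) as package:
        names = package.namelist()
    # A sub-document is recognised by "<folder>/content.xml" one level deep.
    return sorted(
        name.split("/")[0]
        for name in names
        if name.count("/") == 1 and name.endswith("/content.xml")
    )
```

Comparing `len(docx_embeddings("proekt_MU.docx"))` with `len(embedded_object_dirs("proekt_MU.odt"))` gives a quick hint as to whether objects were dropped during conversion.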
Comment 23 Ivan Timofeev (retired) 2012-03-14 10:30:26 UTC
(In reply to comment #22)
That is already reported: https://bugs.freedesktop.org/show_bug.cgi?id=46142 
Thanks.
Comment 24 ape 2012-08-16 08:27:00 UTC
OS: Windows XP 64-bit (update 15.08.2012)
LibreOffice: 3.6.1.1(ID:4db6344);  3.7.0.0.alpha0+(ID:065b591)

The error has returned and become even worse:

1. Open file "proekt_MU.docx" (see attachment 57754 [details]).

2. Save this file as "proekt_MU.odt":
 a) Error message: Opening too many windows.
 b) Libre Office is frozen.

3. Kill processes soffice.bin by Windows Task Manager.

4. Open file "proekt_MU.docx":
 a) Start recovery of "proekt_MU.docx" and then press the Finish button.
 b) The file "proekt_MU.docx" opens.

5. Save this file as "proekt_MU.fodt":
 a) Error message: Opening too many windows.
 b) Libre Office is frozen.

6. Kill processes soffice.bin by Windows Task Manager.

7. Look at the folder where the file resides. You will see a new file, "proekt_MU.fodt_0.odt".

8. Open the file "proekt_MU.fodt_0.odt" - OK.
Comment 25 Michael Meeks 2012-08-16 09:55:41 UTC
Can't reproduce using 3.6.1rc1 on Linux-x86; LibreOffice fails to hang and just saves the document nicely.

ape: it'd be lovely to have a stack-trace for the hang - ideally with debugging symbols, any chance of that ?
Comment 26 ape 2012-08-16 12:03:18 UTC
Created attachment 65649 [details]
Freeze Windows DE

(In reply to comment #25)
> Can't reproduce using 3.6.1rc1 on Linux-x86; LibreOffice fails to hang and just
> saves the document nicely.
> 
> ape: it'd be lovely to have a stack-trace for the hang - ideally with debugging
> symbols, any chance of that ?

Sorry, I do not know how to do it.
--
 I think that LibO freezes the Windows desktop environment (see attachment), so Linux does not have this problem.
Usage: CPU ~3% (only 1 core); RAM ~178 MB (increased by ~12 MB); the status bar shows that ~10% of the save is done.
--
ape
Comment 27 Petr Mladek 2012-08-16 15:24:41 UTC
It seems to be Windows specific. I see the problems described in the comment 24 with 3.6.1.1 build on Windows. I can't see it on Linux.

If I press "Enter" on the error window "Opening too many windows", it is opened again and again. If I keep "Enter" pressed long enough, it saves the file in the end.

Andras, could you please find where the error window is shown from?

I have added some Writer experts into CC. They might help to fix it once we know what code is affected.
Comment 28 Andras Timar 2012-08-17 12:34:48 UTC
This error occurs only when Microsoft Office is also installed. There may be an issue with the embedded OLE objects (formulas), because when I clicked on them, I also got "Too many windows open" twice, and "General OLE error" (this comes from LibreOffice). Microsoft Equation Editor ran in the background.
Comment 29 ape 2012-08-17 15:00:42 UTC
Created attachment 65700 [details]
ODF instead of Flat XML

(In reply to comment #28)
> This error occurs only when Microsoft Office is also installed. There may be an
> issue with the embedded OLE objects (formulas), because when I clicked on them,
> I also got "Too many windows open" twice, and "General OLE error" (this comes
> from LibreOffice). Microsoft Equation Editor ran in the background.
--
Yes, the process "EQNEDT32.EXE" is running. Task Manager cannot kill it, because it restarts immediately.

1. What if Microsoft Office 2007 is needed to detect OOXML import and export errors, of which there are very many?

2. LibreOffice immediately creates and saves an ODF file when the OOXML file is saved as Flat XML (see the file "proekt_MU.fodt_0.odt" in the attachment). Why ODT and not FODT?

3. Maybe comments 24 through 29 should be split off into a new bug, rated "critical" rather than "blocking", with bug 46716 kept closed?
--
ape
Comment 30 Petr Mladek 2012-08-21 09:09:30 UTC
I haven't seen this with freshly created document with only two formulas => it is kind of specific for this document. It is also related to the installed MS Office => it does not always happen => it should not block the release => lowering the severity a bit.
Comment 31 ape 2012-08-21 12:39:22 UTC
(In reply to comment #30)
> I haven't seen this with freshly created document with only two formulas => it
> is kind of specific for this document. It is also related to the installed MS
> Office => it does not always happen => it should not block the release =>
> lowering the severity a bit.

I agree, you are right.

 About the MSO Formula Editor:
 1. I have uninstalled it from Microsoft Office. LibreOffice stopped freezing.
 2. Now the DOCX file is saved as a bad ODT file (LibO). Writer cannot open this file.
 3. Import into a FODT file is correct.
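
Editorial sketch, not part of the original comment: a "bad ODT file" can be narrowed down a little before it is attached to a report. The check below assumes the file is at least expected to be a ZIP package with the usual entries (mimetype, content.xml, META-INF/manifest.xml) and reports manifest entries that point at files missing from the archive. It does not validate the XML against the ODF schema, and the function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def odt_problems(odt):
    """Return a list of human-readable problems found in an ODT package."""
    if not zipfile.is_zipfile(odt):
        return ["not a ZIP archive"]

    problems = []
    with zipfile.ZipFile(odt) as package:
        names = set(package.namelist())

        # testzip() returns the first entry with a bad CRC, or None.
        corrupt = package.testzip()
        if corrupt is not None:
            problems.append(f"corrupt entry: {corrupt}")

        for required in ("mimetype", "content.xml", "META-INF/manifest.xml"):
            if required not in names:
                problems.append(f"missing {required}")

        if "META-INF/manifest.xml" in names:
            manifest = ET.fromstring(package.read("META-INF/manifest.xml"))
            for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
                path = entry.get(f"{{{MANIFEST_NS}}}full-path")
                # Entries ending in "/" describe folders, not files.
                if path and not path.endswith("/") and path not in names:
                    problems.append(f"manifest lists missing file: {path}")
    return problems
```

An empty list means only that these basic checks passed; Writer may still reject the file for other reasons.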

 Petr! Please split the comments from the 24th onward into a separate bug, if possible.
Comment 32 ape 2012-09-28 21:27:01 UTC
Created attachment 67839 [details]
LibO-3.6.2rc2 reopen ODT-file

Error is present in the LibO-3.6.2.2. You can convert the ODT file only via FlatXML.
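
Editorial sketch, not part of the original comment: the Flat XML detour (DOCX to FODT, then FODT to ODT) could in principle be scripted with LibreOffice's headless converter. The `--headless`, `--convert-to` and `--outdir` options exist in current LibreOffice; whether the 3.5/3.6 builds discussed here accept them in the same form, and whether headless conversion follows the same code path as saving from the GUI, is not verified. The function names and the default `soffice` executable name are assumptions.

```python
import subprocess
from pathlib import Path


def flat_xml_detour_commands(docx_path, out_dir, soffice="soffice"):
    """Build the two conversion commands: DOCX -> FODT, then FODT -> ODT."""
    docx = Path(docx_path)
    out = Path(out_dir)
    # The converter names its output after the input file's stem.
    fodt = out / (docx.stem + ".fodt")

    to_fodt = [soffice, "--headless", "--convert-to", "fodt",
               "--outdir", str(out), str(docx)]
    to_odt = [soffice, "--headless", "--convert-to", "odt",
              "--outdir", str(out), str(fodt)]
    return [to_fodt, to_odt]


def run_detour(docx_path, out_dir, soffice="soffice"):
    """Run both steps in order; raises CalledProcessError if a step fails."""
    for command in flat_xml_detour_commands(docx_path, out_dir, soffice):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it possible to print and review the commands before running them against a real document.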
Comment 33 ape 2012-10-03 04:54:18 UTC
(In reply to comment #32)
> Created attachment 67839 [details]
> LibO-3.6.2rc2 reopen ODT-file
> 
> Error is present in the LibO-3.6.2.2. You can convert the ODT file only via
> FlatXML.

See bug 55437 also.
Comment 34 Michael Meeks 2012-12-10 20:59:37 UTC
ape - this bug by now is sufficiently long and tangled that it serves little useful purpose except for consuming tons of developer time as they try to read through and tease out the issues involved.

I will close it - please leave it closed. Please file other issues (thanks for doing this for bug#55437) for any remaining issues: these should have brief, clear, single-issue descriptions - preferably with a minimal test-file to reproduce them: minimal means as small as possible to reproduce the issue :-)

That -really- helps us save developer time and gets your bugs fixed more quickly - I end up with just confusion reading the above. Of course, feel free to link back to this bug and/or re-use the existing attachments.

Thanks for persisting with this issue ! I have no helpful resolution for "too long, tangled and confusing" - so I use Invalid - your bug collection is of course not invalid - so don't be offended. As far as I can see at least one of the multiple underlying issues here was fixed - so we could use that. Please CC me on your new bugs. Thanks !