Bug 50774 - FILEOPEN DOC/DOCX: automatic numbering in numbered lists different from Word numbering
Status: RESOLVED DUPLICATE of bug 95848
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: DOCX-Limitations DOCX-Bullet-Number-Outline-Lists
Reported: 2012-06-06 03:09 UTC by Timon
Modified: 2020-06-30 09:55 UTC

See Also:
Crash report or crash signature:


Attachments
Example 9.docx - original file, *.jpg - how document looks like in MS Office and LibreOffice (294.93 KB, application/x-zip-compressed)
2012-06-06 03:09 UTC, Timon
Test Kit Works fine (28.71 KB, application/x-zip)
2012-12-01 17:01 UTC, Rainer Bielefeld Retired
re Example 9.docx: Word 2010 doc version and Word screenshots (262.84 KB, application/zip)
2012-12-01 19:47 UTC, stfhell
Word 2010 doc/docx files with numbered headings (1.12 MB, application/zip)
2012-12-01 21:57 UTC, stfhell
Example 9 DOC and DOCX file, exported in pdf format in MSO 2007 SP3 and LibreOffice 4.2.0.0 beta 2 (484.04 KB, application/zip)
2013-12-16 09:57 UTC, Timon
stripped-down file (41.96 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-03-27 07:09 UTC, Luke
Misnumbering not occurring in LO 5.3.3.2 2017-06-16 (142.42 KB, image/png)
2017-06-15 22:41 UTC, Gabriel Bowater
Bug 50774 - stripped4.doc: roundtripped by MS Word 2003 (53.00 KB, application/msword)
2018-05-24 04:24 UTC, Justin L
Example 9 COMPARED.png (199.76 KB, image/png)
2019-09-06 08:27 UTC, Timur

Description Timon 2012-06-06 03:09:55 UTC
Created attachment 62659 [details]
Example 9.docx - original file, *.jpg - how document looks like in MS Office and LibreOffice

We have a document with the following paragraph numbering in MS Office 2007 Service Pack 3:

1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
2
2.1
2.2
3
3.1
3.2
4
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12
5
and so on

If we open this document in LibO, the numbering is different:

1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
2
1.8
1.9
1
3.1
3.2
4
1.10
1.11
1.12
1.13
1.14
1.15
1.16
1.17
1.18
1.19
1.20
1.21
3
and so on
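
The discrepancy can be checked mechanically. Below is a minimal sketch in Python, assuming the heading numbers have been copied out of the document as plain strings. The helper names `expected_numbering` and `find_breaks` are hypothetical and not part of LibreOffice. The sketch trusts only the depth of each entry (one part or two) and renumbers from there, which for the list above reproduces the Word numbering.

```python
def expected_numbering(shown):
    """Renumber a two-level outline, trusting only each entry's depth."""
    result = []
    top = 0   # current level-1 counter
    sub = 0   # current level-2 counter, reset by every level-1 entry
    for text in shown:
        if "." in text:          # level 2: continue under the current parent
            sub += 1
            result.append(f"{top}.{sub}")
        else:                    # level 1: advance and reset the children
            top += 1
            sub = 0
            result.append(str(top))
    return result


def find_breaks(shown):
    """Return (position, shown, expected) for every entry that differs."""
    return [(i, s, e)
            for i, (s, e) in enumerate(zip(shown, expected_numbering(shown)))
            if s != e]


# Abridged version of the numbers rendered by LibO in the report above.
libo = ["1", "1.1", "1.2", "2", "1.8", "1.9", "1", "3.1", "3.2"]
for position, shown, expected in find_breaks(libo):
    print(f"entry {position}: shown {shown}, expected {expected}")
```

The sketch only handles two levels and assumes the depth of each entry is imported correctly, which matches the lists in this report but may not hold for other documents.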
Comment 1 Timon 2012-06-12 02:08:45 UTC Comment hidden (obsolete)
Comment 2 bfoman (inactive) 2012-06-22 03:51:03 UTC
Confirmed with:
LO 3.5.4.2 
Build ID: own W7 debug build
Windows 7 Professional SP1 64 bit

Automatic numbering differs from Word 2010.
Comment 3 Roman Eisele 2012-10-09 11:13:24 UTC
Confirmed also on Mac OS X 10.6.8 (Intel):
* With LibO 3.6.2.2, the TOC is even worse (most sub-headings are missing!)
* With a current master build, the TOC is again as in LibO 3.5.4.

The numbering problems are also visible in the “Navigator” window, perhaps even more strikingly, as this window visualizes the heading hierarchy nicely.

At first I suspected that the .docx file itself was damaged, because the formatting (e.g. heading 3.0) seems somewhat inconsistent, but I opened it in MS Office 2010 on Win 7 and can confirm that the TOC looks correct (or, at least, much better) there.

Changed Summary -- this is about automatic numbering of headings, not of (ordinary/all) paragraphs, right?
Comment 4 Timon 2012-10-09 11:58:27 UTC
Right!

I can say that in MS Office 2007 SP3, or even MS Office 2003 SP3 on Windows XP, the TOC looks correct (even if we agree that you are right and the file is damaged).
Comment 5 Timon 2012-10-09 12:28:50 UTC
Even if we save the file in a different format from MS Office with Save as... (for example, in Microsoft Word 97/2000/XP/2003 (.doc) format), nothing changes: in LibO and AOO there are problems with the TOC, while in MSO all is fine (even in 2003 with File Format Converters, 2007, 2010). So, most likely the .docx file is OK; it is not damaged.
Comment 6 Roman Eisele 2012-10-09 14:01:42 UTC
(In reply to comment #5)
> So, most likely .docx file is OK, it is not damaged.

Agreed! I did not want to slander your sample file ;-) I just considered the *possibility* that the file was somewhat damaged, because I had seen a similar bug before which gave us all big headaches until we realized that a corrupted file was the reason. But anyway, if all these MS Office versions open the file correctly, even when it is saved in different formats, LibreOffice should definitely open the file correctly, too -- so this is indeed an important bug.
Comment 7 Roman Eisele 2012-10-23 13:58:22 UTC
@ Writer experts:

Hello Cédric, Luboš, Michael, and Miklós,

this is another interesting problem in the import of .docx files. It has existed (with some variation in the symptoms) since LibreOffice 3.3.0, so it is not a regression, but it would nevertheless be well worth fixing, because it might impair the use of LibreOffice for all advanced document work. I say “might”, because right now we have only one sample file, but there is no obvious reason why it should not also happen in many other complex files ... Breaking the automatic numbering of headings is a nightmare e.g. for academic usage.

Thank you very much for looking into this issue!
Comment 8 Rainer Bielefeld Retired 2012-12-01 16:50:35 UTC
Same effect with AOOo 3.4.1, so seems inherited from OOo.

I can't reproduce the problem with my own docx sample documents created from writer, so
a) serious misunderstanding in LibO concerning docx outline numbering
b) reporter's sample document is damaged?
c) some special effect (localization) ... ?
d) something else?

This never worked, so I don't see this one as a 3.5 MAB, and am removing it (also because the 3.5 lifecycle has ended).

But indeed, I have read about lots of difficulties in using LibO for academic work and docx document interchange. If this is a general problem, it is very serious, and this bug would be a good candidate for a HardHack.
<http://wiki.documentfoundation.org/HardHacks>.

But for that nomination we should know more about the details of this problem. For example, we need a reliable sample document with a known history, created with MS WORD and created only as a sample document.

@all:
Can anybody create a test kit with a reliable sample document created from WORD?
Comment 9 Rainer Bielefeld Retired 2012-12-01 17:01:30 UTC
Created attachment 70885 [details]
Test Kit Works fine

I created the .docx from the .odt, and there outline numbering works fine in MS WORD Viewer and "LibreOffice 3.6.4.3 rc" German UI/ German Locale [Build-ID: 2ef5aff] {pull date 2012-11-28} on German WIN7 Home Premium (64bit)

Can somebody find out how this differs from the reporter's sample, with which I can reproduce the problem?
Comment 10 bfoman (inactive) 2012-12-01 19:04:49 UTC
There seems to be another new possible duplicate about TOC numbering - see bug 56798.
Comment 11 stfhell 2012-12-01 19:47:14 UTC
Created attachment 70891 [details]
re Example 9.docx: Word 2010 doc version and Word screenshots

Attached ZIP contains a doc version of Example 9.docx from , as produced by Word 2010. Also some screenshots done with Word.

After looking into the file, I would say that most problems come from inconsistent formatting. There are no list styles associated with the headings, all numbering is done through direct list formatting, and some are inserted as simple text. Word restarts numbering on certain level-1 paragraphs, and all the level-2 paragraphs continue accordingly (ch. 2 is followed by ch. 2.1, 2.2 etc.). But some of the level-1 paragraphs which restart numbering are not the headings themselves, but hidden paragraphs...

The bugs in LO: LO does not respect the restart of numbering; and it numbers level-1 and level-2 paragraphs independently. (Import of list formats from DOC is not very good in OO/LO. You can see that LO creates automatic list styles for some numbered paragraphs; others are numbered with direct formatting.)

Screenshot 1: first doc pages, with level-2-numbering shown in grey fields. The non-grey ones are simple numbers in the text.
Screenshot 2: Chapters 1 and 2 with formatting revealed. Just before chapter 2.1, a new list is started (starting with value 2).
Screenshot 3: Same text, with empty paragraphs and the (not automatically numbered) chapter-2-heading removed.
Screenshot 4: Same as screenshot 3, but without "formatting revealed". You cannot see the hidden paragraph starting a new list (2.1, 2.2 etc.).

You can see the hidden paragraphs in LO (if that is enabled) _after_ chapter 2 and _before_ chapter 3.1. They restart numbering on level 1 with "2" and "4", respectively (in Word); the level-2 numbering adjusts to that and continues with "2.1" and "4.1", respectively. LO does not take account of the restarted numbering, and the level-2 numbering is independent of the level-1 numbering.
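
The two behaviours described above can be contrasted with a toy model. This is only a sketch under stated assumptions: the `Para` class and both functions are hypothetical, and `number_independent` is a guess that mimics the symptom ("2" followed by "1.8", "1.9"), not LibreOffice's actual import code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Para:
    level: int                     # 1 or 2; 0 means not auto-numbered
    restart: Optional[int] = None  # explicit start value on this paragraph
    hidden: bool = False           # hidden paragraphs still take a number


def number_linked(paras):
    """Restarts are honoured and every level-1 paragraph resets level 2."""
    top = sub = 0
    labels = []
    for p in paras:
        if p.level == 1:
            top = p.restart if p.restart is not None else top + 1
            sub = 0
            labels.append(str(top))
        elif p.level == 2:
            sub += 1
            labels.append(f"{top}.{sub}")
        else:
            labels.append("")
    return labels


def number_independent(paras):
    """Assumed faulty model: restarts ignored, level 2 never reset."""
    top = sub = 0
    prefix = None                  # parent number frozen at first use
    labels = []
    for p in paras:
        if p.level == 1:
            top += 1               # the restart value is ignored
            labels.append(str(top))
        elif p.level == 2:
            if prefix is None:
                prefix = top
            sub += 1               # keeps counting across chapters
            labels.append(f"{prefix}.{sub}")
        else:
            labels.append("")
    return labels


# Chapter 1 with two sub-headings, a chapter-2 heading typed as plain text,
# a hidden level-1 paragraph restarting at 2, then two more sub-headings.
sample = [Para(1), Para(2), Para(2), Para(0),
          Para(1, restart=2, hidden=True), Para(2), Para(2)]

for name, numbering in (("linked", number_linked), ("independent", number_independent)):
    visible = [label for p, label in zip(sample, numbering(sample))
               if label and not p.hidden]
    print(name, visible)
```

In the linked model the hidden paragraph is what carries the "2", so the visible sub-headings continue as 2.1 and 2.2 even though the chapter heading itself is not numbered.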

Example 9's way of restarting numbered lists is a bit eccentric... It would be interesting to know if the document was hacked together like that by a user not wanting to deal with automatic numbering properly, or if Word did some of that on its own.
Comment 12 stfhell 2012-12-01 19:58:21 UTC
Changed the summary: Problem is not restricted to DOCX but also concerns DOC import. And it's not only outlines which are concerned. If numbering had been done via headings in the test file, it would probably have worked better.
Comment 13 stfhell 2012-12-01 21:57:26 UTC
Created attachment 70894 [details]
Word 2010 doc/docx files with numbered headings

Here are some files with documents created from scratch under Word 2010. Text was written, paragraph styles applied, then numbering using the Word "Numbering" ribbon button; no list styles. File sets as DOC, DOCX and PDF. LO is very MS compatible here:

File set 1: Word defaulted to a mixed numbering style (level 1: numbers; level 2: letters) and kept all numbered paragraphs in its own numbering group.

File set 2: Changed level 2 to number instead of letter.

File set 3: Wanted to add a numbered list as body text and used same numbering button. Word continues numbering the fresh list with the heading numbering.

File set 4: So we "restart" the body text list numbering... (4a:) Word does it, but of course the next heading continues the numbering ("5") - and in addition to that, the following level-2 list jumps back to letters. (4b:) I can restart numbering of heading 2, and I can set level 2 list format to number again; but I didn't manage to get heading 2.1 start with "1" again. This is where the Word user starts hacking...

File set 5: File set 3 converted to multi-level numbering.

With all these files, LO shows the same numbering as Word, even with restarted numbering. When you open the files, it creates automatic styles WW8Num* which presumably reflect Word's numbering groups.
Comment 14 stfhell 2012-12-01 22:33:48 UTC
(re comment #13)
> With all these files, LO shows the same numbering as Word, even with
> restarted numbering. When you open the files, it creates automatic styles
> WW8Num* which presumably reflect Word's numbering groups.

With "Example 9", LO seems to get confused about this mapping to its styles. The "Heading 1" paragraph style is not on outline level 1, probably because there is no regular automatic h1 numbering in the file, as with the other headings (the numbers for h1 are mostly ordinary text; the restart of numbering for the headings happens in hidden paragraphs of style "List Paragraph"). LO associates the styles for headings 2 and 3 with chapter numbering, but not heading 1. If you correct that, the file looks a lot better. But LO keeps a single "numbering group" for what should be chapter 3 plus the following hidden paragraph that is supposed to restart numbering to 3. Both paragraphs are mapped to the same list style.

I'm tempted to say that it would be asking too much to translate such a mixture of numbering methods into something coherent, but it's noteworthy that comment #1 on the Apache bug says that Symphony gets the numbering right:
https://issues.apache.org/ooo/show_bug.cgi?id=119840#c1
Comment 15 Rainer Bielefeld Retired 2012-12-02 08:04:03 UTC
I can confirm that "Lotus Symphony Release 3.0.1 Revision 20120110.2000" on German WIN7 Home Premium (64bit) shows headings in sample 2012-06-06 03:09 UTC, Timon correctly.

So it's a difficult decision:
- Is it really worth investing time in a fix for such strange documents?
- But if other free software is able to handle that, shouldn't LibO 
  be able, too?

@stfhell
Is it possible to split this into separate bugs with brief clear and simple bug descriptions? I'm a little overwhelmed with that enormous lot of samples
Comment 16 stfhell 2012-12-02 21:46:53 UTC
> @stfhell
> Is it possible to split this into separate bugs with brief clear and simple
> bug descriptions? I'm a little overwhelmed with that enormous lot of samples

The problem is I wouldn't know what to put exactly in the other bug reports. I was looking at how "Example 9" does its numbering and I was experimenting with autonumbering in Word to see where LO deviated and if there is any difference between DOC and DOCX import. I'm sorry for the lengthy comments but I couldn't make them shorter because I still don't know where exactly LO's problems start, I could only describe what I found in the file. Maybe it is in fact just one bug.

The samples show (I think):
(1) If autonumbering is used correctly in Word, LO can handle it.
(2) Autonumbering in Word (without list styles) is obviously bound to lead to messy files, at some stage.
Just ignore the files if you don't want to use them for any experiments.

> I can confirm that "Lotus Symphony Release 3.0.1 Revision 20120110.2000" on
> German WIN7 Home Premium (64bit) shows headings in sample 2012-06-06 03:09
> UTC, Timon correctly.
> So it's a difficult decision:
> - Is it really worth investing time in a fix for such strange documents?
> - But if other free software is able to handle that, shouldn't LibO 
>   be able, too?
 
I think it would be worthwhile. "Example 9" is a typical real-life Word document. People create such files, partly because autonumbering is more or less "forced" on them by AutoCorrect. It's a feature which _should_ be used with some consideration. I regularly get Word files from people, and with LO (or OO) I can often only guess what kind of numbering the authors had in mind. Sometimes I have to use MS Word Viewer to print a PDF as a reference for the original doc's numbering.

On import of DOC/DOCX, LO tries to convert Word's non-style autonumbering to a style-based numbering (the filter creates plenty of WW8 list styles). That is not a trivial thing to do, and I could imagine that this is the root of the problem.
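
To see which paragraphs in a DOCX carry direct numbering and which numbering instance they point to, the raw XML can be inspected. The following is a minimal sketch using only the Python standard library; the function name `list_numbered_paragraphs` is hypothetical. It reads the WordprocessingML elements `w:numPr`, `w:numId`, `w:ilvl` and `w:pStyle` from `word/document.xml`, and it deliberately ignores numbering that a paragraph inherits through its style.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _val(element):
    """Return the w:val attribute of an element, or None if it is absent."""
    return element.get(f"{{{W}}}val") if element is not None else None


def list_numbered_paragraphs(docx):
    """Yield (style, numId, ilvl, text) for paragraphs with direct numbering.

    `docx` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for para in root.iter(f"{{{W}}}p"):
        num_pr = para.find("w:pPr/w:numPr", NS)
        if num_pr is None:
            continue  # no direct list formatting on this paragraph
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        yield (
            _val(para.find("w:pPr/w:pStyle", NS)),
            _val(num_pr.find("w:numId", NS)),
            _val(num_pr.find("w:ilvl", NS)),
            text,
        )


# Usage (the file name is only an example):
#     for row in list_numbered_paragraphs("Example 9.docx"):
#         print(row)
```

Paragraphs that share a `numId` belong to the same numbering instance, so a change of `numId` between two headings is a hint that Word starts a new list there.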
Comment 17 Timon 2013-12-16 09:57:22 UTC
Created attachment 90828 [details]
Example 9 DOC and DOCX file, exported in pdf format in MSO 2007 SP3 and LibreOffice 4.2.0.0 beta 2

In LibreOffice 4.2.0.0.beta2 Build ID: 1a27be92e320f97c20d581a69ef1c8b99ea9885d things are much better, but only for the DOC file format. In DOCX things got worse (some of the numbering is gone entirely). Compare the attached PDFs to see the differences.

In "Example 9 DOC LO 4.2.pdf" all is fine up to page 4, but on page 4 everything is messed up again:
2
1.8
1.9
2
3.1
3.2
4
3.1
3.2
and so on

In "Example 9 DOCX LO 4.2.pdf" problems are visible from the first page: we see only items 1, 2, 4, and so on, and the 1.1, 1.2, and so on numbering is missing entirely.
Comment 18 Buovjaga 2014-10-20 07:12:49 UTC
Numbering works ok for me on Win 7 64-bit 4.3.2.2 and dev build Version: 4.4.0.0.alpha0+
Build ID: 3e2bd1e4022e25b77bcc8eba5e02c1adc57008a1
TinderBox: Win-x86@42, Branch:master, Time: 2014-10-16_01:04:13

Please test!
Comment 19 Timon 2014-10-20 11:29:31 UTC Comment hidden (obsolete)
Comment 20 Luke 2015-03-26 21:06:25 UTC
This is still an issue with the latest build:
Version: 4.5.0.0.alpha0+
Build ID: 4ee55eed6a34f6f061a0cd369a30afb464f9fa27
Comment 21 Luke 2015-03-27 07:09:13 UTC
Created attachment 114395 [details]
stripped-down file

I removed everything before and after the point where the outline heading was desync'd. It seems to be an outline style issue. The correct headings should be:

2 
2.1
2.2
3
3.1
3.2
4
Comment 22 meneerjansen00 2015-05-12 17:39:17 UTC Comment hidden (obsolete)
Comment 23 rpr 2015-05-13 07:05:57 UTC Comment hidden (obsolete)
Comment 24 Cor Nouws 2015-07-07 18:26:55 UTC
(In reply to meneerjansen00 from comment #22)
> Seems like a duplicate of this bug:
> https://bugs.documentfoundation.org//show_bug.cgi?id=76817

I don't think so.
The other bug is about numbering that is different when inserted in a docx.
This one is about numbering that looks different if opened in A or B
Comment 25 Gabriel Bowater 2017-06-15 22:41:55 UTC Comment hidden (obsolete)
Comment 26 Timur 2017-06-16 07:25:01 UTC
(In reply to Gabriel Bowater from comment #25)
> Created attachment 134050 [details]
> Misnumbering not occurring in LO 5.3.3.2 2017-06-16
> 
> Opening file in the current stable LO version doesn't produce the
> undesirable result shown in earlier screenshots.

Wrong conclusion. The table of contents looks OK, but the numbering in the text does not. This can be seen once the table of contents is updated.
Comment 27 Xisco Faulí 2018-05-09 08:43:34 UTC
After

author	Justin Luth <justin_luth@sil.org>	2018-01-12 20:44:06 +0300
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2018-01-15 13:57:29 +0100
commit 7201d157a2ff2f0a8b6bb8fa57e31871187cbc81 (patch)
tree 2eebe2d8f6cdacd102b79e52a081fa5471dbfec4
parent a6b69a9384801f77f4cc30a366a45561c28eab3e (diff)
tdf#76817 ooxmlimport: connect Heading to existing numbers

the attachment 114395 [details] has changed from being shown like 1.1 to 1.
@Justinm, I thought you might be interested in this issue...
Comment 28 Justin L 2018-05-24 04:24:03 UTC
Created attachment 142259 [details]
Bug 50774 - stripped4.doc: roundtripped by MS Word 2003

(In reply to Xisco Faulí from comment #27)
> commit 7201d157a2ff2f0a8b6bb8fa57e31871187cbc81 (patch)
> tdf#76817 ooxmlimport: connect Heading to existing numbers
I reverted this patch today.

But it is worth noting that the .doc format looks the same way in LO. That suggests that this bug is an oddball example. Since MS and LO internally have very different numbering implementations, it might not be possible to emulate this one.

So, no - I'm not really interested in spending more time in this minefield :-)
Comment 29 Timur 2019-09-06 08:27:13 UTC
Created attachment 153978 [details]
Example 9 COMPARED.png

fixed
Comment 30 Timur 2019-09-06 08:28:38 UTC

*** This bug has been marked as a duplicate of bug 95848 ***
Comment 31 Luke 2019-09-06 14:49:36 UTC
Verified FIXED in Version: 6.4.0.0.alpha0+ (x64)
Build ID: 396869e0e71bd33f5d962779abf72f35d01245e5

Thanks Michael for fixing this with your work in Bug 95848
Comment 32 Timon 2020-06-30 09:55:17 UTC
Still only partially fixed in Version: 6.4.5.2 (x64)
Build ID: a726b36747cf2001e06b58ad5db1aa3a9a1872d6

Only the table of contents is shown entirely correctly.

But if we scroll further through the document, we see many discrepancies.

In table of contents we see

2.1	Purpose of the system	7
2.2	Goals of creating the system	7
3	CHARACTERISTICS OF THE AUTOMATION OBJECT	9

4.1	List of programs, their purpose and main characteristics	10
4.2	Requirements for the methods of information exchange and the communication facilities for information exchange between system components and with adjacent systems	10
4.3	Requirements for the number and qualifications of system personnel who administer and maintain the system, including changes to the system configuration (adapting to changes in legislation and calculation methods, creating report forms, and configuring information exchange with other information systems)	11
4.4	Requirements for user training	12
4.5	Purpose indicators	12
4.6	Reliability requirements	13
4.7	Safety requirements	13
4.8	Requirements for ergonomics and technical aesthetics	13
4.9	Requirements for operation, maintenance, repair and storage of system components	14
4.10	Requirements for protecting information from unauthorized access	14
4.11	Requirements for preserving information in case of failures	15
4.12	Requirements for patent clearance

In document text on pages 4 and 5 we see

1.1 Purpose of the system
1.2 Goals of creating the system
2 CHARACTERISTICS OF THE AUTOMATION OBJECT

3.1 List of programs, their purpose and main characteristics
• Program 1
• Program 2
• Program 3

3.2 Requirements for the methods of information exchange and the communication facilities for information exchange between system components and with adjacent systems

3.3 Requirements for the number and qualifications of system personnel who administer and maintain the system, including changes to the system configuration (adapting to changes in legislation and calculation methods, creating report forms, and configuring information exchange with other information systems)

3.4 Requirements for user training

3.5 Purpose indicators

3.6 Reliability requirements

3.7 Safety requirements

3.8 Requirements for ergonomics and technical aesthetics

3.9 Requirements for operation, maintenance, repair and storage of system components

3.10 Requirements for protecting information from unauthorized access

3.11 Requirements for preserving information in case of failures

3.12 Requirements for patent clearance

The numbering is again doing as it pleases, but now in the document body itself.
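
A comparison like the one above can be automated once the table of contents and the body headings have been extracted as (number, title) pairs. This is a minimal sketch; the function name `numbering_mismatches` is hypothetical, and the titles are placeholders standing in for the real headings. The numbers are the first four from the two lists in this comment.

```python
def numbering_mismatches(toc, body):
    """Return (title, toc_number, body_number) where the two disagree."""
    body_numbers = {title: number for number, title in body}
    return [
        (title, number, body_numbers[title])
        for number, title in toc
        if title in body_numbers and body_numbers[title] != number
    ]


# Placeholder titles; numbers as reported for the TOC and for the body text.
toc = [("2.1", "Heading A"), ("2.2", "Heading B"), ("3", "Heading C"), ("4.1", "Heading D")]
body = [("1.1", "Heading A"), ("1.2", "Heading B"), ("2", "Heading C"), ("3.1", "Heading D")]

for title, in_toc, in_body in numbering_mismatches(toc, body):
    print(f"{title}: TOC {in_toc}, body {in_body}")
```

Matching by title assumes heading titles are unique within the document; duplicated titles would need matching by position instead.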