Bug 48097 - FILEOPEN: different page content of .DOC from brreg.no due to page break information inside a table
Status: RESOLVED DUPLICATE of bug 108233
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:doc
Depends on:
Blocks: DOC-Paragraph
Reported: 2012-03-30 09:11 UTC by marius
Modified: 2020-06-18 07:28 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
Form in MS Word document interpreted wrong (448.00 KB, application/msword)
2012-03-30 09:11 UTC, marius
File in ODT format interpreted correctly (73.45 KB, application/vnd.oasis.opendocument.text)
2012-03-30 09:12 UTC, marius
PDF, produced from first attachment using msWord 2007 (270.45 KB, application/pdf)
2012-06-27 01:40 UTC, sasha.libreoffice
Form in MS Word document - unprotected (448.50 KB, application/vnd.ms-word)
2016-12-22 10:27 UTC, Timur
compare MSO and LO (218.62 KB, image/jpeg)
2016-12-22 13:09 UTC, Timur
tdf48097_pageBreakInTable.doc: simple unit test - page-break-before in column 1, paragraph 1 (24.50 KB, application/msword)
2020-05-08 13:36 UTC, Justin L
tdf48097_TwoTables.odt: simple Writer document where two tables are separated by a page break (8.68 KB, application/vnd.oasis.opendocument.text)
2020-05-25 11:43 UTC, Justin L

Description marius 2012-03-30 09:11:25 UTC
Created attachment 59284
Form in MS Word document interpreted wrong

To reproduce the error, download the document here (also attached as BR-1010B.doc) and open it in LibreOffice:

http://www.signform.no/dss/kundeweb/Blankettarkiv/vis/Download.asp?BlankettFilID=1192&KID=617&C=343219538BR-1010B

It is formatted so that the bottom text ("BR - 1010B - 2011   http://www.brreg.no
Blanketten er godkjent av Brønnøysundregistrene September 2011") and the page numbering ("Side 1 av 6") should be at the bottom of the page, but LibreOffice does not lay them out there.

Compare with the ODT version available (attached "BR-1010B.odt"):

http://www.signform.no/dss/kundeweb/Blankettarkiv/vis/Download.asp?BlankettFilID=1790&KID=617&C=343219538

All of the forms are available here (Bokmål and Nynorsk are different language versions):

and some in English:

http://www.brreg.no/english/forms/
Comment 1 marius 2012-03-30 09:12:18 UTC
Created attachment 59285
File in ODT format interpreted correctly
Comment 2 marius 2012-04-02 00:46:41 UTC
The author of the documents says that it can be caused by the use of two types of page breaks, of which only one is supported by LibreOffice.
Comment 3 sasha.libreoffice 2012-06-27 01:40:52 UTC
Created attachment 63504
PDF, produced from first attachment using msWord 2007
Comment 4 sasha.libreoffice 2012-06-27 01:44:26 UTC
Thanks for the bug report.
Reproduced using the first attachment:
on Fedora 64-bit in 3.3.4, 3.5.4, 3.6.beta1
on Windows 7 32-bit in 3.5.2

Changing version to 3.3.4, the earliest version in which it was reproduced.
Comment 5 sasha.libreoffice 2012-12-15 08:26:21 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2015-01-05 17:51:59 UTC Comment hidden (obsolete)
Comment 7 Buovjaga 2015-01-27 17:26:11 UTC
Confirmed.

Win 7 Pro 64-bit Version: 4.5.0.0.alpha0+
Build ID: 784d069cc1d9f1d6e6a4e543a278376ab483d1eb
TinderBox: Win-x86@62-TDF, Branch:MASTER, Time: 2015-01-25_23:07:36
Comment 8 QA Administrators 2016-02-21 08:36:42 UTC Comment hidden (obsolete)
Comment 9 Timur 2016-12-20 18:03:18 UTC
One of those pesky forms. I don't understand why a footer was not used here, but...
Since breaks were mentioned, I am adding Justin here for an expert opinion, because of bug 64372.
Comment 10 Justin L 2016-12-21 16:25:48 UTC Comment hidden (no-value)
Comment 11 Timur 2016-12-22 10:27:36 UTC
Created attachment 129863
Form in MS Word document - unprotected

I hope this one is of some use.
Comment 12 Justin L 2016-12-22 10:43:31 UTC Comment hidden (obsolete)
Comment 13 Timur 2016-12-22 11:06:01 UTC Comment hidden (no-value)
Comment 14 marius 2016-12-22 11:29:46 UTC
Regarding the original problem: the document may still be a useful test case if it is something MS Office can produce, but the form itself has since been converted to a PDF, available here: https://www.brreg.no/wp-content/uploads/BR-1010B.pdf

Even better, in most cases, the form has been replaced by online web forms: https://www.brreg.no/produkter-og-tjenester/skjemakatalog/samordnet-registermelding-del-1-hovedskjema/
Comment 15 Timur 2016-12-22 13:09:41 UTC
Created attachment 129865
compare MSO and LO

A side-by-side comparison shows that the two match up to the row "8.Organisasjonsform".
In MSO, the Paragraph dialog's "Line and Page Breaks" tab shows "Page break before" set for that row, and that is what LO doesn't read properly.
Justin, please take a look.
Comment 16 Justin L 2016-12-24 16:30:51 UTC
LibreOffice does not support paragraphs with page-break information when they are inside of a table (at least according to the UI).
Comment 17 QA Administrators 2018-10-04 02:55:50 UTC Comment hidden (obsolete)
Comment 18 Timur 2018-10-04 07:03:43 UTC Comment hidden (obsolete)
Comment 19 Justin L 2018-10-10 07:05:12 UTC
Although a different situation, we have the opposite effect in bug 61423 - where the table IS being split due to a new page style (RES_PAGEDESC).
Comment 20 QA Administrators 2019-10-11 02:36:38 UTC Comment hidden (obsolete)
Comment 21 Timur 2019-10-11 07:37:46 UTC
Repro 6.4+
Comment 22 Justin L 2020-05-08 13:36:50 UTC
Created attachment 160535
tdf48097_pageBreakInTable.doc: simple unit test - page-break-before in column 1, paragraph 1

Microsoft ignores a paragraph's request for a page break before when in a table cell, UNLESS it is the first paragraph of a cell in column A.

Currently, LO imports that page-break request and moves it to the table itself, which is the only place where it is supported: the UI doesn't allow page breaks or new page styles inside a table. Word's behaviour could be emulated by splitting the table at that row and adding a page-break-before to the second table. That should be easily possible, since DOC tables are apparently just equivalently configured rows grouped together.
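
The emulation idea above can be sketched in a few lines of Python. This is only an illustrative model, not LibreOffice code: the `Paragraph`, `Row` and `Table` classes and the `split_at_breaks` function are hypothetical names made up for this sketch. It assumes Word's rule as observed above (only the first paragraph of the column A cell counts) and splits the table in front of each such row.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    text: str
    page_break_before: bool = False


@dataclass
class Row:
    # cells[column] is the list of paragraphs in that cell (hypothetical model)
    cells: List[List[Paragraph]]


@dataclass
class Table:
    rows: List[Row]
    page_break_before: bool = False


def row_requests_break(row: Row) -> bool:
    """Observed Word rule: only paragraph 1 of the column A cell is honoured."""
    if not row.cells or not row.cells[0]:
        return False
    return row.cells[0][0].page_break_before


def split_at_breaks(table: Table) -> List[Table]:
    """Split a table before every row that effectively requests a page break."""
    pieces: List[Table] = []
    current = Table(rows=[], page_break_before=table.page_break_before)
    for row in table.rows:
        if row_requests_break(row):
            if current.rows:
                # Close the table so far and start a new one on a new page.
                pieces.append(current)
                current = Table(rows=[], page_break_before=True)
            else:
                # Break on the very first row: the table itself carries it.
                current.page_break_before = True
        current.rows.append(row)
    if current.rows:
        pieces.append(current)
    return pieces


def make_row(label: str, brk: bool = False, other_brk: bool = False) -> Row:
    # Two-column row; other_brk sets a break in column B, which is ignored.
    return Row(cells=[[Paragraph(label, brk)], [Paragraph("x", other_brk)]])


table = Table(rows=[
    make_row("1"),
    make_row("2", other_brk=True),   # break in column B: ignored
    make_row("8", brk=True),         # break in column A, paragraph 1: honoured
    make_row("9"),
])
pieces = split_at_breaks(table)
print([(len(t.rows), t.page_break_before) for t in pieces])
# → [(2, False), (2, True)]
```

The sketch leaves out everything that makes this hard in practice (repeated header rows, merged cells, rows that themselves span pages).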
Comment 23 Justin L 2020-05-09 11:42:06 UTC
(In reply to Justin L from comment #22)
> emulating with a split table should be easily possible...
Amateur remark. Easy and tables do not belong in the same sentence.
Comment 24 Justin L 2020-05-25 11:43:06 UTC
Created attachment 161257
tdf48097_TwoTables.odt: simple Writer document where two tables are separated by a page break

Although it does not exactly match this bug's document, this simple document composed in Writer (and so untainted by any previous MS formatting) demonstrates that LO's import is poor, since it merges the two tables into one. (Export is OK: MS Word 2003 correctly opens the .doc and .docx files that LO produces, with one table on each page.)

This document was inspired by bug 104017 which should be a duplicate of this one.
Comment 25 Justin L 2020-05-26 08:38:37 UTC Comment hidden (obsolete)
Comment 26 Justin L 2020-05-26 08:42:43 UTC
(In reply to Justin L from comment #25)
> *** Bug 104017 has been marked as a duplicate of this bug. ***

I don't think it will actually be of much help, but that particular bug (bug 104017) was "fixed" in LO 6.4 when a specific set of RES_PAGEDESC page styles was assigned. So it could be another avenue of emulation to explore, although one fraught with other difficulties.

In any case, make sure that whatever fix is designed also works for that similar-but-different case.
Comment 27 Justin L 2020-06-18 07:28:46 UTC
I'm going to mark this as a duplicate of bug 108233, since that report is very clear, and indicates that it is POSSIBLE for ODF to handle per-row breaks. Only the UI needs to be extended to support that.

*** This bug has been marked as a duplicate of bug 108233 ***
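
For reference, the ODF specification does define `fo:break-before` on `style:table-row-properties`, which is presumably what makes per-row breaks possible in the file format. The Python sketch below builds such a row style with the standard library; the style name `BreakRow` and the helper `row_style_with_break` are made up for illustration, and whether this is exactly the mechanism bug 108233 has in mind is an assumption.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs for the style: and fo: prefixes.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

ET.register_namespace("style", STYLE_NS)
ET.register_namespace("fo", FO_NS)


def row_style_with_break(name: str) -> ET.Element:
    """Build an automatic table-row style that asks for a page break before the row."""
    style = ET.Element(
        f"{{{STYLE_NS}}}style",
        {
            f"{{{STYLE_NS}}}name": name,          # hypothetical style name
            f"{{{STYLE_NS}}}family": "table-row",
        },
    )
    ET.SubElement(
        style,
        f"{{{STYLE_NS}}}table-row-properties",
        {f"{{{FO_NS}}}break-before": "page"},
    )
    return style


style = row_style_with_break("BreakRow")
print(ET.tostring(style, encoding="unicode"))
```

A table row would then reference this style by name in content.xml; this sketch does not check how any given LibreOffice version lays out such a row.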