Bug 104827 - FILESAVE DOC: Justified Text with Section Breaks Incorrectly Exported
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:doc
Depends on:
Blocks: Section
Reported: 2016-12-21 05:04 UTC by Luke
Modified: 2020-12-21 11:12 UTC

See Also:
Crash report or crash signature:


Attachments
Simple .doc file with justified text and section break (22.00 KB, application/msword)
2016-12-21 05:04 UTC, Luke
Same file round-tripped, missing carriage return. Looks wrong in Word (9.50 KB, application/msword)
2016-12-21 05:05 UTC, Luke
Screenshot showing missing paragraph mark and poorly formatted text in Word. (108.25 KB, image/png)
2016-12-21 05:10 UTC, Luke

Description Luke 2016-12-21 05:04:10 UTC
Created attachment 129833
Simple .doc file with justified text and section break

The MS-DOC format has a quirk: if a justified paragraph ends with a section break and no carriage return, its last line is justified too. (Normally the last line is left-aligned.) For some reason, our exporter throws away the final return just before a section break (page or continuous). As a result, justified .doc files with section breaks are damaged on round-trip and look wrong when opened in MS Office.
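To make the reported behaviour concrete, here is a minimal toy model of the two ways a section can be closed in the text stream. It is only a sketch and not LibreOffice code: the function names endSectionReplacingCr and endSectionKeepingCr are hypothetical, and the text stream is modelled as a plain std::string. The only values taken from the format are 0x0D for the paragraph mark and 0x0C for the break mark.

```cpp
#include <string>

// Control characters as they appear in a .doc text stream.
constexpr char kParagraphMark = '\x0D'; // carriage return that ends a paragraph
constexpr char kSectionBreak  = '\x0C'; // page/section break mark

// Hypothetical model of the behaviour described in this report:
// a trailing paragraph mark is overwritten by the break mark,
// so the last paragraph of the section loses its final return.
std::string endSectionReplacingCr(std::string text) {
    if (!text.empty() && text.back() == kParagraphMark)
        text.back() = kSectionBreak;
    else
        text.push_back(kSectionBreak);
    return text;
}

// Hypothetical alternative: keep the paragraph mark and append the break.
std::string endSectionKeepingCr(std::string text) {
    text.push_back(kSectionBreak);
    return text;
}
```

With the first variant, "text" followed by a return becomes "text" followed directly by the break mark, which is the case where Word justifies the last line. With the second variant the return survives in front of the break.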
Comment 1 Luke 2016-12-21 05:05:00 UTC
Created attachment 129834
Same file round-tripped, missing carriage return. Looks wrong in Word
Comment 2 Luke 2016-12-21 05:10:04 UTC
Created attachment 129835
Screenshot showing missing paragraph mark and poorly formatted text in Word.

Justin L,
Is this export issue something you'd be interested in?
Comment 3 Xisco Faulí 2016-12-21 09:41:50 UTC
Confirmed in

Version: 5.4.0.0.alpha0+
Build ID: 5a20df55ff829978c880b22e0a1f32c35d0ba30f
CPU Threads: 4; OS Version: Linux 4.8; UI Render: default; VCL: gtk2; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group
Comment 4 Justin L 2016-12-21 14:07:39 UTC
This comes from calling ww8atr.cxx:ReplaceCr(0xc) in wrtww8.cxx:WriteText().

The relevant parts of ReplaceCr were most recently (2002-04-04) written by Caolán McNamara.  https://cgit.freedesktop.org/libreoffice/core/commit/?id=b88924a09b3932535afb177e8944fb354aacfa81

I'm not sure why ReplaceCr is used instead of just WriteChar(0xc).  It is not super-trivial to find the ?HoriOrient? position of the latest paragraph, so it isn't simple to wrap that in an if clause.  The comments here are mostly in German.

CC'd Caolán McNamara
Comment 5 Caolán McNamara 2016-12-21 15:10:58 UTC
14 years ago is a bit of a stretch to CC me on it :-). I don't have any special insights at this distance in time.
Comment 6 QA Administrators 2017-12-22 03:35:42 UTC Comment hidden (obsolete)
Comment 7 Luke 2017-12-27 04:58:55 UTC
Still reproducible with Version: 6.1.0.0.alpha0+ (x64)
Build ID: bf8e8cf11bc0d60ab80f5b3420dc424aec2fa626

by round-tripping attachment 129833 and opening in Word.
Comment 8 QA Administrators 2018-12-28 03:45:58 UTC Comment hidden (obsolete)
Comment 9 Luke 2019-01-01 00:32:09 UTC
Still Repro in Version: 6.3.0.0.alpha0+ (x64)
Build ID: 082144fa0fb2021cfb41494bb6eb5bf417e58ab1
Comment 10 Justin L 2020-11-10 15:01:21 UTC
Still repro in 7.1+

The comments are no longer in German - so that might help some of us.

Either my previous testing was completely wrong (most likely), or something else has changed. For this document, WriteText is NOT calling ReplaceCr directly. Instead it calls wrtw8nds.cxx's MSWordExportBase::OutputSectionNode, which then calls ReplaceCr (and that call is ONLY made when GetExport().GetExportFormat() == MSWordExportBase::DOC).

Unfortunately, unit tests (like ww8export3's fdo53985.doc) show it isn't as simple as changing that ReplaceCr to a WriteChar.
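
For discussion, the "if clause" idea from comment 4 can be sketched as a toy model. This is not a patch and not LibreOffice code: ParaAlign and writeSectionBreak are hypothetical names, the text stream is modelled as a std::string, and the sketch simply assumes the alignment of the last paragraph is already known, which is exactly the part that is not trivial in the real exporter. Given the unit test results above, a real fix probably needs more conditions than this.

```cpp
#include <string>

// Hypothetical stand-in for the paragraph's horizontal alignment.
enum class ParaAlign { Left, Center, Right, Justified };

constexpr char kCr    = '\x0D'; // paragraph mark
constexpr char kBreak = '\x0C'; // page/section break mark

// Toy model: replace the trailing paragraph mark with the break mark,
// except after a justified paragraph, where the mark is kept so that
// the last line is not stretched.
std::string writeSectionBreak(std::string text, ParaAlign lastParaAlign) {
    const bool endsWithCr = !text.empty() && text.back() == kCr;
    if (endsWithCr && lastParaAlign != ParaAlign::Justified)
        text.back() = kBreak;   // behaviour as reported: the return is dropped
    else
        text.push_back(kBreak); // keep the return (or there was none to replace)
    return text;
}
```

In this model only justified paragraphs change behaviour, so documents without justified text would be written the same way as before.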