Bug 133560 - FILEOPEN DOCX Direct font name/formatting not read from docx file for empty paragraph (last paragraph only) (fine for DOC)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:24.2.0 target:7.6.0.0.beta2 ta...
Keywords:
Duplicates: 134683 146513 149415 156503
Depends on:
Blocks: DOCX-Character
Reported: 2020-05-31 17:21 UTC by dinospao132
Modified: 2023-12-07 19:25 UTC (History)
8 users (show)

See Also:
Crash report or crash signature:


Attachments
Example file with custom font set to the second empty paragraph (9.35 KB, application/vnd.oasis.opendocument.text)
2020-12-02 11:04 UTC, NISZ LibreOffice Team
The example file saved to docx with current Writer (4.28 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-12-02 11:05 UTC, NISZ LibreOffice Team
Screenshot of the original and the docx document side by side in Writer (144.44 KB, image/png)
2020-12-02 11:06 UTC, NISZ LibreOffice Team
The docx example file in Word 2013 (25.27 KB, image/png)
2020-12-02 11:08 UTC, NISZ LibreOffice Team
Last empty line with direct formatting (1.77 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-06-23 06:19 UTC, Mike Kaganski

Description dinospao132 2020-05-31 17:21:51 UTC
Description:
I created a new DOCX file in Writer, changed the font from Liberation Mono to Arial, and saved the file. When I opened it again, the font change was gone.

Steps to Reproduce:
1. Create a new DOCX file in Writer.
2. Change the font from Liberation Mono to Arial and click the Save button.
3. Open the file again: the font change is gone.

Actual Results:
The font is still Liberation Mono.

Expected Results:
The font should be Arial.


Reproducible: Always


User Profile Reset: No



Additional Info:
[Information automatically included from LibreOffice]
Locale: el
Module: TextDocument
[Information guessed from browser]
OS: Windows (All)
OS is 64bit: no
Comment 1 Telesto 2020-05-31 19:16:24 UTC
Confirmed for an empty document with a font change.
Version: 7.1.0.0.alpha0+ (x64)
Build ID: 83c4f86f22dc37269ac6a038fe7de053c42aad6e
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: en-US (nl_NL); UI: en-US
Calc: CL
Comment 2 Telesto 2020-05-31 19:17:06 UTC
Also in
Version: 4.4.7.2
Build ID: f3153a8b245191196a4b6b9abd1d0da16eead600
Locale: nl_NL
Comment 3 dinospao132 2020-06-06 02:11:28 UTC
Also reproducible with DOCX files on Ubuntu Linux with LibreOffice 6.4.3.2.
Comment 4 dinospao132 2020-06-12 08:53:24 UTC
Also in 6.4.5.1 on Windows, and also with ODF 1.3.
Comment 5 NISZ LibreOffice Team 2020-12-02 11:04:54 UTC
Created attachment 167750 [details]
Example file with custom font set to the second empty paragraph
Comment 6 NISZ LibreOffice Team 2020-12-02 11:05:22 UTC
Created attachment 167751 [details]
The example file saved to docx with current Writer

Version: 7.2.0.0.alpha0+ (x64)
Build ID: 4e63ec27b69fa01ff610c894c9fbf05c377a6179
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: default; VCL: win
Locale: en-US (hu_HU); UI: en-US
Calc: CL
Comment 7 NISZ LibreOffice Team 2020-12-02 11:06:25 UTC
Created attachment 167752 [details]
Screenshot of the original and the docx document side by side in Writer

In case of the docx file the empty paragraph gets the default font when reopened.
Comment 8 NISZ LibreOffice Team 2020-12-02 11:08:52 UTC
Created attachment 167753 [details]
The docx example file in Word 2013

However, Word opens the docx file correctly. The document.xml also contains the correct font setting.
This is really an import problem, similar to the one I linked (about font size).
Comment 9 Telesto 2022-01-05 12:40:08 UTC
*** Bug 146513 has been marked as a duplicate of this bug. ***
Comment 10 Timur 2023-04-03 13:10:43 UTC
*** Bug 149415 has been marked as a duplicate of this bug. ***
Comment 11 Timur 2023-04-03 13:11:14 UTC
*** Bug 134683 has been marked as a duplicate of this bug. ***
Comment 12 Justin L 2023-04-03 15:42:17 UTC
I've looked into this before (but I can't remember the bug number). Everything looks fine until ~DomainMapper_Impl calls RemoveLastParagraph.

The problem is (always) with xCursor->setString(OUString());
Comment 13 Justin L 2023-04-03 21:07:41 UTC
A total hack is to first add
+   xTextAppend->insertString(xCursor, OUString('\x200B')/*CHAR_ZWSP*/, false);

But of course unit tests fail all over the place when you do that.

You can't just delete the entire paragraph, because then you lose frames anchored to it, like tdf112287, although why they are anchored to a paragraph that is going to be deleted is beyond me.

I tried a bunch of random stuff, like trying to Exchange() the cursor, but all to no effect.
Comment 14 Mike Kaganski 2023-06-23 06:19:44 UTC
Created attachment 188061 [details]
Last empty line with direct formatting

A minimized example

The file's default paragraph style has a 48 pt font size; its four paragraphs are direct-formatted to 12 pt; the first and the third paragraphs contain text, the second and the fourth (last) are empty. The last one is imported into Writer without any direct formatting, so it has font size 48 pt.
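
To make the symptom concrete, here is a minimal toy model of how the effective font size comes out in the minimized example. It is only a sketch: ToyParagraph, effectiveFontSizePt and makeAttachmentModel are hypothetical names invented for illustration and are not part of LibreOffice or writerfilter; the only facts taken from the description above are the 48 pt style size and the 12 pt direct formatting on all four paragraphs.

```cpp
#include <cassert>
#include <vector>

// Toy model only (hypothetical types, not LibreOffice code).
struct ToyParagraph {
    bool hasText;        // paragraphs 1 and 3 have text, 2 and 4 are empty
    bool hasDirectSize;  // true if a direct font size is set on the paragraph
    double directSizePt; // meaningful only when hasDirectSize is true
};

// Direct formatting wins over the paragraph style; without it, the style applies.
double effectiveFontSizePt(const ToyParagraph& p, double styleSizePt) {
    assert(styleSizePt > 0.0);
    return p.hasDirectSize ? p.directSizePt : styleSizePt;
}

// The four paragraphs of the attachment, as described: all direct-formatted to 12 pt.
std::vector<ToyParagraph> makeAttachmentModel() {
    std::vector<ToyParagraph> doc;
    const bool hasText[4] = { true, false, true, false };
    for (int i = 0; i < 4; ++i) {
        ToyParagraph p = { hasText[i], true, 12.0 };
        doc.push_back(p);
    }
    return doc;
}
```

In this model, clearing hasDirectSize on the last paragraph (which is what the import effectively does) makes effectiveFontSizePt fall back to the 48 pt style size, while the second (also empty) paragraph keeps 12 pt.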

The problem arises from these facts:

1. During import, XParagraphAppend::finishParagraph(Insert) are called, which are implemented using SwXText::Impl::finishOrAppendParagraph, and the latter calls

    m_pDoc->getIDocumentContentOperations().AppendTextNode( *aPam.GetPoint() );
    // remove attributes from the previous paragraph
    m_pDoc->ResetAttrs(aPam);

so that there is always another (empty) paragraph after the finalized one;

2. During import, lcl_AddRange is called to create anchored text content; the start and end of it may reference the very end of the document (using xTextAppend->getEnd()) - i.e., that last (maybe empty, maybe extra) paragraph.

3. In many places, and in particular, in DomainMapper_Impl::~DomainMapper_Impl, DomainMapper_Impl::RemoveLastParagraph is called; and the latter uses one of the two techniques to remove that last paragraph:

3.1. It either obtains the paragraph's lang::XComponent interface, and calls its dispose (SwXParagraph::dispose), which eventually calls DocumentContentOperationsManager::DelFullPara;

3.2. Or it uses a cursor to select one character back, and replaces the resulting selection with nothing.

#3.1 has an advantage of keeping the formatting of the remaining (second-to-last) paragraph, but DocumentContentOperationsManager::DelFullPara, among other things, removes all anchored objects, thus this mode is not used for the end-of-document (see e521930ea1c855c236efb67793e540d07c201d35);

#3.2 keeps the anchored objects, but needs workarounds to keep bookmarks, and destroys the remaining paragraph's character formatting (one can see that interactively: have a couple of last empty paragraphs; format the next-to-last paragraph to have a large font size; now try to remove the very last paragraph - using Del from the last, or Backspace from the next-to-last, or selecting both and pressing Del/Backspace - in all cases, the formatting will be taken from the very last, and will remove the direct formatting of the next-to-last).
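
The trade-off between #3.1 and #3.2 can be sketched with a toy document model. This is an illustration under stated assumptions, not the writerfilter or sw implementation: ToyPara, removeLastByDispose, removeLastByJoin and makeExample are hypothetical names, and the "join takes formatting from the very last paragraph" rule is simply hard-coded to mirror the behaviour described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only (hypothetical types, not LibreOffice code).
struct ToyPara {
    std::string text;
    int directFontSizePt;              // 0 means "no direct formatting"
    std::vector<std::string> anchored; // names of objects anchored to this paragraph
};
typedef std::vector<ToyPara> ToyDoc;

// Analogue of #3.1: delete the whole last paragraph.
// The previous paragraph keeps its formatting, but objects anchored
// to the removed paragraph are lost with it.
void removeLastByDispose(ToyDoc& doc) {
    assert(doc.size() >= 2);
    doc.pop_back();
}

// Analogue of #3.2: remove the paragraph break before the last paragraph.
// Assumption taken from the description: the merged paragraph takes its
// formatting from the very last paragraph; anchored objects survive.
void removeLastByJoin(ToyDoc& doc) {
    assert(doc.size() >= 2);
    ToyPara last = doc.back();
    doc.pop_back();
    ToyPara& prev = doc.back();
    prev.text += last.text;
    prev.directFontSizePt = last.directFontSizePt;
    prev.anchored.insert(prev.anchored.end(), last.anchored.begin(), last.anchored.end());
}

// A text paragraph, the document's real last (empty, 12 pt) paragraph, and the
// extra unformatted paragraph left after import, with one object anchored to it.
ToyDoc makeExample() {
    ToyDoc doc;
    ToyPara a = { "text", 12, std::vector<std::string>() };
    ToyPara b = { "", 12, std::vector<std::string>() };
    ToyPara extra = { "", 0, std::vector<std::string>(1, "frame") };
    doc.push_back(a);
    doc.push_back(b);
    doc.push_back(extra);
    return doc;
}
```

Applied to makeExample(), removeLastByDispose leaves the last paragraph at 12 pt but drops "frame", whereas removeLastByJoin keeps "frame" but resets the last paragraph's direct size to 0 (none), which corresponds to the bug reported here.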

Let me try to use #3.1 also in the end-of-document case, by introducing code to move anchored objects to previous paragraph before calling XComponent::dispose. Indeed, it may happen that more processing could be needed, if more properties would happen to be bound to the very last extra paragraph.
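
The idea of re-anchoring before disposing can be sketched on a toy model as follows. This is not the committed patch: SketchPara, reanchorToPrevious and removeLastKeepingFormatting are hypothetical names, and real anchored objects, bookmarks and other properties need considerably more care than moving strings between vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only (hypothetical types, not LibreOffice code).
struct SketchPara {
    std::string text;
    int directFontSizePt;              // 0 means "no direct formatting"
    std::vector<std::string> anchored; // names of objects anchored to this paragraph
};

// Step 1: move everything anchored to the last paragraph to the previous one.
void reanchorToPrevious(std::vector<SketchPara>& doc) {
    assert(doc.size() >= 2);
    const std::size_t n = doc.size();
    SketchPara& last = doc[n - 1];
    SketchPara& prev = doc[n - 2];
    prev.anchored.insert(prev.anchored.end(), last.anchored.begin(), last.anchored.end());
    last.anchored.clear();
}

// Step 2: now the whole last paragraph can be deleted (the #3.1 style removal)
// without losing anchored objects, and the previous paragraph keeps its formatting.
void removeLastKeepingFormatting(std::vector<SketchPara>& doc) {
    reanchorToPrevious(doc);
    doc.pop_back();
}
```

With a document ending in an empty 12 pt paragraph followed by an extra unformatted paragraph that carries an anchored object, removeLastKeepingFormatting leaves the 12 pt paragraph last and the object anchored to it.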

An alternative way could be trying to clone next-to-last properties to the very last one, before using #3.2; but I don't know how to do that reliably in writerfilter, without converting style format into direct formatting (maybe a new UNO interface would be needed?).

Yet another approach would be a thorough audit of how lcl_AddRange is called, to understand if it's possible to avoid anchoring to a paragraph destined for removal.

And a fictional solution could include a rewrite of writerfilter inside sw, to allow direct control of everything :D
Comment 15 Commit Notification 2023-06-24 08:19:43 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/fc1b9ab2913bc8c2d8414b6d8de3ceed3910c5d8

tdf#133560: re-anchor objects, to use paragraph's dispose for bEndOfDocument

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Mike Kaganski 2023-06-24 08:22:20 UTC
I hope this is fixed now. The change is rather intrusive; a backport to 7-6 has been created, but I'm not sure it will be merged.
Comment 17 Commit Notification 2023-06-24 09:08:49 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/0e6ad9029b8e6b0e912610d2e446682a16ceb402

tdf#133560: re-anchor objects, to use paragraph's dispose for bEndOfDocument

It will be available in 7.6.0.0.beta2.

Comment 18 Commit Notification 2023-07-06 16:51:40 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/ba07bfcda6b9f256f636708e52283be0f3a90c8a

tdf#133560: re-anchor objects, to use paragraph's dispose for bEndOfDocument

It will be available in 7.5.6.

Comment 19 Gabor Kelemen (allotropia) 2023-08-03 19:39:54 UTC
*** Bug 156503 has been marked as a duplicate of this bug. ***