Created attachment 198305 [details]
forum-mso-en-11426.docx: most of the text is in a floating frame
An undersized frame now displays even less text than before. (If the frame size is adjusted in Word and then round-tripped, it looks OK in LO.)
[Possibly accurately described as a regression, but certainly not worthy of name-blaming. It is hard to draw clear conclusions from a wrongly-sized frame, after all...]
Steps to reproduce
1.) open forum-mso-en-11426.docx
Notice that there is hardly any text visible on the first page.
The most recent change happened with 25.2 (backported to 24.2.5)
commit 09978b049570e00eae863f36f834885fd6ea8fe3
Author: Michael Stahl on Tue Jun 25 15:22:56 2024 +0200
    tdf#161721 sw: text formatting: TabOverMargin allow 55cm of crazy
So perhaps this is a hint that TAB_OVER_MARGIN should be ignored for frames (and perhaps tables?).
The frame wasn't even showing up at all until 7.6
commit f4f91c9e7ad9ad669f03fc9a09cd20fdfb53805b
Author: Justin Luth on Fri Mar 10 12:45:35 2023 -0500
    tdf#104394 writerfilter: no addDummyParaForTable when PrevFramed
Created attachment 198308 [details]
forum-mso-en-11426.docx_mso.pdf: how it looks in Word 2019
This document's design is bordering on ridiculous. The contents of the frame would fill 3 pages - except that a textbox is not allowed to span pages. The table itself is empty and seems to serve no purpose.
At first I thought my version of MS Word 2010 simply wasn't displaying what the author intended, but it is a compat14 document, which means it is designed exactly for Word 2010 (and it looks the same in Word 2019).
The framePr textbox has no width specified (no w:w), so the frame is correctly tagged in LO as "autosize". But since it is paragraph-positioned based on an anchor in the first table cell, it thinks it needs to restrict its size to the width of the column. [By changing the anchor "to page" the frame autosizes correctly.]
So ultimately the problem here seems to be that the frame (which is all defined before the table starts) is mistakenly anchored in the table instead of in a paragraph.
Created attachment 198310 [details]
forum-mso-en-11426B.docx: revised version starting with an empty paragraph
This has nothing to do with dummy paragraphs at section starts etc.
Exploratory WIP patch at https://gerrit.libreoffice.org/c/core/+/179492
tdf104394_lostTextbox.docx from ooxmlexport18 is a key document to keep watching as you attempt to fix this. It will trip you up every time.
For forum-mso-en-11426B.docx, it does not make any difference if we first force a convertToTextFrame with:
    case NS_ooxml::LN_tblStart:
        m_pImpl->CheckUnregisteredFrameConversion();
        m_pImpl->ExecuteFrameConversion();
but it does work if we instead simply call
    m_pImpl->AddDummyParaForTableInSection();
It doesn't matter if I delay ExecuteFrameConversion until after the table is done (by returning early if m_StreamStateStack.top().nTableDepth) - it still needs that paragraph at the beginning. So it is all somewhat magical how this is implemented.
A problem is that at the start of the table, we have NO IDEA whether the table itself will be part of this same frame (like it is in tdf104394_lostTextbox.docx), or whether the cell paragraphs will detect a different frame configuration and make a call to CheckUnregisteredFrameConversion.
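To make the "no idea at table start" problem concrete, here is a toy illustration. It is not writerfilter code: ToyKind, ToyToken, frameKey and toyTableJoinsOpenFrame are hypothetical names, and frameKey is only a stand-in for whatever framePr settings a paragraph carries (an empty string means unframed). The sketch assumes the whole token list is available, which a streaming importer does not have. The answer depends on the first paragraph inside the table, so nothing known at the table-start token can decide it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a simplified import stream.
enum class ToyKind { Paragraph, TableStart, TableEnd };

struct ToyToken {
    ToyKind kind;
    std::string frameKey; // stand-in for framePr settings; empty = unframed
};

// Look ahead from a TableStart token to the first paragraph that follows it.
// The table belongs to the currently open frame only if that paragraph
// carries the same (non-empty) frame key.
bool toyTableJoinsOpenFrame(const std::vector<ToyToken>& tokens,
                            std::size_t tblStart,
                            const std::string& openFrameKey)
{
    assert(tblStart < tokens.size());
    assert(tokens[tblStart].kind == ToyKind::TableStart);
    for (std::size_t i = tblStart + 1; i < tokens.size(); ++i) {
        if (tokens[i].kind == ToyKind::Paragraph)
            return !openFrameKey.empty() && tokens[i].frameKey == openFrameKey;
        if (tokens[i].kind == ToyKind::TableEnd)
            break; // table without paragraphs: nothing to compare against
    }
    return false;
}
```

In this model, a stream shaped like tdf104394_lostTextbox.docx (cell paragraphs with the same frame key) returns true, while a stream shaped like forum-mso-en-11426B.docx (unframed cell paragraphs) returns false. Both streams look identical up to and including the table-start token.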
Created attachment 198516 [details] 164500_frameBeforeTable.docx: minimal, clean-room example - made with Word 2010
This is my understanding of the situation.
We import the text and stick it into text nodes as simple paragraphs, tracking the start and end nodes (range) of the similarly framed paragraphs. At some point we notice the frame changed. Then we issue a convertToTextFrame.
- the convert creates a frame
- copies the text nodes into the frame
- sets the anchor point to the start of the copy range
- deletes the copied range. (So now the anchor point [which hasn't moved nodes] is pointing to the "next paragraph" after the now-deleted copy range.)
At this point the next paragraph is also just a simple paragraph, since convertToTable has not yet been called. In these cases it is a textnode that will become the first paragraph of the first cell. But there is no way that unotext can know this at this point. And when we convertToTable, there is no reason to suppose that any anchored stuff is NOT supposed to be in a table.
So I guess I need to just force a new node at the end of the framePr range.
AFAICS, LO does not allow a frame to be anchored to a table startnode, but in MS Word that appears to be possible. So this is emulation which will end up with an additional paragraph (that might put things out of perfect alignment).
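The anchor drift described above can be reproduced in a toy model. This is only a sketch of the described sequence, not the actual unotext or writerfilter implementation: ToyDoc, toyConvertToTextFrame and toyAddAnchorParagraph are hypothetical names, and the document is reduced to a flat vector of paragraph strings with the anchor stored as an index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: body paragraphs in order, plus one frame.
struct ToyDoc {
    std::vector<std::string> nodes; // body paragraphs
    std::vector<std::string> frame; // paragraphs moved into the frame
    std::size_t anchor = 0;         // index of the paragraph the frame is anchored to
};

// Copy nodes [first, last] into the frame, anchor the frame at the start of
// the range, then delete the range. The anchor index is not moved, so after
// the deletion it refers to whatever paragraph followed the range.
void toyConvertToTextFrame(ToyDoc& doc, std::size_t first, std::size_t last)
{
    assert(first <= last);
    assert(last + 1 < doc.nodes.size()); // toy assumption: something follows the range
    const auto from = doc.nodes.begin() + static_cast<std::ptrdiff_t>(first);
    const auto to = doc.nodes.begin() + static_cast<std::ptrdiff_t>(last + 1);
    doc.frame.assign(from, to);
    doc.anchor = first;
    doc.nodes.erase(from, to);
}

// The proposed emulation: force an empty paragraph right after the framed
// range, so the anchor lands there instead of on the future first cell.
void toyAddAnchorParagraph(ToyDoc& doc, std::size_t last)
{
    assert(last < doc.nodes.size());
    doc.nodes.insert(doc.nodes.begin() + static_cast<std::ptrdiff_t>(last + 1),
                     std::string());
}
```

With nodes {"framed 1", "framed 2", "cell A1"}, converting the range 0..1 leaves the anchor on "cell A1", the paragraph that a later convertToTable would turn into the first cell. Calling toyAddAnchorParagraph(doc, 1) first leaves the anchor on the new empty paragraph, at the cost of one extra paragraph in the body.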
Justin Luth committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/575094675e7f6fb643e8cac61f06c14d2f79bcd5 tdf#164500 docx import framePr: add blank para as anchor before table It will be available in 25.8.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.