Created attachment 201846 [details] ooo19070-1_minimal.doc: red arrow indicator that not all text is shown Layout has increased the amount of space assigned to sequential space characters, resulting in a layout that no longer matches what MS Word produces. In this minimized example, which has "Balance SBCS characters and DBCS characters" turned on, the table is now showing a red arrow indicating that non-visible content exists. The date should show "August, 2003", but only "August, " is visible. This started with commit 6818bc55ff248c59f12b2e090139eff30fe949dd Author: Jonathan Clark on Wed Mar 26 14:28:55 2025 -0600 tdf#88908 sw: Add BalanceSpacesAndIdeographicSpaces compat option Reviewed-on: https://gerrit.libreoffice.org/c/core/+/183412 Found by Collabora's mso-test.
Confirmed.
Created attachment 201847 [details] ooo19070-1.doc: the original document exhibits similar problems in multiple ways There are multiple related issues that can be seen in the original document: 1.) MS Word 2003/2010 opens this with "Balance SBCS characters and DBCS characters" disabled, but we import it as enabled. (Round-tripping the document with MS Word "fixes" that problem.) 2.) [Now with MS Word having round-tripped the document after turning on "Balance SBCS characters and DBCS characters"...] In LO, the first page's content still has one line spill over to the next (otherwise empty) page. Note that prior to comment 0's identified patch, the paragraphs were overlaying each other almost perfectly.
Created attachment 201848 [details] ooo19070-1.doc_prev-import-1.png: basically perfect before comment 0's commit. RED=MS Word
Created attachment 201849 [details] ooo19070-1.doc_import-1.png: after comment 0's commit. RED=Word2019
Created attachment 201876 [details] forum-mso-en-6216.doc: page 1 is an example I'm finding lots of documents affected by this commit. This one is legitimately a "Balance SBCS..." document. The first page should be full of text (including a footnote). bug 114629's attachment 138570 [details] is another (poor, but balanced) example (top of page 4).
Created attachment 201895 [details] Screenshot of ooo19070-1_minimal in Word 360 Screenshot of the minimal sample (ooo19070-1_minimal.doc) from the newest version of Microsoft Word, at time of writing. Note that the text overflows. In LO it overflows nicely with an indicator showing there is more text, but in Word it overflows to a second line that is clipped by the border.
I think there are two separate issues to discuss: (In reply to Justin L from comment #0) > Created attachment 201846 [details] > ooo19070-1_minimal.doc: red arrow indicator that not all text is shown > > Layout has increased the amount of space assigned to sequential spare > characters, resulting in a layout that no longer matches what MS Word > produces. See attachment 201895 [details]. When I open this file in new versions of Microsoft Word, I see the same overflow we now see in Writer. I don't have Word 2003/2010 available to check, but is it possible that Microsoft made a breaking change? If so, do we have a community policy about which versions of Word we should prioritize for cross-compatibility? (In reply to Justin L from comment #2) > Created attachment 201847 [details] > ooo19070-1.doc: the original document exhibits similar problems in multiple > ways > > There are multiple related issues that can be seen in the original document: > 1.) MS Word 2003/2010 opens this with "Balance SBCS characters and DBCS > characters" disabled, but we import it as enabled. (Round-tripping the > document with MS Word "fixes" that problem.) There is a bug here. There's something going wrong with parsing this compatibility flag; the document shouldn't open in LO with the flag set.
Created attachment 201909 [details] ooo19070-1_minimal_word2019.pdf: oh good - MSO 2019 is same as 2003 (In reply to Jonathan Clark from comment #7) > When I open this file in new versions of > Microsoft Word, I see the same overflow we now see in Writer. That is from "Word as a web page", right? That is never reliable... All of the mso-test results are coming from Word 2019. I use Word 2010 as a confirmation while I am bibisecting. And since I'm replying anyway, I'll also mention a few other examples I've run across lately: -NN27a.doc: attachment 97866 [details] Details from Bug 77314 [5 pages instead of 4 (because footer is taller now)] -С днем рождения.doc: attachment 83362 [details] Details from Bug 67582 [Page 8's "dog and presents" greeting] -2 Praktinis darbas.doc: attachment 128028 [details] from Bug 103254 [not a clear example, but 6 pages instead of 5]
Created attachment 201929 [details] forum-en-9318.doc: interesting example where spacing is SMALLER than it should be: Courier New
It's strange. In all of these documents, Copts60 is null - so fDntBlnSbDbWid should be unset. LO should be parsing Copts60 correctly according to the MS-DOC reference, but when Word reads these files it treats them like fDntBlnSbDbWid is set. (In reply to Justin L from comment #8) > Created attachment 201909 [details] > ooo19070-1_minimal_word2019.pdf: oh good - MSO 2019 is same as 2003 > > (In reply to Jonathan Clark from comment #7) > > When I open this file in new versions of > > Microsoft Word, I see the same overflow we now see in Writer. > That is from "Word as a web page" right? That is never reliable... > All of the mso-test results are coming from Word 2019. I use Word 2010 as a > confirmation while I am bibisecting. My screenshot was from the desktop version: Microsoft® Word for Microsoft 365 MSO (Version 2506 Build 16.0.18925.20076) 64-bit
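For readers following the flag discussion, here is a minimal sketch of the check described in the comment above. It assumes that fDntBlnSbDbWid is the highest bit (bit 15) of a 16-bit little-endian Copts60 field, which is my reading of the MS-DOC layout. The function names are hypothetical, and this is not LibreOffice's actual parser.

```python
import struct

# Assumption: fDntBlnSbDbWid is the last of the sixteen one-bit flags
# in Copts60, i.e. bit 15 of the 16-bit little-endian value.
F_DNT_BLN_SB_DB_WID = 1 << 15


def parse_dnt_bln_sb_db_wid(copts60_bytes: bytes) -> bool:
    """Return True if the 'don't balance SBCS/DBCS width' bit is set."""
    (value,) = struct.unpack("<H", copts60_bytes)
    return bool(value & F_DNT_BLN_SB_DB_WID)


def balance_enabled(copts60_bytes: bytes) -> bool:
    """Balancing is on when the 'don't balance' bit is clear."""
    return not parse_dnt_bln_sb_db_wid(copts60_bytes)
```

With an all-zero Copts60, as reported for these documents, `balance_enabled(b"\x00\x00")` is true under this reading. That matches what LO imported, and it is the opposite of how Word appears to treat the files.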
*** Bug 167554 has been marked as a duplicate of this bug. ***
I think what I'm going to do is commit a change to temporarily disable handling this flag in DOC files, and reopen bug 88908 with a pointer to this comment. The code to parse the fDntBlnSbDbWid compat flag is correct. However, Word has some other mechanism to control whether or not that flag's value is actually applied to a document. I have no idea what it is. I looked at all of the likely Dop fields across a bunch of different documents, but I couldn't see any patterns. The only thing I can think of is maybe branching off the Dop structure version/size, but I have low confidence in that. I'd need better evidence before trying something so hacky. I'm also somewhat concerned that this isn't specific to fDntBlnSbDbWid. This bug could be a hint at a more general Copts masking feature that we don't implement. If so, it's possible we're flipping other compat flags on documents that we shouldn't and just haven't noticed yet.
(In reply to Jonathan Clark from comment #12) > I think what I'm going to do is commit a change to temporarily disable > handling this flag in DOC files, Sounds good to me. > I'm also somewhat concerned that this isn't specific to fDntBlnSbDbWid. This > bug could be a hint at a more general Copts masking feature that we don't > implement. If so, it's possible we're flipping other compat flags on > documents that we shouldn't and just haven't noticed yet. Yes that was my concern as well. I also was looking at the documentation trying to see if there was a masking feature, but didn't see anything...
Jonathan Clark committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/510cdadd6199c19406a021e1fb0cc29ce21b5e29 tdf#167552 sw: Disable DOC handling of fDntBlnSbDbWid compat flag It will be available in 26.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
The doc regression shouldn't happen anymore, so I'm marking this bug fixed. We can use bug 88908 to track reimplementing fDntBlnSbDbWid once this mystery mechanism is known.