Bug 167552 - balanceSingleByteDoubleByteWidth affecting spaces that it shouldn't
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 25.8.0.0 alpha0+
Hardware: All All
Importance: medium normal
Assignee: Jonathan Clark
URL:
Whiteboard: target:26.2.0
Keywords: bisected, regression
Duplicates: 167554
Depends on:
Blocks: DOC
Reported: 2025-07-17 21:41 UTC by Justin L
Modified: 2025-07-25 20:48 UTC

See Also:
Crash report or crash signature:


Attachments
ooo19070-1_minimal.doc: red arrow indicator that not all text is shown (24.50 KB, application/msword)
2025-07-17 21:41 UTC, Justin L
ooo19070-1.doc: the original document exhibits similar problems in multiple ways (5.46 MB, application/msword)
2025-07-17 22:54 UTC, Justin L
ooo19070-1.doc_prev-import-1.png: basically perfect before comment 0's commit. RED=MS Word (157.06 KB, image/png)
2025-07-17 22:57 UTC, Justin L
ooo19070-1.doc_import-1.png: after comment 0's commit. RED=Word2019 (172.20 KB, image/png)
2025-07-17 22:57 UTC, Justin L
forum-mso-en-6216.doc: page 1 is an example (108.00 KB, application/msword)
2025-07-18 16:07 UTC, Justin L
Screenshot of ooo19070-1_minimal in Word 360 (44.68 KB, image/png)
2025-07-19 02:39 UTC, Jonathan Clark
ooo19070-1_minimal_word2019.pdf: oh good - MSO 2019 is same as 2003 (18.29 KB, application/pdf)
2025-07-19 14:34 UTC, Justin L
forum-en-9318.doc: interesting example where spacing is SMALLER than it should be: Courier New (21.00 KB, application/msword)
2025-07-21 19:45 UTC, Justin L

Description Justin L 2025-07-17 21:41:10 UTC
Created attachment 201846 [details]
ooo19070-1_minimal.doc: red arrow indicator that not all text is shown

Layout has increased the amount of space assigned to sequential space characters, resulting in a layout that no longer matches what MS Word produces. In this minimized example, which has "Balance SBCS characters and DBCS characters" turned on, the table now shows a red arrow indicating that non-visible content exists. The date should show "August, 2003", but only "August, " is visible.

This started with commit 6818bc55ff248c59f12b2e090139eff30fe949dd
Author: Jonathan Clark on Wed Mar 26 14:28:55 2025 -0600
    tdf#88908 sw: Add BalanceSpacesAndIdeographicSpaces compat option
    Reviewed-on: https://gerrit.libreoffice.org/c/core/+/183412

Found by Collabora's mso-test.
Comment 1 Jonathan Clark 2025-07-17 21:50:37 UTC
Confirmed.
Comment 2 Justin L 2025-07-17 22:54:50 UTC
Created attachment 201847 [details]
ooo19070-1.doc: the original document exhibits similar problems in multiple ways

There are multiple related issues that can be seen in the original document:
1.) MS Word 2003/2010 opens this with "Balance SBCS characters and DBCS characters" disabled, but we import it as enabled. (Round-tripping the document with MS Word "fixes" that problem.)


2.) [Now with MS Word having round-tripped the document after turning on "Balance SBCS characters and DBCS characters"...]
In LO, the first page's content still has one line spill over to the next (otherwise empty) page. Note that prior to comment 0's identified patch, the paragraphs were overlaying each other almost perfectly.
Comment 3 Justin L 2025-07-17 22:57:06 UTC
Created attachment 201848 [details]
ooo19070-1.doc_prev-import-1.png: basically perfect before comment 0's commit. RED=MS Word
Comment 4 Justin L 2025-07-17 22:57:57 UTC
Created attachment 201849 [details]
ooo19070-1.doc_import-1.png: after comment 0's commit. RED=Word2019
Comment 5 Justin L 2025-07-18 16:07:35 UTC
Created attachment 201876 [details]
forum-mso-en-6216.doc: page 1 is an example

I'm finding lots of documents affected by this commit. This one is legitimately a "Balance SBCS..." document. The first page should be full of text (including a footnote).

bug 114629's attachment 138570 [details] is another (poor, but balanced) example (top of page 4).
Comment 6 Jonathan Clark 2025-07-19 02:39:10 UTC
Created attachment 201895 [details]
Screenshot of ooo19070-1_minimal in Word 360

Screenshot of the minimal sample (ooo19070-1_minimal.doc) from the newest version of Microsoft Word, at time of writing. Note that the text overflows. In LO it overflows nicely with an indicator showing there is more text, but in Word it overflows to a second line that is clipped by the border.
Comment 7 Jonathan Clark 2025-07-19 02:47:11 UTC
I think there are two separate issues to discuss:

(In reply to Justin L from comment #0)
> Created attachment 201846 [details]
> ooo19070-1_minimal.doc: red arrow indicator that not all text is shown
> 
> Layout has increased the amount of space assigned to sequential space
> characters, resulting in a layout that no longer matches what MS Word
> produces.

See attachment 201895 [details]. When I open this file in new versions of Microsoft Word, I see the same overflow we now see in Writer. I don't have Word 2003/2010 available to check, but is it possible that Microsoft made a breaking change? If so, do we have a community policy about which versions of Word we should prioritize for cross-compatibility?

(In reply to Justin L from comment #2)
> Created attachment 201847 [details]
> ooo19070-1.doc: the original document exhibits similar problems in multiple
> ways
> 
> There are multiple related issues that can be seen in the original document:
> 1.) MS Word 2003/2010 opens this with "Balance SBCS characters and DBCS
> characters" disabled, but we import it as enabled. (Round-tripping the
> document with MS Word "fixes" that problem.)

There is a bug here. There's something going wrong with parsing this compatibility flag; the document shouldn't open in LO with the flag set.
Comment 8 Justin L 2025-07-19 14:34:39 UTC
Created attachment 201909 [details]
ooo19070-1_minimal_word2019.pdf: oh good - MSO 2019 is same as 2003

(In reply to Jonathan Clark from comment #7)
> When I open this file in new versions of
> Microsoft Word, I see the same overflow we now see in Writer. 
That is from "Word as a web page" right? That is never reliable...
All of the mso-test results are coming from Word 2019. I use Word 2010 as a confirmation while I am bibisecting.

And since I'm replying anyway, I'll also mention a few other examples I've run across lately:
-NN27a.doc: attachment 97866 [details] from Bug 77314 [5 pages instead of 4 (because footer is taller now)]

-С днем рождения.doc: attachment 83362 [details] from Bug 67582 [Page 8's "dog and presents" greeting]

-2 Praktinis darbas.doc: attachment 128028 [details] from Bug 103254 [not a clear example, but 6 pages instead of 5]
Comment 9 Justin L 2025-07-21 19:45:20 UTC
Created attachment 201929 [details]
forum-en-9318.doc: interesting example where spacing is SMALLER than it should be: Courier New
Comment 10 Jonathan Clark 2025-07-22 16:52:16 UTC
It's strange. In all of these documents, Copts60 is null - so fDntBlnSbDbWid should be unset. LO should be parsing Copts60 correctly according to the MS-DOC reference, but when Word reads these files it treats them like fDntBlnSbDbWid is set.
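
To make the expectation above concrete, here is a minimal Python sketch of how the flag would be read from a 16-bit Copts60 value. This is not LibreOffice's import code. The function names are hypothetical, and the bit position (bit 15, the last flag in Copts60) is an assumption based on the MS-DOC layout that should be verified against the specification.

```python
# ASSUMPTION: fDntBlnSbDbWid is the last (most significant) bit of the
# 16-bit Copts60 field. Verify against the MS-DOC specification.
F_DNT_BLN_SB_DB_WID_BIT = 15


def dont_balance_flag(copts60: int) -> bool:
    """Return the raw fDntBlnSbDbWid bit from a 16-bit Copts60 value."""
    return bool((copts60 >> F_DNT_BLN_SB_DB_WID_BIT) & 1)


def balance_enabled_per_spec(copts60: int) -> bool:
    """Balancing is on when the 'do not balance' flag is clear.

    This is what a literal reading of the flag implies; per this bug,
    Word does not always behave this way for these documents.
    """
    return not dont_balance_flag(copts60)


if __name__ == "__main__":
    # A null Copts60, as seen in the affected documents.
    print(balance_enabled_per_spec(0x0000))
```

With a null Copts60 the flag is clear, so a literal reading enables balancing. That is what LO did for these files, while Word lays them out as if the flag were set.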

(In reply to Justin L from comment #8)
> Created attachment 201909 [details]
> ooo19070-1_minimal_word2019.pdf: oh good - MSO 2019 is same as 2003
> 
> (In reply to Jonathan Clark from comment #7)
> > When I open this file in new versions of
> > Microsoft Word, I see the same overflow we now see in Writer. 
> That is from "Word as a web page" right? That is never reliable...
> All of the mso-test results are coming from Word 2019. I use Word 2010 as a
> confirmation while I am bibisecting.

My screenshot was from the desktop version: Microsoft® Word for Microsoft 365 MSO (Version 2506 Build 16.0.18925.20076) 64-bit
Comment 11 Jonathan Clark 2025-07-23 17:43:54 UTC
*** Bug 167554 has been marked as a duplicate of this bug. ***
Comment 12 Jonathan Clark 2025-07-23 17:55:51 UTC
I think what I'm going to do is commit a change to temporarily disable handling this flag in DOC files, and reopen bug 88908 with a pointer to this comment.

The code to parse the fDntBlnSbDbWid compat flag is correct. However, Word has some other mechanism to control whether or not that flag's value is actually applied to a document. I have no idea what it is. I looked at all of the likely Dop fields across a bunch of different documents, but I couldn't see any patterns. The only thing I can think of is maybe branching off the Dop structure version/size, but I have low confidence for that. I'd need better evidence before trying something so hacky.

I'm also somewhat concerned that this isn't specific to fDntBlnSbDbWid. This bug could be a hint at a more general Copts masking feature that we don't implement. If so, it's possible we're flipping other compat flags on documents that we shouldn't and just haven't noticed yet.
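
As a rough illustration of the temporary workaround described in this comment, the Python sketch below gates the flag behind a switch that is off by default, so DOC import never enables balancing until the real mechanism is understood. It is only a sketch under stated assumptions: `CompatOptions`, `import_doc_compat`, and `honor_flag` are hypothetical names, the bit position is assumed, and the actual change lives in the commit referenced later in this bug.

```python
from dataclasses import dataclass

# ASSUMPTION: bit position of fDntBlnSbDbWid within the 16-bit Copts60 field.
F_DNT_BLN_SB_DB_WID_BIT = 15


@dataclass
class CompatOptions:
    # Hypothetical stand-in for the BalanceSpacesAndIdeographicSpaces option.
    balance_spaces_and_ideographic_spaces: bool = False


def import_doc_compat(copts60: int, honor_flag: bool = False) -> CompatOptions:
    """Map Copts60 to compat options for a DOC import.

    With honor_flag=False (the workaround), the flag is parsed nowhere and
    balancing stays off, whatever the file says.
    """
    opts = CompatOptions()
    if honor_flag:
        dont_balance = bool((copts60 >> F_DNT_BLN_SB_DB_WID_BIT) & 1)
        opts.balance_spaces_and_ideographic_spaces = not dont_balance
    return opts


if __name__ == "__main__":
    print(import_doc_compat(0x0000))
    print(import_doc_compat(0x0000, honor_flag=True))
```

Keeping the parsing path behind a switch, rather than deleting it, would make it easy to re-enable once the condition Word uses is known.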
Comment 13 Justin L 2025-07-23 18:06:53 UTC
(In reply to Jonathan Clark from comment #12)
> I think what I'm going to do is commit a change to temporarily disable
> handling this flag in DOC files,
Sounds good to me.

> I'm also somewhat concerned that this isn't specific to fDntBlnSbDbWid. This
> bug could be a hint at a more general Copts masking feature that we don't
> implement. If so, it's possible we're flipping other compat flags on
> documents that we shouldn't and just haven't noticed yet.
Yes that was my concern as well. I also was looking at the documentation trying to see if there was a masking feature, but didn't see anything...
Comment 14 Commit Notification 2025-07-23 20:52:36 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/510cdadd6199c19406a021e1fb0cc29ce21b5e29

tdf#167552 sw: Disable DOC handling of fDntBlnSbDbWid compat flag

It will be available in 26.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Jonathan Clark 2025-07-23 21:11:02 UTC
The doc regression shouldn't happen anymore, so I'm marking this bug fixed.

We can use bug 88908 to track reimplementing fDntBlnSbDbWid once this mystery mechanism is known.