Bug 165556 - overlapping redlines: threat or menace?
Summary: overlapping redlines: threat or menace?
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-03-03 18:23 UTC by Michael Stahl (allotropia)
Modified: 2025-04-15 16:24 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Stahl (allotropia) 2025-03-03 18:23:09 UTC
a question i'm pondering is if something should be done to prevent the overlapping redlines from occurring; these can be created by writerfilter.

sw/source/core/doc/DocumentRedlineManager.cxx:112: redline table corrupted: overlapping redlines

DocumentRedlineManager::AppendRedline() already ensures that Delete, Insert, Format redlines don't overlap, these are all in maRedlineTable (Format may have the same start/end as an Insert/Delete, but that can be handled easily).

then there are various Table* redlines but those are in a separate maExtraRedlineTable.

additionally there are FmtColl and ParagraphFormat in maRedlineTable, those are the ones that cause overlaps.

* FmtColl is always a point, but it could be contained inside Insert/Delete.

* ParagraphFormat is created as a PaM from start to end of a SwTextNode.

it looks like the main reason for it being a pam instead of a point is that the SwRedlineItr can then apply the special redline format to the whole paragraph ... but actually that's something i just implemented in the fix for bug 165322 :)

... not sure if there's another reason? in Word it is only on the paragraph end marker, not the whole paragraph, but Word will highlight the whole paragraph if you click on it...

the FmtColl and ParagraphFormat could both be points and moved to a separate vector maParagraphRedlineTable, eliminating all overlaps - then mainly SwRedlineItr needs to search this and the GetContentAtPos() function ... but it will also require code changes in unknown other places where currently maRedlineTable is accessed.

the problem with overlapping redlines is that it prevents the binary search in GetRedlinePos() and GetRedline() from finding the right place quickly; there is a fall-back to linear scan in those functions, based on a flag if an overlapping redline has ever been inserted.

now i have never looked at what Word/OOXML actually do with redlines, maybe there's some other stuff there that isn't yet implemented but would require overlapping redlines anyway?
Comment 1 Miklos Vajna 2025-03-04 07:58:36 UTC
My understanding is that there are cases where you have to either split redlines or have overlapping redlines and Writer decided to split redlines, so we shouldn't ever have overlapping redlines, ideally.

An example:

1. Alice types into the document, XY
2. Bob deletes Y

This could result in a document model with 2 redlines, the insert covering "XY" and the delete covering Y, so they would overlap. But we have code in Writer to split the XY insert into 2 redlines and then the delete redline will have a second SwRedlineData that will contain the "parent" insertion.

Now in Word, what you have is markup like this (DOCX):

      <w:ins w:id="0" w:author="M V" w:date="2025-03-04T08:54:00Z" w16du:dateUtc="2025-03-04T07:54:00Z">
        <w:r>
          <w:t>X</w:t>
        </w:r>
        <w:del w:id="1" w:author="Miklos Vajna" w:date="2025-03-04T08:55:00Z" w16du:dateUtc="2025-03-04T07:55:00Z">
          <w:r w:rsidDel="003A54BE">
            <w:delText>Y</w:delText>
          </w:r>
        </w:del>
      </w:ins>

So Word does have overlapping redlines, but they need to form proper XML nesting, so "<ins><del></ins></del>" style overlapping is not supported there, either.

I would suggest that given 1) we already have code to split redlines and 2) Word can't do arbitrary overlapping to completely avoid splitting, we keep doing this the split way and ideally never have overlapping redlines in the doc model.
Comment 2 László Németh 2025-03-04 11:18:33 UTC
Michael (and Attila, see later), first of all many thanks for your improvements! (Also thanks to Noel for introducing the redline binary search and several other optimizations!)

Indeed, the overlapping redlines were introduced by the writerfilter, likely as a trade-off between the available resources and the interoperability requirements, so there is no reason to keep this workaround, if there is a better solution:

void DomainMapper_Impl::CheckRedline( uno::Reference< text::XTextRange > const& xRange )
 {
    // Writer core "officially" does not like overlapping redlines, and its UNO interface is stupid enough
    // to not prevent that. However, in practice in fact everything appears to work fine (except for the debug warnings
    // about redline table corruption, which may possibly be harmless in reality). So leave this as it is, since this
    // is a better representation of how the changes happened. If this will ever become a problem, overlapping redlines
    // will need to be merged into one, just like doing the changes in the UI does, which will lose some information
    // (and so if that happens, it may be better to fix Writer).

Handling format changes, and clearly separating them on the UI, like MSO does, may need overlapping redlines, at least in some restricted form: only a format redline can overlap a non-format redline. Handling format and text changes must be fully separated and not interfered, to reduce time and effort during managing changes, see the associated UX of Word (some information in https://support.microsoft.com/en-us/office/track-changes-in-word-197ba630-0f5f-4a8e-9a77-3712475e806a).
 
It seems, non-overlapping, also overlapping redlines were not enough for MSO/DOCX, see moveFromRangeStart/End bookmarks for text moving, also table column/row moving (special changes, and it seems, it's possible to disable their recording in MSO: https://wordribbon.tips.net/T008834_Accepting_All_Formatting_Changes.html):

https://c-rex.net/samples/ooxml/e1/part4/OOXML_P4_DOCX_moveFromRangeStart_topic_ID0EOYGW.html#topic_ID0EOYGW

Moving sentences, and changing them has got a highly complicated user interface in Word, restoring full history of the changes, allowing accept/reject in all the phases. Adding loext:move-in by Attila, I like the recent user interface of Writer better: accepting the deletion part of the text movement accepts the insertion part(s) immediately. A newly introduced UX problem here, that accepting the insertion is not visible well. Also there is a much bigger UX problem: extra deletions inside the insertion part aren't tracked after the text moving (because the author and the time frame are the same). This is a huge and dangerous regression: the proofreader accepts other deletions without noticing them:

Text: Lorem ipsum dolor sit amet

Moving first two words + a space to the end:

[Lorem ipsum ]dolor sit amet {Lorem ipsum }

Inserting a space, removing "ip" and the terminating space:

[Lorem ipsum ]dolor sit amet{ }{Lorem *sum*}

Accepting the first deletion: dolor sit amet{ }Lorem sum

So we removed "ip" (and the terminating space) silently, keeping only the space insertion.

It's a good idea to fix the previous problem by always forcing the recording of the extra deletion from the same author in the same time frame during text moving.

Also maybe a good idea to solve the known overlapping/format change problem in a similar way, like Attila did with the moved text (loext:format-in?), allowing to accept the same formatting change by a single click in Manage Changes: now a single character format change is split by every text change to three different redlines, which means three clicks in Manage Changes to accept all parts of the single format change. Also one of the clicks accepts the text change with the format change, and I'm not sure that correctly: it's not possible to separate them, as requested.

Also Undo of the (format) change is still an interoperability problem, but maybe that doesn't affect overlapping.  It would be possible to store the format change as the deletion and insertion of the same text, if needed, but the UX would be still a drawback. But maybe it's still better for interoperability, than creating DOCX from ODF for the victims of direct character formatting.