Bug 101948 - Search fails to find specified indentation in imported Word document
Summary: Search fails to find specified indentation in imported Word document
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.4.0.0.alpha0+ Master
Hardware: x86 (IA32) All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard: reviewed:2022
Keywords: bibisected, bisected, filter:docx, regression
Depends on:
Blocks: Find-Search
Reported: 2016-09-06 20:19 UTC by David F Smith
Modified: 2023-10-23 09:47 UTC

See Also:
Crash report or crash signature:


Attachments
Word document with various indented lines; described in text. (12.55 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2016-09-06 20:19 UTC, David F Smith

Description David F Smith 2016-09-06 20:19:54 UTC
Created attachment 127181
Word document with various indented lines; described in text.

In a .docx document stored by Word and imported into Writer, I tried to search for lines with a specified indentation (0.5", for instance).  The lines were not found, even though the indentation format of the lines matched the search criteria.  Changing the indent within Writer and then changing it back, or saving the file as rtf and then opening that, makes the search work correctly.

The attached document, Indent_tests.docx, was stored by Word Starter v14.0.7172.5000.  It contains seven lines, each indicating its indentation.  To duplicate the problem, open the document in Writer.
1. Place the cursor in the second line, and choose Paragraph from the right-button menu or the Format menu.  Verify that the line has a Before-text indent of 0.50".  Click OK to close the Paragraph box.
2. Place the cursor in the first line.  From the Edit menu, select Find & Replace, or press Ctrl+H.
3. In the Search For box, enter "This" (without the quotes).  Click the plus sign beside Other options, then click Format.  In the Before text box selector, enter 0.50", or use the up-arrow to reach that number.  Click OK to close the Format box, then verify that "Indent left 0,5 inch" appears below the Search For box.
4. Click Find Next.  The notation "Search key not found" appears.
5. Try the other indentations in the Format box: Before text of 1.00" or First line of 0.50".  In every case, the search key is not found.
6. Click Close in the Find & Replace box.
7. Using the indent pointers in the ruler, or the Paragraph Format box, change the indent for one of the before-text lines, then change it back.  Verify with the Paragraph Format box that the indent is back to what it was.
8. Go back into Find & Replace and specify the format of the line that you adjusted.  That line will be found, but the other one with the same indent will not.  Close the Find & Replace box.
9. Save the document as Rich Text (*.rtf).  Close the document, then open the .rtf file.  Repeat the tests in steps 1-5.  All of the searches will succeed.


It appears that in an imported Word document, the Search cannot find a specified indent, but once that indent has been set within Writer, or once the file has been converted to rtf by Writer, the Search is fine.  Interestingly, storing an rtf file from Word doesn't fix the problem.
Comment 1 Buovjaga 2016-09-29 19:17:39 UTC
Repro.

It works in 3.6, so regression.

Arch Linux 64-bit, KDE Plasma 5
Version: 5.3.0.0.alpha0+
Build ID: 7cf444454c0c27e2f6d764164ea880b87163f45a
CPU Threads: 8; OS Version: Linux 4.7; UI Render: default; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on September 27th 2016

Arch Linux 64-bit
Version 3.6.7.2 (Build ID: e183d5b)
Comment 2 raal 2016-09-30 20:58:48 UTC
Reproducible with Version: 4.4.0.0.alpha0+, win7
Comment 3 raal 2017-04-04 15:32:53 UTC Comment hidden (obsolete)
Comment 4 Mike Kaganski 2017-11-29 16:06:10 UTC
See bug 103423 comment 8 (the bibisected commit might not actually be the real cause).
Comment 5 QA Administrators 2018-11-30 03:58:52 UTC Comment hidden (obsolete)
Comment 6 David F Smith 2018-11-30 16:24:47 UTC
This bug is still present, with exactly the same symptoms as in the original report.

Version: 6.0.5.2 (x64)
Build ID: 54c8cbb85f300ac59db32fe8a675ff7683cd5a16
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: en-US (en_US); Calc: group
Comment 7 QA Administrators 2019-12-01 03:40:25 UTC Comment hidden (obsolete)
Comment 8 David F Smith 2019-12-05 18:10:58 UTC
This bug is still present, with exactly the same symptoms as in the original report.

Version: 6.3.3.2 (x64)
Build ID: a64200df03143b798afd1ec74a12ab50359878ed
CPU threads: 4; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
Comment 9 David F Smith 2019-12-05 18:21:44 UTC
I should have mentioned in comment 8 that the user interface in the Find & Replace box has changed in minor ways, so some of the instructions aren't exactly correct (e.g., the field is now called "Find" instead of "Search For").  But functionally, the behavior is the same.
Comment 10 Mike Kaganski 2019-12-05 20:21:56 UTC
The search fails in lcl_Search in sw/source/core/crsr/findattr.cxx, which calls CmpAttr, which in turn uses SvxLRSpaceItem::operator==. The paragraph's SvxLRSpaceItem and the one from the search descriptor differ in bExplicitZeroMarginValRight, which is true in the descriptor's item but false in the paragraph's. It seems that the DOCX import filter doesn't call SvxLRSpaceItem::SetRight for zero, unlike e.g. the RTF filter.

The obvious "fix" would be to make the OOXML filter call SvxLRSpaceItem::SetRight in these cases. But that wouldn't be the correct fix, because the comparison clearly failed on a value that wasn't meant to be searched: it's the left indent that was important to the user, but the right indent that fails the comparison. Since both indents are kept in the same item, the only fix I can see is to use a special class derived from SvxLRSpaceItem for search, which would know which settings are set, and use an overridden operator== that would take that into account.
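
To make the idea concrete, here is a minimal sketch using a simplified model, not LibreOffice code. LRSpaceModel stands in for SvxLRSpaceItem, explicitZeroRight plays the role of bExplicitZeroMarginValRight, and LRSpaceSearchModel with its matchLeft/matchRight/matchFirstLine flags is a hypothetical search-only class. All of these names are assumptions made for illustration; values are in twips (0.5" = 720).

```cpp
#include <cassert>

// Simplified stand-in for SvxLRSpaceItem; not the real class.
struct LRSpaceModel {
    long left = 0;
    long right = 0;
    long firstLine = 0;
    bool explicitZeroRight = false;  // plays the role of bExplicitZeroMarginValRight

    virtual ~LRSpaceModel() = default;

    // Setting the right margin records whether zero was set explicitly.
    void setRight(long value) {
        right = value;
        explicitZeroRight = (value == 0);
    }

    // Whole-item comparison: every member takes part, including the flag.
    virtual bool operator==(const LRSpaceModel& other) const {
        return left == other.left && right == other.right &&
               firstLine == other.firstLine &&
               explicitZeroRight == other.explicitZeroRight;
    }
};

// Hypothetical search-only item: remembers which properties the user asked for
// and compares only those.
struct LRSpaceSearchModel : LRSpaceModel {
    bool matchLeft = false;
    bool matchRight = false;
    bool matchFirstLine = false;

    bool operator==(const LRSpaceModel& paragraph) const override {
        if (matchLeft && left != paragraph.left) return false;
        if (matchRight && right != paragraph.right) return false;
        if (matchFirstLine && firstLine != paragraph.firstLine) return false;
        return true;
    }
};

// Paragraph as a DOCX import might leave it: left indent set, right never touched.
LRSpaceModel importedParagraph() {
    LRSpaceModel p;
    p.left = 720;
    return p;
}

// Plain descriptor: left indent requested, right explicitly zeroed as a side effect.
LRSpaceModel plainDescriptor() {
    LRSpaceModel d;
    d.left = 720;
    d.setRight(0);
    return d;
}

// Search-aware descriptor: same values, but only the left indent is marked as wanted.
LRSpaceSearchModel searchDescriptor() {
    LRSpaceSearchModel d;
    d.left = 720;
    d.setRight(0);
    d.matchLeft = true;
    return d;
}

// Self-check of the two behaviours described above.
void checkWholeItemVsSearchItem() {
    assert(!(plainDescriptor() == importedParagraph()));  // fails on the flag alone
    assert(searchDescriptor() == importedParagraph());    // only left is compared
}
```

In this model the whole-item comparison rejects the paragraph purely because of the right-margin flag, while the search-aware comparison ignores everything the user did not ask for.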

This could possibly be made an easy hack with difficulty=interesting.
Comment 11 Hossein 2022-10-19 16:05:57 UTC
Still reproducible with the latest LO 7.5 dev master:

Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: ef9d461e420ca1869f88fa0d7ea749581819b360
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 12 Hossein 2022-10-19 16:09:05 UTC
Re-evaluating the EasyHack in 2022

This issue is still relevant. The EasyHacker should follow the solution and code pointer provided by Mike in comment 10. One should have a good understanding of C++ and be familiar with the OOXML filter before trying to tackle this problem.
Comment 13 Khushi Gautam 2023-10-20 15:41:54 UTC
I would like to work on this
Comment 14 Buovjaga 2023-10-23 09:35:03 UTC
This was fixed in 7.6 by db115bec9254417ef7a3faf687478fe5424ab378
Comment 15 Mike Kaganski 2023-10-23 09:47:25 UTC
(In reply to Buovjaga from comment #14)
> This was fixed in 7.6 by db115bec9254417ef7a3faf687478fe5424ab378

Note, though, that there are other properties of this kind: whenever a complex item represents several user-visible properties at once, it would show the same behavior.

Michael's fix is the proper way to fix it. Splitting complex items into simple ones would be the best way. However, it might be a non-trivial task, as the fixing commit shows.

So the approach outlined in comment 10 might still be valid in similar cases.
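
For comparison, the "split into simple items" direction can be pictured with a toy model. This is only a sketch and not the code of the fixing commit: ItemSet, matchesSearch and the property names "left" and "right" are hypothetical, and it assumes that a property missing from the paragraph counts as 0. Values are in twips.

```cpp
#include <cassert>
#include <map>
#include <string>

// One entry per user-visible property. Names are illustrative only.
using ItemSet = std::map<std::string, long>;

// A paragraph matches when every property present in the search set has the
// same value in the paragraph. Properties absent from the search set are ignored.
// Assumption: a property missing from the paragraph counts as 0.
bool matchesSearch(const ItemSet& paragraph, const ItemSet& search) {
    for (const auto& wanted : search) {
        const auto found = paragraph.find(wanted.first);
        const long actual = (found != paragraph.end()) ? found->second : 0;
        if (actual != wanted.second) return false;
    }
    return true;
}

// Self-check: the right margin never enters a search for the left indent.
void checkSplitItems() {
    const ItemSet imported{{"left", 720}};               // right never set
    const ItemSet edited{{"left", 720}, {"right", 0}};   // right set explicitly
    const ItemSet search{{"left", 720}};                 // user asked for left only
    assert(matchesSearch(imported, search));
    assert(matchesSearch(edited, search));
}
```

Because each property is its own entry, a search for the left indent cannot be affected by how, or whether, the right indent was set.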