Bug 119800 - FILEOPEN Table in frame in DOCX not displayed in Writer
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.0.0.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: László Németh
URL:
Whiteboard: target:7.0.0
Keywords: difficultyInteresting, easyHack, filter:docx, skillCpp, topicDebug
Depends on:
Blocks: DOCX-Tables DOCX-Frames
Reported: 2018-09-11 10:00 UTC by Aron Budea
Modified: 2020-03-17 14:39 UTC

Attachments
Sample DOCX (30.60 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-09-11 10:00 UTC, Aron Budea
Sample DOCX with "hidden text" removed (30.59 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-09-11 10:02 UTC, Aron Budea
Screenshot in Word (28.67 KB, image/png)
2018-09-11 10:05 UTC, Aron Budea
PDF exported in Word (6.34 KB, application/pdf)
2018-09-11 10:07 UTC, Aron Budea
A minimal "vanish" document (1.26 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-11-27 09:40 UTC, Mike Kaganski

Description Aron Budea 2018-09-11 10:00:41 UTC
Created attachment 144783 [details]
Sample DOCX

The tables in the attached DOCX went back and forth between Word and Writer, which is why both of them are in frames.
In Writer the first table isn't shown.

The likely cause appears to be the following piece of XML in word\document.xml:

<w:pPr>
  <w:rPr>
    <w:vanish/>
  </w:rPr>
</w:pPr>

If the supposedly invisible content is removed from the document via Inspect Document in Word, both tables appear in Writer (positioning is slightly off, though).

Observed using LO 6.2 daily build (2018-09-07_23:40:38, 9a9b81c7212fa6a6762246593acf3f1950677a22) & 4.0.0.3 / Windows 7.
In 3.5.0.3 the one table that is shown looks worse.
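
To check whether other documents carry the same markup, the pattern above can be searched for mechanically. The following is only a minimal sketch using the Python standard library. It assumes the main part is stored at word/document.xml, and the helper name count_vanished_paragraph_marks is made up for illustration (it is not part of LibreOffice or any library). As a simplification it treats any w:vanish element as "on" and ignores a possible w:val="false".

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def count_vanished_paragraph_marks(docx_file):
    """Count paragraphs whose w:pPr/w:rPr contains a w:vanish element.

    docx_file may be a path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    count = 0
    # iter() also reaches paragraphs nested in tables and text boxes.
    for para in root.iter(f"{{{W_NS}}}p"):
        # Simplification: presence of w:vanish is taken as "hidden".
        if para.find("w:pPr/w:rPr/w:vanish", NS) is not None:
            count += 1
    return count
```

A non-zero result only says that the markup is present; whether Writer actually hides an anchored object still has to be checked in the application.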
Comment 1 Aron Budea 2018-09-11 10:02:49 UTC
Created attachment 144784 [details]
Sample DOCX with "hidden text" removed

There's no actual hidden text in the original, just the mentioned element.
Comment 2 Aron Budea 2018-09-11 10:05:55 UTC
Created attachment 144785 [details]
Screenshot in Word
Comment 3 Aron Budea 2018-09-11 10:07:19 UTC
Created attachment 144786 [details]
PDF exported in Word

Interestingly, frame placement shows the same difference in the exported PDF as the good DOCX displays in Writer.
Comment 4 Aron Budea 2018-09-11 11:07:20 UTC
(In reply to Aron Budea from comment #0)
> If the supposedly invisible content is removed from the document via Inspect
> Document in Word, both tables appear in Writer (positioning is slightly off,
> though).

(In reply to Aron Budea from comment #3)
> Interestingly, frame placement shows the same difference in the exported PDF
> as the good DOCX displays in Writer.
Most likely, removing the supposedly empty paragraph adjusts the position of the second frame/table (in Word as well).
Comment 5 Buovjaga 2018-10-05 10:30:18 UTC
Confirmed.

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 36befb3aca96907a14e71e82497dbb8f03ead5ab
CPU threads: 8; OS: Linux 4.18; UI render: default; VCL: gtk3_kde5; 
Locale: fi-FI (fi_FI.UTF-8); Calc: threaded
Built on 3 October 2018
Comment 6 Mike Kaganski 2018-11-27 08:47:00 UTC
The table is actually still there, just hidden: enabling display of Hidden text under Options→Writer→Formatting Aids shows that.
Comment 7 Mike Kaganski 2018-11-27 09:11:44 UTC
The problem here seems to be that <w:rPr> under <w:pPr> holds the properties of the *paragraph mark* only (ECMA-376 Part 1 sect. 17.3.1.29), not the properties of all runs of the paragraph.
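
The distinction can be illustrated with a small Python sketch (standard library only). The function names paragraph_mark_hidden and run_hidden are hypothetical and not taken from LibreOffice. The sketch ignores style inheritance and only looks at direct formatting, so it shows the scoping rule rather than a complete implementation.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
# Attribute values that switch an on/off property such as w:vanish off.
OFF_VALUES = {"0", "false", "off"}


def is_on(element):
    """True if an on/off element is present and not explicitly switched off."""
    if element is None:
        return False
    return element.get(f"{{{W_NS}}}val", "true") not in OFF_VALUES


def paragraph_mark_hidden(para):
    """w:pPr/w:rPr/w:vanish applies to the paragraph mark only."""
    return is_on(para.find("w:pPr/w:rPr/w:vanish", NS))


def run_hidden(run):
    """A run is hidden only by its own w:rPr/w:vanish.

    The paragraph-mark properties are deliberately not consulted here.
    """
    return is_on(run.find("w:rPr/w:vanish", NS))
```

For a paragraph like the one in the description, paragraph_mark_hidden returns True while run_hidden stays False for each run, which matches the reading of the specification given above.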
Comment 8 Mike Kaganski 2018-11-27 09:40:22 UTC
Created attachment 147060 [details]
A minimal "vanish" document

In the attachment, there are 3 paragraphs ("Para1", "Para2", and "Para3"), of which Para2 has the w:vanish under w:pPr/w:rPr. The document shows that LibreOffice does not always hide everything in the paragraph with this setup.

Word just joins the two paragraphs (Para2 and Para3) together, as if the paragraph mark didn't exist (visible when Word's "Show/Hide ¶" is turned off, and in print preview). LibreOffice does not ignore the paragraph mark (absent functionality?) and does not hide the plain paragraph's text, but it seems to hide the anchored objects.
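
The joining behaviour described for Word can be modelled in a few lines of Python. This is only a toy model of the observation above, not code from Word or LibreOffice. The function name visible_lines and the (text, mark_hidden) tuples are assumptions made for the example.

```python
def visible_lines(paragraphs):
    """Model a hidden paragraph mark as 'no line break here'.

    paragraphs: list of (text, mark_hidden) tuples in document order.
    Returns the lines a reader would see when hidden marks are not shown.
    """
    lines = []
    current = ""
    for text, mark_hidden in paragraphs:
        current += text
        if not mark_hidden:
            # A visible paragraph mark ends the current line.
            lines.append(current)
            current = ""
    if current:
        # Trailing text whose last paragraph mark was hidden.
        lines.append(current)
    return lines


# The three paragraphs of the minimal document; only Para2's mark is hidden.
print(visible_lines([("Para1", False), ("Para2", True), ("Para3", False)]))
# → ['Para1', 'Para2Para3']
```

In this model the text of Para2 stays visible and only the break after it disappears, which is the behaviour Writer currently lacks.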
Comment 9 Mike Kaganski 2018-11-27 09:53:06 UTC
Code pointer: look for LN_EG_RPrBase_vanish in writerfilter/source/dmapper/DomainMapper.cxx
Comment 10 Xisco Faulí 2019-10-22 10:39:50 UTC
Still reproducible in

Version: 6.4.0.0.alpha1+
Build ID: de4839e66d3d195315729b95cc144cdab96b6e74
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded
Comment 11 Xisco Faulí 2019-10-22 10:40:47 UTC
(In reply to Mike Kaganski from comment #9)
> Code pointer: look for LN_EG_RPrBase_vanish in
> writerfilter/source/dmapper/DomainMapper.cxx

Let's turn it into an easyhack then...
Comment 12 Justin L 2020-02-05 13:28:31 UTC
This probably is closer to "impossible" than easy.

The first thing is to ignore DOCX and make non-hidden anchored objects possible in LO itself (without breaking existing documents).
Comment 13 László Németh 2020-03-17 14:38:10 UTC
tdf#119800 DOCX import: fix vanished objects

Not hidden objects, for example shapes and tables were
converted to hidden text, when they were anchored to
empty hidden paragraphs (see w:vanish character property
in OOXML).

Note: now DOCX round-trip doesn't change the document
layout (previously DOCX export hid the vanished object),
but Writer shows also an extra empty paragraph with
the fixed object, so the layout is still not the same
here.

Follow-up of commit 2be656908e9f30d0b0f795cc67096f0d673a3a21
(tdf#128646 DOCX import: don't hide shape of hidden paragraph),
extending the fix also for not table paragraphs.
Comment 14 Commit Notification 2020-03-17 14:39:07 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ab25bf4b2c51e5634bdfeaa1f84af4bb652f7a47

tdf#119800 DOCX import: fix vanished objects

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.