Bug 100507 - RTF import: Doesn't recognize formatting, missing characters
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.0.3.2 release
Hardware: All All
Importance: medium normal
Assignee: Miklos Vajna
Whiteboard: target:5.3.0 target:5.2.2
Keywords: bisected, filter:rtf, regression
Reported: 2016-06-20 20:45 UTC by Susan
Modified: 2016-08-17 07:50 UTC

Attachments
Direct export of FTM 2014 RTF prior to opening in Writer (28.89 KB, application/rtf)
2016-06-20 20:45 UTC, Susan
FTM 2006 file after opening in Writer (20.78 KB, application/rtf)
2016-06-20 20:48 UTC, Susan
PDF Created from FTM 2014 (529.54 KB, application/pdf)
2016-06-26 11:54 UTC, Susan
Example of overwriting header (101.66 KB, application/pdf)
2016-06-26 12:30 UTC, Susan
What I thought were headers.... (38.97 KB, application/rtf)
2016-06-26 12:53 UTC, Susan

Description Susan 2016-06-20 20:45:45 UTC
Created attachment 125781 [details]
Direct export of FTM 2014 RTF prior to opening in Writer

When opening an RTF file exported by Family Tree Maker 2014-1, Writer ignores formatting (tabs, indents, soft returns) and drops characters. Yet when I open an RTF file from FTM 2006, Writer works as expected. I've attached an RTF file exported directly from Family Tree Maker 2014-1, before it was opened in Writer.

I am also reporting this to MacKiev so they can check on their end.
Comment 1 Susan 2016-06-20 20:48:26 UTC
Created attachment 125782 [details]
FTM 2006 file after opening in Writer

This is what the 2014 file should look like.
Comment 2 Buovjaga 2016-06-26 10:47:00 UTC
It would be nice to see the intended look of the 2014 file as a PDF. Does it work OK in Microsoft Word? Do you have access to Word? If not, I can try it later.
After we confirm the actual problems, we have to split this report into multiple ones, because one report should only be about one issue.
Comment 3 Susan 2016-06-26 11:54:51 UTC
Created attachment 125919 [details]
PDF Created from FTM 2014

Here's a PDF file using FTM 2014.  I don't have access to Word, though.
Comment 4 Buovjaga 2016-06-26 12:11:26 UTC
In the summary you write "body overwrites header" - what does this mean?
Comment 5 Susan 2016-06-26 12:30:21 UTC
Created attachment 125920 [details]
Example of overwriting header

See 2nd page of attachment.
Comment 6 Susan 2016-06-26 12:53:49 UTC
Created attachment 125922 [details]
What I thought were headers....
Comment 7 Susan 2016-06-26 12:59:17 UTC
Okay, I thought the headers were getting messed up, but it looks like I might have been wrong. If you open the "What I thought were headers...." attachment and look at pages 2 and 3, the page break is out of place. I don't know if LibreOffice inserted it or if it's embedded in the exported file. Also, the bold lines are dividing lines between generations, yet they're out of place, too. I apologize - I know I'm not explaining this very well.
Comment 8 Buovjaga 2016-08-05 15:03:21 UTC
Ok, the file actually looks like it should in LibreOffice 3.3 so this is a regression.

I confirmed the problem on Linux, so changing OS to All.

Bibisecters: look at the pdf in attachment 125919 [details] to see how it should be laid out.

LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4

Arch Linux 64-bit, KDE Plasma 5
Version: 5.3.0.0.alpha0+
Build ID: f3d26af51588af441f62fb69bb7a5432845226ac
CPU Threads: 8; OS Version: Linux 4.6; UI Render: default; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on August 5th 2016
Comment 9 Buovjaga 2016-08-05 15:39:55 UTC
Just to confirm this is bibisectable: it works also in Version 3.6.7.2 (Build ID: e183d5b)
Comment 10 Miklos Vajna 2016-08-10 16:37:48 UTC
One difference I see between 3.6 and master is that the "Generation 1" paragraph has a large left margin on 3.6, and it's 0 on master.

I plan to look at that.
Comment 11 Miklos Vajna 2016-08-10 18:10:58 UTC
Bibisecting using bibisect-44max.git:
 868fae4fac0679d728ca6ec756a4c3b5fda43a77 is the first bad commit
commit 868fae4fac0679d728ca6ec756a4c3b5fda43a77
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Sat Mar 14 21:40:46 2015 +0800

    source-hash-1be0a3fa9ebb22b607c54b47739d4467acfed259
    
    commit 1be0a3fa9ebb22b607c54b47739d4467acfed259
    Author:     Michael Stahl <mstahl@redhat.com>
    AuthorDate: Tue Jun 17 18:40:04 2014 +0200
    Commit:     Michael Stahl <mstahl@redhat.com>
    CommitDate: Tue Jun 17 18:42:07 2014 +0200
    
        n#825305: writerfilter RTF import: override style properties like Word
    
        It would certainly be immediately obvious to any reader of the RTF spec
        that \sN will apply the style with index N to the current paragraph.
    
        But actually, that is not what Word does when it reads \sN...
        what it really does is to apply the style with index N, and then for
        every attribute in that style, apply the same attribute with a default
        value to the paragraph, effectively overriding what's in the style.
    
        If that doesn't make any sense to you, well, have you heard the joke
        about how many Microsoft engineers it takes to change a light bulb?
    
        Also, \pard apparently implies \s0.
    
        To implement that, change RTFSprms::deduplicate() to recursively
        look for style SPRMs that are missing in the properties, and put
        in default ones, currently just for 2 keywords \sa and \sb.
    
        This requires changing deduplicate() to be const and return a new value,
        since it is no longer idempotent, as the erased SPRMs would get
        defaulted on the next run.
    
        While at it, fix RTFValue::equals() which did not compare m_sValue.
    
        This fixes the testParaBottomMargin test that was broken by the fix
        for fdo#70578.
    
        Change-Id: I4ced38628d76f6c41b488d608a804883493ff00b

:040000 040000 28e3aedf8721ecacc6cb7d7166ee96448d71d9ca 2824ba57aa0ff76d64e426af2ae5e20bc5e150a5 M      opt


Adding Cc: to mstahl@redhat.com
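
The behaviour described in the quoted commit message can be illustrated with a minimal Python sketch. This is not the writerfilter code: the plain dictionaries, the DEFAULTS table and this standalone deduplicate() function are hypothetical stand-ins for RTFSprms, the property values are invented, and only the two keywords the commit names (\sa and \sb) are modelled.

```python
# Hypothetical model; not the actual writerfilter implementation.
# Default values assumed for the two keywords the commit message names.
DEFAULTS = {"sa": 0, "sb": 0}


def deduplicate(direct, style):
    """Return a NEW dict of the direct paragraph properties to keep.

    - A direct property equal to the style's value is redundant and dropped.
    - A style property with no direct counterpart is overridden with its
      default value (only for the keywords listed in DEFAULTS).
    The input is left untouched and a new value is returned, because the
    operation is not idempotent: on a second run, the entries erased in
    the first run would be treated as missing and get defaulted.
    """
    result = {}
    for key, value in direct.items():
        if style.get(key) != value:
            result[key] = value
    for key, value in style.items():
        if key not in direct and key in DEFAULTS and value != DEFAULTS[key]:
            result[key] = DEFAULTS[key]
    return result


# Invented example values (twips), purely for illustration.
style = {"sa": 120, "sb": 240, "li": 720}
direct = {"sa": 120, "li": 360}

once = deduplicate(direct, style)
twice = deduplicate(once, style)
print(once)
print(twice)
```

In this model the first call drops the redundant sa entry and adds a default sb, while a second call on the result also defaults sa, which is the non-idempotence the commit message points out.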
Comment 12 Miklos Vajna 2016-08-10 22:09:54 UTC
Ah, I know what's going on: \*\cs0 in the document describes a character style with the id 0, which is quite unusual, as the 0th style is normally a paragraph one. Our assumption blows up when we later try to use that style name as a paragraph style name, resulting in losing paragraph properties. I'll fix this.
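
A minimal Python sketch of this failure mode, assuming a style table whose entries record both the style kind and its index. The tuple layout, the function names and the style names are all hypothetical and chosen for illustration; they are not the writerfilter data structures or the contents of the attached file.

```python
# Hypothetical model of an RTF style table; not the writerfilter code.
# Each entry is (kind, index, name): "para" for \sN, "char" for \*\csN.
styles = [
    ("char", 0, "Default Character"),   # \*\cs0: the unusual case
    ("para", 1, "Generation Heading"),  # invented paragraph style
]


def default_para_style_buggy(styles):
    """Assume that whatever has index 0 is the default paragraph style."""
    for kind, index, name in styles:
        if index == 0:
            return name
    return None


def default_para_style_checked(styles):
    """Accept index 0 only when the entry really is a paragraph style."""
    for kind, index, name in styles:
        if index == 0 and kind == "para":
            return name
    return None


# The unchecked lookup hands back a character style name, which would then
# be used as a paragraph style name.
print(default_para_style_buggy(styles))
# The checked lookup returns None, so a caller can fall back to its own default.
print(default_para_style_checked(styles))
```

The point of the sketch is only that the lookup has to take the style kind into account, not just the index.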
Comment 13 Commit Notification 2016-08-16 08:43:47 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=2de168e99ba9cd2539f1ddbeffad7e3eb71a7b1b

tdf#100507 RTF import: don't set default para style to the 0th char style

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Commit Notification 2016-08-17 07:50:53 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=faaec32b1b6f2a9f8fb0541a5355beddfec37432&h=libreoffice-5-2

tdf#100507 RTF import: don't set default para style to the 0th char style

It will be available in 5.2.2.
