Bug 52028 - automatic styles with RSIDs created by enhanced document compare feature destroy font kerning
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.0.0.beta3 (earliest affected)
Hardware: Other All
Importance: medium major
Assignee: Michael Stahl (CIB)
Whiteboard: target:4.2.0 target:4.1.0.2
Keywords: regression
Reported: 2012-07-12 18:55 UTC by Roman Eisele
Modified: 2017-06-14 10:12 UTC



Attachments
ZIP archive with all test files from test A to C (66.15 KB, application/zip)
2012-07-12 18:55 UTC, Roman Eisele
Screenshot comparison for test C, showing working and broken kerning (139.73 KB, image/png)
2012-07-12 19:30 UTC, Roman Eisele
Compares XML (60.11 KB, application/pdf)
2012-07-20 11:28 UTC, Rainer Bielefeld Retired
Test A, LibO 3.6.0b3 .odt file, viewed with BBEdit, problematic sections emphasized (438.99 KB, image/png)
2012-07-20 17:24 UTC, Roman Eisele
Test A, LibO 3.5.3.3 .odt file, viewed with BBEdit, interesting (correct) sections emphasized (382.25 KB, image/png)
2012-07-20 17:25 UTC, Roman Eisele

Description Roman Eisele 2012-07-12 18:55:42 UTC
Created attachment 64151 [details]
ZIP archive with all test files from test A to C

It took me two days to find out what's going wrong with paragraph and character styles in LibreOffice 3.6 beta 3, so please take this seriously ;-) I'm sorry I have to give a rather lengthy explanation, but the matter is not simple ...


== Results ==

When editing even the simplest .odt files, Writer now applies automatic styles everywhere, even if this is completely unnecessary because the text formatting does not differ at all from the current paragraph style (and/or character style, if any). Therefore, if you keep editing an .odt file for a while, this causes the creation of countless automatic styles, most of which are completely empty (contain no formatting information). This needlessly increases .odt file size and complexity.

Normally this problem is not "visible" at the UI level of Writer; you have to look at the .odt file contents with an editor that can show the contents of an .odt file directly (e.g., BBEdit), or save the .odt file as a .fodt file, which every text editor can open. But there are circumstances under which you can see the result of this bug directly in LibreOffice (and in printing and in PDF export!):

* Font kerning does not work across the beginning and end of character styles; therefore even the most necessary kerning pairs like A + V or A + T or T + - (which have a negative kerning value in most fonts to look better) will no longer get kerned if you insert one of the characters later (see test C below for further explanation).

* Ligatures don't work across the beginning and end of character styles either; therefore, if you use a font which contains ligature information for pairs like f + i, they will not get applied if you insert one of these characters later.

Why is this bug a bug? You may say that it just increases the complexity of .odt files, but is not "wrong" in a technical sense. Well, first, there are special circumstances under which this bug makes the documents actually look wrong (see the note about font kerning and ligatures above -- both get broken by this bug!). And second, this bug unnecessarily increases the complexity and size of .odt files, making the contents unnecessarily hard to read and parse, which is IMHO against the philosophy behind the ODF file format specification -- unlike Microsoft's strange Office 2007 XML format (.docx etc.), which may be intentionally complex to make parsing difficult for foreign software, the ODF file format was designed to be as simple as possible, to make it easy to write parsers and even to allow human beings to read the XML code directly. This is counteracted by this bug.


== Steps to reproduce ==

All the following tests have been made with LibreOffice 3.5.5.3 and LibreOffice 3.6.0 beta 3, both with German langpacks installed, both on MacOS X 10.6.8 German, and both using LibreOffice default settings (I delete/rename my user profile completely before doing every part of these tests!).

=== Test A -- creating a new document ===

1.1  Reset your LibreOffice user profile (rename it or delete it),
     to make sure that the results are not influenced by any custom settings.
1.2  Create a new empty Writer document with LibreOffice 3.5.5.3.
1.3  Show the "Styles and Formatting" window.
1.4  In the "Styles and Formatting" window, double-click on the paragraph style
     "Text body" to select it.
1.5  Click once in the (empty) document text area (this may be important!
     No kidding!).
1.6  Type "Hello world!".
1.7  Save the document in .odt format.
1.8  Save the document in .fodt format.

1.9  Open the .fodt file created with LibreOffice 3.5.5.3
     with a good text editor of your choice.
1.10 Search for "<office:automatic-styles>";
     you will find only one page layout entry
       <style:page-layout style:name="pm1">
         ...
       </style:page-layout>
     but no automatic paragraph or character styles.
1.11 Scroll down to the end of the document;
     you will find that the text contents of the document are very simple:
       <text:p text:style-name="Text_20_body">Hello world!</text:p>
     and this is how it should be.

2.1- Repeat steps 1.1 to 1.8 with LibreOffice 3.6.0 beta 3,
2.8  creating new .odt and .fodt files.

2.9  Open the .fodt file created with LibreOffice 3.6.0 beta 3
     with a good text editor of your choice.
2.10 Search for "<office:automatic-styles>"; you will find one or two (*)
     completely needless automatically created styles, one or both of:
       <style:style style:name="P1" style:family="paragraph"
       style:parent-style-name="Text_20_body">
         <style:text-properties officeooo:paragraph-rsid="001ffdd7"/>
       </style:style>
       <style:style style:name="T1" style:family="text">
         <style:text-properties officeooo:rsid="001ffdd7"/>
       </style:style>
     The paragraph style ("P1") is a child of the "Text body" style, but it
     does not contain any additional style information, i.e. it does not
     at all differ from "Text body", and this is what I call an "empty" style.
     The same is true for the character style ("T1"): it is obvious that it
     does not contain any formatting information, it is empty.
2.11 Scroll down to the end of the document;
     you will find that the text content of the document looks something like:
       <text:p text:style-name="P1">
         <text:span text:style-name="T1">Hello world!</text:span>
       </text:p>
     There are two errors here:
     (a) it is really needless to use "P1" instead of "Text body" directly;
     (b) the complete <text:span ...> </text:span> is nonsense,
     because it does not change the text formatting at all.

(*) Note: under circumstances which are not completely transparent to me (probably if you don't reset your user profile before doing these tests, or if you don't click into the main document text area before you type "Hello world!", or if you use the backspace key while typing "Hello world!", etc.), it is possible that in the file generated by LibreOffice 3.6 beta 3 only one of the two automatic styles is created and applied, i.e. "P1" *or* "T1" instead of both. But even if there is just one of these automatic styles, this is still an error, because both automatic styles are completely needless -- just compare the file generated with LibreOffice 3.5.5.3!
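
The manual check in step 2.10 can also be scripted. The following is a minimal Python sketch, not part of the original report, that lists automatic styles whose only properties are RSID attributes. It assumes the flat .fodt layout shown above and that the `officeooo` prefix is bound to `http://openoffice.org/2009/office`; the function name `rsid_only_styles` and the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    # Assumption: namespace URI used by LibreOffice for the officeooo prefix.
    "officeooo": "http://openoffice.org/2009/office",
}
RSID_ATTRS = {
    "{%s}rsid" % NS["officeooo"],
    "{%s}paragraph-rsid" % NS["officeooo"],
}


def rsid_only_styles(xml_text):
    """Return names of automatic styles that carry RSID attributes and nothing else."""
    root = ET.fromstring(xml_text)
    names = []
    for style in root.findall(".//office:automatic-styles/style:style", NS):
        props = list(style)
        # "Empty" in the sense of the report: every attribute of every
        # property element is an RSID attribute.
        if props and all(set(p.attrib) <= RSID_ATTRS for p in props):
            names.append(style.get("{%s}name" % NS["style"]))
    return names


# Hypothetical sample modelled on the styles quoted in step 2.10,
# plus one style (T2) that carries real formatting.
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    xmlns:officeooo="http://openoffice.org/2009/office">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph"
                 style:parent-style-name="Text_20_body">
      <style:text-properties officeooo:paragraph-rsid="001ffdd7"/>
    </style:style>
    <style:style style:name="T1" style:family="text">
      <style:text-properties officeooo:rsid="001ffdd7"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties fo:font-weight="bold" officeooo:rsid="00214ec2"/>
    </style:style>
  </office:automatic-styles>
</office:document>"""

print(rsid_only_styles(SAMPLE))
```

For a real file, read the .fodt as text and pass its contents to `rsid_only_styles`; a file saved by 3.5.5.3 should yield an empty list if the observations above hold.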


=== Test B -- inserting and appending some text ===

3.1  Reset your LibreOffice user profile (rename it or delete it),
     to make sure that the results are not influenced by any custom settings.
3.2  Duplicate the .odt file created with LibreOffice 3.5.5.3
     in steps 1.2 to 1.7 above.
3.3  Open it again with LibreOffice 3.5.5.3.
3.4  Click between "Hello" and "world!" and type something like "my dear".
3.5  Click after "world!" and type something else, e.g.
     "You are God’s creation."
3.6  Save the document.
3.7  Save the document in .fodt format.

3.8  Open the .fodt file created with LibreOffice 3.5.5.3
     with a good text editor of your choice.
3.9  Search for "<office:automatic-styles>";
     you will still find only one page layout entry
       <style:page-layout style:name="pm1">
         ...
       </style:page-layout>
     but still no automatic paragraph or character styles.
3.10 Scroll down to the end of the document;
     you will find that the text contents of the document are still simple:
       <text:p text:style-name="Text_20_body">Hello my beloved world!
       You are God’s creation!</text:p>
     and this is how it should be.

4.1- Repeat steps 3.1 to 3.7 with LibreOffice 3.6.0 beta 3,
4.7  creating new .odt and .fodt files.

4.8  Open the .fodt file created with LibreOffice 3.6.0 beta 3
     with a good text editor of your choice.
4.9  Search for "<office:automatic-styles>"; you will find the same useless
     automatically created styles mentioned above in step 2.10,
     plus one or two (*) additional empty character styles, e.g.:
       <style:style style:name="P1" style:family="paragraph"
       style:parent-style-name="Text_20_body">
         <style:text-properties officeooo:paragraph-rsid="001ffdd7"/>
       </style:style>
       <style:style style:name="T1" style:family="text">
        <style:text-properties officeooo:rsid="00214ec2"/>
       </style:style>
       <style:style style:name="T2" style:family="text">
        <style:text-properties officeooo:rsid="002177ac"/>
       </style:style>
     What I said in step 2.10 above is true for all these automatic styles:
     They don't add any formatting, they are "empty", they are useless
     and even confusing.
4.10 Scroll down to the end of the document;
     you will find that the text content of the document looks something like:
       <text:p text:style-name="P1">
         Hello <text:span text:style-name="T1">my dear </text:span>
         world! <text:span text:style-name="T2">You are God’s
         creation!</text:span>
       </text:p>
     or even worse (*):
       <text:p text:style-name="P1">
         <text:span text:style-name="T1">Hello
           <text:span text:style-name="T2">my dear </text:span>
           world! <text:span text:style-name="T3">You are God’s
           creation!</text:span>
         </text:span>
       </text:p>
=>   We see that every insertion or appending of text creates a new <span>
     and a corresponding new automatic character style.
     All these <span>s and automatic styles are a mess, because they are
     useless (don't add any formatting) and just make the simple document
     unnecessarily complex.

(*) Note: whether you find one or two additional automatic character styles, and therefore two or three automatic character styles in total, again depends on some circumstances not completely clear to me; see my note after step 2.11 above.


=== Test C -- kerning is broken ===

5.1  Reset your LibreOffice user profile (rename it or delete it),
     to make sure that the results are not influenced by any custom settings.
5.2  Create a new empty Writer document with LibreOffice 3.5.5.3.
5.3  Type "ATAA".
5.4  Save the document in .odt format.
5.5  Close the document.
5.6  Open the document again with LibreOffice 3.5.5.3.
5.7  Click between the two "A" at the end and type "T".
5.8  Click after the last "A" and type "T".

=>   The document now reads "ATATAT". Depending on some LibreOffice default
     settings (is kerning enabled by default -- for me, it is) and on the
     fonts of your operating system you will see (you may need to zoom in
     before!) that there is some negative kerning between every "A" and
     every "T", making the text look equally spaced.

6.1- Repeat steps 5.1 to 5.8 with LibreOffice 3.6.0 beta 3,
6.8  creating a new .odt file.

=>  The document now reads "ATATAT", too. But if you zoom in, you will see
    that this text looks strange! There is kerning around the 1st "T",
    but neither between the 2nd "A" and the 2nd "T" nor between the 2nd
    "T" and the third "A", nor before the last "T".

    Why? Every time we insert or append some text, LibreOffice 3.6
    puts it into a new <span>...</span>, and there is no kerning possible
    across the boundary of a <span>; i.e., "AT" gets kerned, but "A<span>T"
    or "A</span>T" does not get kerned. You can prove this if you save
    the .odt file created in steps 6.1 to 6.8 as a .fodt file (with
    LibreOffice 3.6 beta 3, of course) and look at the .fodt file
    with a text editor; the text contents of the document read

    <text:p text:style-name="P1">ATA
      <text:span text:style-name="T1">T</text:span>
      A<text:span text:style-name="T1">T</text:span>
    </text:p>
    
    or similar, and this is why kerning is working only around
    the first "T", but nowhere else. In the file created with
    LibreOffice 3.5.5.3, the same section just reads
    
    <text:p text:style-name="Standard">ATATAT</text:p>
    
    and this is how it should be, making kerning possible.
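
Why splitting a word into spans costs kerning can be illustrated with a toy model. This Python sketch is not from the original report. It assumes that each run of text is measured on its own, so a kerning pair is applied only when both characters sit in the same run; the advance widths and kerning values are invented and do not come from any real font.

```python
# Hypothetical advance widths and kerning pairs in font units.
ADVANCE = {"A": 700, "T": 600}
KERN = {("A", "T"): -80, ("T", "A"): -80}


def run_width(run):
    """Width of one run measured on its own: advances plus kerning between neighbours."""
    width = sum(ADVANCE[ch] for ch in run)
    width += sum(KERN.get(pair, 0) for pair in zip(run, run[1:]))
    return width


def line_width(runs):
    """Width when every run is measured separately, so no pair spans a run boundary."""
    return sum(run_width(run) for run in runs)


# One unbroken paragraph, as saved by 3.5.5.3.
whole = line_width(["ATATAT"])
# The same text split like the 3.6 beta 3 markup: ATA | T | A | T.
split = line_width(["ATA", "T", "A", "T"])

print(whole, split, split - whole)
```

In this model the split text is wider by exactly three kerning pairs, which corresponds to the three places named above where kerning is missing.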


=== Test D -- getting everything right. ===

To remove all the unnecessary automatic paragraph and character styles and all the needless <span>s created by LibreOffice 3.6 beta 3, just open any of the .odt files created by LibreOffice 3.6 beta 3 with LibreOffice 3.5.5.3, insert a space, delete it again, save the document -- and all the mess is gone, as you will see if you save the file as a .fodt file and open the latter with a good text editor. Voilà!

This seems to indicate that there is some "tidy up" code in LibreOffice 3.5 which removes unnecessary automatic styles and <span>s, but just stopped working for some reason in LibreOffice 3.6 beta 3. We need to re-enable it.


Attached to this bug report you will find a ZIP file containing all sample documents from test A to C above.
Comment 1 Roman Eisele 2012-07-12 19:30:15 UTC
Created attachment 64152 [details]
Screenshot comparison for test C, showing working and broken kerning
Comment 2 Rainer Bielefeld Retired 2012-07-12 20:35:45 UTC
Some first tests confirm kerning problems with RC1, but so far I have done much less research than Roman, so I can't yet say much about causes and consequences.
Comment 3 Miklos Vajna 2012-07-13 12:19:16 UTC
Hi Roman,

Did you try to bibisect this? (See http://sweetshark.livejournal.com/11319.html)

Thanks,

Miklos
Comment 4 Petr Mladek 2012-07-13 12:40:08 UTC
I add some more Writer developers who worked on the import/export filters.

This is indeed a strange regression.
Comment 5 Roman Eisele 2012-07-13 14:58:53 UTC
(In reply to comment #3)
> Did you try to bibisect this? (See
> http://sweetshark.livejournal.com/11319.html)

Hi Miklos,

a good idea, but I'm using MacOS X, and AFAIK bibisecting is possible on Linux only ...

-> So someone else who uses some Linux variant needs to bibisect this.
While doing so, he/she could confirm this issue (status is still 'unconfirmed').
We have quite some QA volunteers who use Linux, so who wants to do it? ;-)
Comment 6 Rainer Bielefeld Retired 2012-07-15 16:36:03 UTC
This is a lot of stuff. As a first step I compared the flat XML structure of the reporter's test kit and found additional styles. I will run my own test later to try to reproduce the results for Test A.
Comment 7 Roman Eisele 2012-07-18 18:38:12 UTC
I have tested myself whether I can reproduce the problem on Windows. And indeed, both test A and B give me the same results as on MacOS X (tested with LibreOffice 3.6.0.1, Build-ID: 73f9fb6, German UI, on Windows XP Professional 5.1.2600, Service Pack 3).

Only test C fails on WinXP -- but not because LibreOffice would not insert useless <text:span ...>...</text:span> tags whenever I insert or append any text (this is true on Windows just like on MacOS X), but only because text is obviously rendered differently on Windows, so that kerning works even over a <text:span> or </text:span> tag. To state it clearly: on MacOS X, 'A' and 'T' in
  A<text:span ...>T
and
  A</text:span>T
don't get kerned, but on Windows, they get kerned (given the fact that the current font actually contains a kerning pair for 'A' + 'T', of course, and that "Pair kerning" is enabled in LibreOffice for that text). I wonder how kerning is handled in this situation on Linux ...

According to these results, I change the Platform picker from 'MacOS X' to 'All'. However, I know that I cannot (and do not want to) confirm my bug report myself, so we still need someone else who can confirm my results ;-) And then we need someone who fixes this bug soon ...
Comment 8 Rainer Bielefeld Retired 2012-07-20 11:06:53 UTC
I tried to reproduce "A", but it is NOT reproducible with a Server Installation of "LibreOffice 3.6.0.2 rc German UI/Locale [Build-ID: 815c576]" on German WIN7 Home Premium (64bit).

@Roman: If you can still reproduce this with 3.6.0.2, I would make some extra effort to confirm it. For that, maybe we should meet on Skype with screen sharing. Maybe there is a very small difference in our procedures that causes the different results?!
Comment 9 Rainer Bielefeld Retired 2012-07-20 11:28:18 UTC
Created attachment 64427 [details]
Compares XML

When I open both resulting .fodt files, the one from 3.6.0 has 731 words / 23025 characters, the one from 3.5.5 has 708 / 22129.
Both documents have only 1 "office:automatic-styles" area.

Comparing xml of both documents shows some differences, but I can't see reporter's problem.
Comment 10 Rainer Bielefeld Retired 2012-07-20 12:20:51 UTC
My mistake, I searched for the wrong XML statements.

I can confirm "<style:style style:name="P1" ..." not existing in the .fodt from 3.5.5 and existing in the .fodt from 3.6.0.0beta3 and from a Server Installation of "LibreOffice 3.6.0.2 rc German UI/Locale [Build-ID: 815c576]" on German WIN7 Home Premium (64bit).

Now, of course, I also find this in the "Compares XML" PDF document (page 8).
Comment 11 Rainer Bielefeld Retired 2012-07-20 12:24:23 UTC
@ Roman Eisele 
Did you already check whether (and if yes, where) the problem appears in .odt?
Comment 12 Roman Eisele 2012-07-20 17:24:01 UTC
Created attachment 64456 [details]
Test A, LibO 3.6.0b3 .odt file, viewed with BBEdit, problematic sections emphasized

(In reply to comment #8)
> @Roman: If you still reproduce with 3.6.0.2 [...]

I have checked with 3.6.0.1 -- still reproducible; I will check with 3.6.0.2 later this evening and report results (no free time earlier, sorry).


(In reply to comment #11)
> Did you already check whether (and if yes, where) the problem appears in .odt?

Yes, I did -- the problem also appears in the .odt files, exactly like in the .fodt files, the only difference is the location (due to the different file structure). It is in the "contents.xml" file inside of the .odt file/archive. To view "contents.xml", you can either un-zip the .odt file and then open the "contents.xml" file, or you can view the .odt file with some editor that can browse .odt files directly -- e.g., BBEdit (MacOS only, I don’t know about Windows editors, sorry).

In "contents.xml", please search for the section
   <office:automatic-styles>...</office:automatic-styles>
or, if it is empty, like in my 3.5.5.3 sample files, just for
   <office:automatic-styles/>
This is where all the useless automatic styles are.

Additionally, please look for the first section of type
   <text:p ...
inside of (i.e., after) section
   <office:body><office:text>
This is where all these automatic styles get applied.

Attached you will find a screenshot of my LibO 3.6.0 beta 3 sample .odt file opened with BBEdit; I hope the screenshot shows better where to find the problem than my lengthy explanation ;-)
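
For readers without an editor that can browse .odt archives, the same inspection can be done with a short script. The following Python sketch is an illustration only, not part of the original comment. It relies on an .odt file being a ZIP archive whose main sub-file is named "content.xml"; the helper names `automatic_style_names` and `make_stand_in` and the sample XML are made up, and the stand-in archives contain only that one sub-file, so they are not complete .odt documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def automatic_style_names(odt):
    """List automatic style names in content.xml of an .odt (path or file object)."""
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    auto = root.find("{%s}automatic-styles" % OFFICE_NS)
    if auto is None:
        return []
    return [s.get("{%s}name" % STYLE_NS) for s in auto.iter("{%s}style" % STYLE_NS)]


def make_stand_in(content_xml):
    """Build an in-memory ZIP holding only content.xml (not a complete .odt)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content_xml)
    buffer.seek(0)
    return buffer


HEADER = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"'
    ' xmlns:officeooo="http://openoffice.org/2009/office">'
)
# Hypothetical content modelled on the 3.6 and 3.5.5.3 results described above.
WITH_RSID = HEADER + (
    '<office:automatic-styles>'
    '<style:style style:name="P1" style:family="paragraph">'
    '<style:text-properties officeooo:paragraph-rsid="001ffdd7"/>'
    '</style:style>'
    '</office:automatic-styles>'
    '</office:document-content>'
)
WITHOUT_STYLES = HEADER + '<office:automatic-styles/></office:document-content>'

print(automatic_style_names(make_stand_in(WITH_RSID)))
print(automatic_style_names(make_stand_in(WITHOUT_STYLES)))
```

With a real document, pass the path of the .odt file to `automatic_style_names` instead of a stand-in.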
Comment 13 Roman Eisele 2012-07-20 17:25:02 UTC
Created attachment 64457 [details]
Test A, LibO 3.5.3.3 .odt file, viewed with BBEdit, interesting (correct) sections emphasized
Comment 14 Roman Eisele 2012-07-20 17:28:30 UTC
(In reply to comment #12)
> It is in the "contents.xml" file inside of the .odt file/archive.

Sorry, typo -- the sub-file is named "content.xml", not "contents.xml", without the "s". The screenshots show it correctly.
Comment 15 Roman Eisele 2012-07-20 19:15:12 UTC
(In reply to comment #8)
> @Roman: If you still reproduce with 3.6.0.2 [...]

Still reproducible with LibreOffice 3.6.0.2 (Build ID: 815c576), German langpack installed, on MacOS X 10.6.8 German. Nothing has changed, my results are exactly the same as with LibreOffice 3.6 beta 3 (see original description and sample files). I have deleted my LibO user profile completely before doing the test, so no influence of corrupted settings etc. is possible.


(In reply to comment #10)
> I can confirm "<style:style style:name="P1" ..." not existing in fodt from
> 3.5.5 and existing in fodt from 3.6.0.0beta3 [...]
> Now, of course, I also find this in "Compares XML" PDF document (Page 8)

@Rainer:
Thank you very much for your persistent testing! I know the issue is complicated, so I am really happy that you can reproduce it now, at least case A. I hope (or fear) that you can also reproduce case B ... which shows the real dimension of the problem: every time one inserts or appends any text, Writer will also insert an annoying <text:span>...</text:span>.
Comment 16 Roman Eisele 2012-07-21 09:53:47 UTC
I can't bibisect this problem because I'm not on Linux. But I have tested various older master builds for MacOS X which I have still on my HD, and the issue is already in the oldest LOdev version I can find:

  LOdev 3.6.0alpha0+, Build ID: 11e9ba6; Date: 2012-05-22;
  Installation file:
  master~2012-05-22_05.53.03_LibO-Dev_3.6.0alpha0_MacOS_x86_install_en-US.dmg

The regression must be even older. So if someone could *please* bibisect this issue, he/she can probably start with this version/date and then go back.

*

No surprise: The issue is also in the current master branch. I can reproduce it with identical results in

  LOdev 3.7.0.0.alpha0+, Build ID: 042686d; Date: 2012-07-19;
  installation file:
  master~2012-07-19_10.04.11_LibO-Dev_3.7.0.0.alpha0_MacOS_x86_install_en-US.dmg
Comment 17 Michael Stahl (CIB) 2012-08-22 18:30:20 UTC
so now i've finally had some time to read the lengthy description here :)
and it's immediately obvious that the cause is:

 commit 062eaeffe7cb986255063bb9b0a5f3fb3fc8e34c

 sw: Improved document comparison based on RSIDs.

so i think this is actually a feature, the "rsid" attributes are added
to styles so the "document compare" feature gives better results.

to what extent are the added spans an actual problem?
while it is nice to illustrate the problem, i'd say no actual user writes
a bunch of AAAAs into a document, saves it, loads it again and
intersperses that with Ts.

you can configure this via Tools->Options->Writer->Comparison,
but it seems that only has an effect on the actual comparison,
i.e. unchecked "Use RSID" will still cause them to be generated,
they're just ignored when you compare documents.

only thing odd to me is that in the simplest case we get a paragraph
and inside that a span covering the entire content, both with the
same rsid value, but different attributes (one a paragraph-rsid);
i'm not familiar with how these things actually work to tell
whether that is necessary.
Comment 18 Roman Eisele 2012-08-24 14:55:19 UTC
(In reply to comment #17)
> so i think this is actually a feature, the "rsid" attributes are added
> to styles so the "document compare" feature gives better results.

@Michael Stahl:
Thank you very much for looking into this issue! So we now have a much better understanding of what's going on.

Nevertheless, I am still not sure if there isn't something wrong with this feature, or, to say it even more politely, something about this feature which needs to get improved. Therefore I ask you (and everybody else) not to close this bug yet. To tell exactly what IMHO is wrong about the state of affairs, I will think about it for some days, experiment a bit with this feature, and will then give you an updated and clear and, if possible, not so lengthy description ;-)

For now, I remove this bug from the MAB list, and ask you for some patience. Thank you in advance.

*

If somebody is curious, two hints just to make clear why I think that the state of affairs is not perfect:

(In reply to comment #17)
> while it is nice to illustrate the problem, i'd say no actual user writes
> a bunch of AAAAs into a document, saves it, loads it again and
> intersperses that with Ts.

Right ;-) But the same problem also occurs in everyday editing situations, if you just insert or alter a single character somewhere. At least on MacOS X, additional spans disable character kerning, and therefore this feature will cause the text to appear visually inconsistent and with aesthetic defects.

-> For this I will provide some simple samples with real-life background.

> you can configure this via Tools->Options->Writer->Comparison,
> but it seems that only has an effect on the actual comparison,
> i.e. unchecked "Use RSID" will still cause them to be generated,
> they're just ignored when you compare documents.

Would it be possible to expand this option so that it completely disables the generation of RSID spans if unchecked? This is all that would be required to make everybody happy ...
Comment 19 George Nassar 2012-08-27 17:04:22 UTC
(In reply to comment #18)
> > you can configure this via Tools->Options->Writer->Comparison,
> > but it seems that only has an effect on the actual comparison,
> > i.e. unchecked "Use RSID" will still cause them to be generated,
> > they're just ignored when you compare documents.
> 
> Would it be possible to expand this option so that it completely disables the
> generation of RSID spans if unchecked? This is all what would be required to
> make everybody happy ...

Makes sense from the UI perspective, too -- that's what I'd think that option would use.

Perhaps a temporary workaround would be to be able to scrub RSID spans from a document.  Not my idea, really; I cribbed it from http://blogs.msdn.com/b/ericwhite/archive/2008/11/04/remove-rsid-attributes-and-elements-before-comparing-documents.aspx who does it for Open XML documents, figuring the idea may be applicable.

Come to think, I'm not sure the two are necessarily mutually exclusive options.
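
The scrubbing idea can be sketched for ODF as well. The following Python code is a rough illustration, not an existing LibreOffice feature or tool: it replaces `<text:span>` elements by their content when their style name is in a given set, for example the names of styles that contain nothing but an RSID. The function names and the sample paragraphs are made up, and the sketch leaves the style definitions and the paragraph-level RSID untouched. Scrubbing this way discards the information the document comparison feature relies on.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = "{%s}span" % TEXT_NS
STYLE_NAME = "{%s}style-name" % TEXT_NS


def _append_before(parent, index, text):
    """Attach `text` to whatever precedes position `index` inside `parent`."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def unwrap_spans(parent, removable):
    """Replace spans whose style name is in `removable` by their own content."""
    index = 0
    while index < len(parent):
        child = parent[index]
        if child.tag == SPAN and child.get(STYLE_NAME) in removable:
            _append_before(parent, index, child.text)
            inner = list(child)
            parent.remove(child)
            for offset, element in enumerate(inner):
                parent.insert(index + offset, element)
            # The span's tail follows its last child, or the preceding text
            # if the span had no child elements.
            _append_before(parent, index + len(inner), child.tail)
            # Stay at the same index so promoted children are examined too.
        else:
            unwrap_spans(child, removable)
            index += 1


# Hypothetical paragraphs modelled on the two variants shown in step 4.10.
FLAT = ET.fromstring(
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'text:style-name="P1">Hello <text:span text:style-name="T1">my dear '
    '</text:span>world!</text:p>'
)
NESTED = ET.fromstring(
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'text:style-name="P1"><text:span text:style-name="T1">Hello '
    '<text:span text:style-name="T2">my dear </text:span>world!'
    '</text:span></text:p>'
)

for paragraph in (FLAT, NESTED):
    unwrap_spans(paragraph, {"T1", "T2"})
    print(repr(paragraph.text), len(paragraph))
```

After scrubbing, both paragraphs consist of a single text node and no child elements, which is the structure the 3.5.5.3 files have.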
Comment 20 Urmas 2012-11-23 04:57:04 UTC
*** Bug 53245 has been marked as a duplicate of this bug. ***
Comment 21 László Németh 2013-05-22 12:59:40 UTC
This “empty” segmentation, combined with a hidden Graphite layout problem, caused annoying text changes/editing problems, see Bug 53245. Now it only results in missing Graphite ligatures.

Note: there are other types of missing kerning, for example, at boundaries of different colors/background or the kerning before the hyphen marks in automatic hyphenation, but this is an ugly, underhand regression.
Comment 22 Michael Stahl (CIB) 2013-05-28 18:35:16 UTC
remove "bibisectrequest", comment #17 already contains the commit
Comment 23 Commit Notification 2013-06-19 22:44:51 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6db39dbd7378351f6476f6db25eb7110c9cfb291

fdo#52028: sw: let text formatting ignore RSID in automatic styles



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 24 Michael Stahl (CIB) 2013-06-19 22:56:43 UTC
i've taught Writer's text formatting to ignore the breaks that
were introduced by the RSID attributes, which should restore
font kerning to its previous state.

due to the complexity of the fix i don't want to backport it to 4.0 branch,
it definitely needs testing on master / 4.1 first to see if it
introduces new bugs related to text formatting.

if somebody can play around with hard formatting in a debug or dbgutil
build and hit one of the assert() please tell me how :)
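
The idea of letting text formatting ignore RSID-only breaks can be pictured with a toy model. This Python sketch is not Writer's code and makes no claim about how the actual patch works: it represents a paragraph as a list of (text, attributes) portions and merges neighbours whose attributes are equal once the RSID is disregarded. The attribute names and the functions `layout_key` and `merge_portions` are hypothetical.

```python
from itertools import groupby

# Attributes assumed to have no influence on layout.
IGNORED_FOR_LAYOUT = {"rsid"}


def layout_key(attrs):
    """Attributes that matter for layout, as a hashable, order-independent key."""
    return tuple(sorted(
        (name, value) for name, value in attrs.items()
        if name not in IGNORED_FOR_LAYOUT
    ))


def merge_portions(portions):
    """Merge adjacent (text, attrs) portions that differ only in ignored attributes."""
    merged = []
    for key, group in groupby(portions, key=lambda portion: layout_key(portion[1])):
        text = "".join(portion[0] for portion in group)
        merged.append((text, dict(key)))
    return merged


# The "ATATAT" paragraph from test C, fragmented by two later insertions.
FRAGMENTED = [
    ("ATA", {}),
    ("T", {"rsid": "00214ec2"}),
    ("A", {}),
    ("T", {"rsid": "002177ac"}),
]
# A real formatting change must still break the run.
WITH_BOLD = [
    ("AT", {"weight": "bold", "rsid": "00214ec2"}),
    ("AT", {"weight": "bold"}),
    ("AT", {}),
]

print(merge_portions(FRAGMENTED))
print(merge_portions(WITH_BOLD))
```

In this model the fragmented paragraph collapses into one run, so kerning pairs can be applied across the former boundaries, while the boundary between bold and regular text survives.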
Comment 25 Commit Notification 2013-06-19 23:04:37 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-4-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d526b76c1dc0b3de564bc083821d512d41d5ab06&h=libreoffice-4-1

fdo#52028: sw: let text formatting ignore RSID in automatic styles


It will be available in LibreOffice 4.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 26 Khaled Hosny 2013-06-20 06:44:47 UTC
So that's why inserting vowel marks in the middle of an Arabic word wrecks its position completely! I cannot comment on the code, but at least now I can edit Arabic text without breaking mark positioning or contextual shaping, which is awesome! But this seems to work only for newly edited text, not for existing, broken text; is this how it is supposed to work, or am I missing something?
Comment 27 Michael Stahl (CIB) 2013-06-20 10:34:07 UTC
Khaled, the call to MergePortions in SwTxtNode::MakeFrm
should fix up everything on file import (assuming that's
what you mean by "existing text").

i haven't got a clue about Arabic text, but at first glance
the RSID problem appears language agnostic; perhaps what
you are seeing is caused by something else (would be good
to know if what you are seeing worked in 3.5.x or older versions
and started to break in 3.6.0, in that case it could be caused
by RSIDs).

or you could check in SwAttrIter::SeekFwd what sort of
attributes you see at the place where things go wrong.

most likely it's possible (for testing) to remove all RSIDs in
the ODF file import by removing the 2 lines with XML_RSID in
xmloff/source/text/txtprmap.cxx; while editing, RSIDs are only
inserted in SwDoc::UpdateRsid.

(there is also a paragraph-rsid but that one cannot cause problems)
Comment 28 Jim Avera 2014-01-13 01:51:21 UTC
Hi, I'm still seeing this in LO 4.2.0.1.

Each trivial text edit creates a new text span with a style which does nothing but give an rsid, even though changes are not recorded and "Use RSID" is not checked in Tools->Options->Writer->Comparison.

This is a problem for anyone who wants to parse the resulting xml file.
In my case, I want to use Perl and ODF::lpOD to search for place-holder strings in a "skeleton" file and replace them with real content.  But the place-holder strings can't be found unless they are stored in a single span.
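
The search problem can be shown in a few lines. The sketch below uses Python rather than Perl and ODF::lpOD, and is an illustration only: it assumes a made-up placeholder `{{NAME}}` that an edit has split across a span, and it finds the placeholder by flattening each paragraph's text before searching. The function name `paragraphs_containing` is hypothetical. Flattening like this ignores elements such as `<text:s>` or `<text:tab>`, and it only locates the placeholder; replacing it in place would still have to deal with the span boundaries.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs_containing(xml_text, needle):
    """Return the flattened text of every <text:p> that contains `needle`."""
    root = ET.fromstring(xml_text)
    hits = []
    for paragraph in root.iter("{%s}p" % TEXT_NS):
        # itertext() joins the text of the paragraph and all nested spans.
        flat = "".join(paragraph.itertext())
        if needle in flat:
            hits.append(flat)
    return hits


# Hypothetical skeleton: the placeholder was edited, so it is split by a span.
SAMPLE = """<office:text
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p text:style-name="P1">Dear {{NA<text:span text:style-name="T1">M</text:span>E}},</text:p>
  <text:p text:style-name="P1">no placeholder here</text:p>
</office:text>"""

# A plain substring search on the raw XML misses the placeholder.
print("{{NAME}}" in SAMPLE)
print(paragraphs_containing(SAMPLE, "{{NAME}}"))
```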

I agree with the original reporter that generating this kind of complexity goes against the spirit of Open Document Format which is intended to be usefully parseable by applications other than OO/LO.  My suggestion is to not generate the RSIDs unless "Use RSID" is checked in options (and make the checkbox active all the time).

Steps to reproduce:
1. Start LO writer and make sure Edit->Changes has no options checked, and
   Tools->Options->Writer->Comparison has "Use RSID" not checked.

2. Create a new writer doc with just the mis-spelled word "Keyord". Save as .odt.

3. Extract or view the content.xml file from the .odt container
  (linux: unzip -o file.odt && xml_pp -i *.xml && vi content.xml)
  Verify that a single span was used, e.g.

    <text:p text:style-name="P1">Keyord</text:p>

4. Edit the .odt using LO writer and insert a "w" to fix the spelling error, making it "Keyword". Save and re-extract content.xml.

Actual results:
  <text:p text:style-name="P1">Key
  <text:span text:style-name="T1">w
  </text:span>ord
  </text:p>

Expected results:
  <text:p text:style-name="P1">Keyword</text:p>
Comment 29 Jim Avera 2014-01-13 01:58:18 UTC
P.S. Copy-and-paste *preserves* the fractured ODF representation!

I was hoping that Control-A, Control-C, Delete, Control-V would be a nifty work-around, but no such luck.
Comment 30 Jim Avera 2014-12-03 20:22:49 UTC
I'm thinking of re-opening this bug, but first want to get feedback from any developers listening. Please reply/advise about whether to re-open this bug.

IMO the underlying problem is not fixed because RSID spans are still created even though "Use RSID" is not checked in Tools->Options->Writer->Comparison.  So LO won't interoperate well with other software due to uncontrollable fracturing of the ODF representation.

(as of LO 4.3.3.2)
Comment 31 Michael Stahl (CIB) 2014-12-03 21:53:31 UTC
this bug is about font kerning. do you have a problem with font kerning?
Comment 32 Jim Avera 2014-12-03 22:56:37 UTC
As the original reporter said, "You may say that it just increases the complexity of .odt files, but is not "wrong" in a technical sense. Well, first, [font kerning sometimes breaks]. And second, this bug unnecessarily increases the complexity and size of .odt files, making the contents unnecessarily hard to read and parse, which is IMHO against the philosophy behind the ODF file format specification -- unlike Microsoft's strange Office 2007 XML format (.docx etc.), which may be intentionally complex to make parsing difficult for foreign software, the ODF file format was designed to be as simple as possible, to make it easy to write parsers and even to allow human beings to read the XML code directly. This is counteracted by this bug."

I gather you would like this aspect (the unnecessary ODF fragmentation) moved to a new bug?
Comment 33 Jim Avera 2014-12-03 23:31:06 UTC
Filed new bug 86988
Comment 34 ccsheller 2017-06-14 10:12:09 UTC
Incorrect comparison of conditions in thints.cxx

if (pItem1 != pItem2 ||
    pItem1->Which() != pItem2->Which() ||
    *pItem1 != *pItem2)
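
The comment does not say what exactly is considered incorrect. One possible reading is that, because `||` short-circuits, the second and third tests can never change the result: whenever the two pointers are equal, both items are the same object, so their `Which()` values and contents are equal too. The Python sketch below models this reading with a made-up `Item` class; it assumes that comparing an item with itself reports equality, and it says nothing about what the surrounding code in thints.cxx was meant to do.

```python
class Item:
    """Stand-in for a pool item: `which` is its type id, `value` its content."""

    def __init__(self, which, value):
        self.which = which
        self.value = value

    def __eq__(self, other):
        return (self.which, self.value) == (other.which, other.value)


def quoted_condition(item1, item2):
    """Python rendering of the three-part condition quoted above."""
    return item1 is not item2 or item1.which != item2.which or item1 != item2


def pointer_test_only(item1, item2):
    """Just the first clause: are these two different objects?"""
    return item1 is not item2


bold_a = Item(1, "bold")
bold_b = Item(1, "bold")   # equal content, but a different object
italic = Item(2, "italic")
items = [bold_a, bold_b, italic]

# The full condition agrees with the pointer test for every pair.
same_everywhere = all(
    quoted_condition(x, y) == pointer_test_only(x, y)
    for x in items for y in items
)
print(same_everywhere, quoted_condition(bold_a, bold_b))
```

In this model two distinct items with identical content still satisfy the condition, which may or may not be what the original code intended.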