Bug 86988 - RSIDs should not be created unless enabled in Options
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.0.4 release
Hardware: Other All
Importance: low major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 99015
Depends on:
Blocks: ODF-export-invalid
Reported: 2014-12-03 23:28 UTC by Jim Avera
Modified: 2017-05-31 20:22 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Instructions in comment #4 (12.23 KB, application/vnd.oasis.opendocument.text)
2016-05-31 19:23 UTC, Jim Avera

Description Jim Avera 2014-12-03 23:28:45 UTC
Each trivial text edit creates a new ODF text span with a style which does nothing but give an RSID, even though changes are not recorded and "Use RSID" is not checked in Tools->Options->Writer->Comparison.

Even trivial spelling corrections break words into multiple spans, which makes the resulting ODF file virtually unreadable to either humans or most other software.  For example, I use Perl and ODF::lpOD to search for place-holder strings in a "skeleton" file and replace them with real content.  But the place-holder strings can't be found unless they are stored in a single span.  The RSID problem makes it almost impossible to use LibreOffice to edit the skeleton doc.
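
To illustrate the place-holder problem, here is a minimal Python sketch (not the Perl/ODF::lpOD code mentioned above). The place-holder `__NAME__`, the style names `T1`/`T2`, and the helper `paragraph_text` are all hypothetical, and the XML fragment is hand-written to mimic what a fragmented paragraph might look like. It shows that a raw search of the XML misses the place-holder, while a search over the flattened paragraph text finds it.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical paragraph: the place-holder "__NAME__" was edited at some
# point, so it ended up split across two spans.
FRAGMENT = (
    '<text:p xmlns:text="' + TEXT_NS + '">Dear '
    '<text:span text:style-name="T1">__NA</text:span>'
    '<text:span text:style-name="T2">ME__</text:span>, welcome.</text:p>'
)


def paragraph_text(paragraph):
    """Concatenate all character data in a paragraph, ignoring span boundaries."""
    return "".join(paragraph.itertext())


para = ET.fromstring(FRAGMENT)
print("__NAME__" in FRAGMENT)              # search of the raw XML
print("__NAME__" in paragraph_text(para))  # search of the flattened text
```

This only helps with finding the place-holder. Replacing it in place still requires dealing with the span boundaries, and the sketch ignores elements such as `text:s` and `text:tab` that stand for whitespace.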

Please see bug 52028 for more details and some test cases.

(that bug reported a font-kerning issue which has since been fixed, but the part about uncontrollable ODF fragmentation remains).

I'd like to echo the sentiments of the reporter of that other bug, who said
"this bug unnecessarily increases the complexity and size of .odt files, making the contents unnecessarily hard to read and parse, which is IMHO against the philosophy behind the ODF file format specification -- unlike Microsoft's strange Office 2007 XML format (.docx etc.), which may be intentionally complex to make parsing difficult for foreign software, the ODF file format was designed to be as simple as possible, to make it easy to write parsers and even to allow human beings to read the XML code directly. This is counteracted by this bug."

PROPOSED SOLUTIONS:

1.  Don't generate RSID spans unless "Use RSID" is checked in Tools->Options->Writer->Comparison (which currently requires selecting by-char or by-word comparison mode) AND Edit->Changes->Record is checked.

OR

2.  Provide a new command ("Simplify ODF" or somesuch) which removes all RSIDs and then removes all empty spans. This could be used by people who need to generate ODF files which can be parsed by other software.   I predict that if this command is implemented, someone will want a static option to do it always; one place for that would be a Save-As option.
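
As a rough idea of what such a "Simplify ODF" step could do outside LibreOffice, here is a Python sketch that works on an already-extracted content.xml tree. It is not an existing LibreOffice feature. It assumes that RSIDs are stored as an `officeooo:rsid` attribute on `style:text-properties` (with the namespace URI given in the code) and that a span is safe to unwrap when its automatic style carries nothing else. The function names `rsid_only_styles` and `unwrap_spans` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
    # Assumption: namespace of LibreOffice's officeooo:rsid extension attribute.
    "officeooo": "http://openoffice.org/2009/office",
}


def q(prefix, local):
    """Build an ElementTree qualified name such as '{uri}span'."""
    return "{%s}%s" % (NS[prefix], local)


SPAN, STYLE_NAME, RSID = q("text", "span"), q("text", "style-name"), q("officeooo", "rsid")


def rsid_only_styles(root):
    """Names of text styles whose only content is an RSID attribute."""
    names = set()
    plain_attrs = {q("style", "name"), q("style", "family")}
    for style in root.iter(q("style", "style")):
        props = list(style)
        if (style.get(q("style", "family")) == "text"
                and set(style.attrib) <= plain_attrs  # no parent style etc.
                and len(props) == 1
                and props[0].tag == q("style", "text-properties")
                and set(props[0].attrib) == {RSID}):
            names.add(style.get(q("style", "name")))
    return names


def _append_before(parent, index, s):
    """Append string s to the text that precedes child position `index`."""
    if not s:
        return
    if index == 0:
        parent.text = (parent.text or "") + s
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + s


def unwrap_spans(parent, names):
    """Replace every text:span using one of `names` by its own content."""
    i = 0
    while i < len(parent):
        child = parent[i]
        unwrap_spans(child, names)  # handle nested spans first
        if child.tag == SPAN and child.get(STYLE_NAME) in names:
            del parent[i]
            _append_before(parent, i, child.text)
            for kid in list(child):  # keep any surviving nested elements
                parent.insert(i, kid)
                i += 1
            _append_before(parent, i, child.tail)
        else:
            i += 1


# Hand-written sample: T1 carries only an RSID, T2 also makes text bold.
XMLNS = " ".join('xmlns:%s="%s"' % item for item in NS.items())
DOC = (
    "<office:document-content " + XMLNS + "><office:automatic-styles>"
    '<style:style style:name="T1" style:family="text">'
    '<style:text-properties officeooo:rsid="0012ab34"/></style:style>'
    '<style:style style:name="T2" style:family="text">'
    '<style:text-properties fo:font-weight="bold" officeooo:rsid="0012ab34"/>'
    "</style:style></office:automatic-styles><office:body><office:text>"
    '<text:p>Replace <text:span text:style-name="T1">th</text:span>'
    '<text:span text:style-name="T1">is</text:span> word '
    '<text:span text:style-name="T2">now</text:span></text:p>'
    "</office:text></office:body></office:document-content>"
)

root = ET.fromstring(DOC)
unwrap_spans(root, rsid_only_styles(root))
para = root.find(".//" + q("text", "p"))
print(repr(para.text), len(para))
```

The sketch leaves the now-unused automatic styles and any paragraph-level RSID attributes in place, and it does not write the result back into the .odt package. Spans whose style also carries real formatting (like `T2` here) are kept, since removing them would change the document's appearance.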
Comment 1 Urmas 2014-12-03 23:52:13 UTC
If the only way to use RSIDs in an ODF document is to create new span styles, the format is deficient.
Also, ODF files are not intended to be read by humans.
Comment 2 Zhivko 2015-04-07 13:32:56 UTC
+1 for that - is there any setting in 4.4 to control RSID behaviour somehow? This is really annoying.
ODT should be kept as simple as possible...
Can somebody look at this?
Comment 3 tommy27 2016-04-16 07:25:18 UTC Comment hidden (obsolete)
Comment 4 Jim Avera 2016-05-31 19:21:34 UTC
The problem is still there in 5.2.0.0.alpha0+

Now, however, Tools->Options->Writer->Comparison no longer even has an option to enable RSIDs, and they seem to be permanently enabled.

This problem really strikes at the heart of the philosophy of why ODF was created -- to make a format which was simple and to facilitate different tools interoperating.  As it is, the pervasive injection of RSID-related styles makes interoperation with other software very difficult (e.g. the ODF::lpOD library is almost useless because of this).

To make things clearer, here is an example.

STEPS TO REPRODUCE:

1.  Open the attached "rsidtest.odt"
2.  Replace the capitalized word "THIS" in the first sentence with a lower-cased version (or make any other "spelling corrections" you like).
3.  Save and exit LO
4.  mkdir tempdir
    cd tempdir
    unzip -q ../rsidtest.odt
    xml_pp -i -s nsgmls *.xml  # makes them easier to read
    
5. Examine "content.xml" with vim or similar
   Search for "Replace this word" (the corrected sentence in the document)

RESULTS: The search will not succeed because of the injection of RSID-related styles for each of the fragments around the edit.

EXPECTED RESULTS: Unless RSIDs are enabled, the "Replace this word ..." sentence should be in a single ODF object.   In general, ODF objects should not be fragmented unless there is a visible or functional reason to do so.
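
Steps 4 and 5 can also be approximated without unpacking the file by hand. The following Python sketch reads content.xml straight out of the .odt package and reports, for each paragraph whose flattened text contains a given sentence, how many `text:span` elements it holds. The function name `fragmentation_report` is hypothetical, and the sketch assumes the document is not encrypted and stores its text in `text:p` elements.

```python
import xml.etree.ElementTree as ET
import zipfile

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P = "{%s}p" % TEXT_NS
SPAN = "{%s}span" % TEXT_NS


def fragmentation_report(odt_path, needle):
    """Return (flattened text, span count) for each paragraph containing needle."""
    # An .odt file is a zip package; the body text lives in content.xml.
    with zipfile.ZipFile(odt_path) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    report = []
    for para in root.iter(P):
        flat = "".join(para.itertext())
        if needle in flat:
            spans = sum(1 for _ in para.iter(SPAN))
            report.append((flat, spans))
    return report
```

For example, `fragmentation_report("rsidtest.odt", "Replace this word")` would list the edited sentence together with its span count. A count above zero for a sentence with no visible formatting would suggest the fragmentation described here, although the count alone does not tell RSID-only spans apart from spans with real formatting.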
Comment 5 Jim Avera 2016-05-31 19:23:04 UTC
Created attachment 125423
Instructions in comment #4
Comment 6 Regina Henschel 2017-05-26 11:50:07 UTC
*** Bug 99015 has been marked as a duplicate of this bug. ***
Comment 7 Regina Henschel 2017-05-26 11:55:34 UTC
With the current version of 5.2 you can disable writing RSIDs in Tools > Options, and that works. But writing the random numbers is on by default.
Comment 8 Thomas Lendo 2017-05-31 20:22:00 UTC
Jim, I tested the steps in your comment 4. Before that I deactivated Tools > Options > LibreOffice Writer > Comparison > Store it when changing the document.

There is no RSID span in the file after the test change.

Version: 5.5.0.0.alpha0+
Build ID: b08217989558addbcaded122a4e7211ae24bbcff
CPU threads: 4; OS: Linux 4.8; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-05-31_06:36:03
Locale: de-DE (de_DE.UTF-8); Calc: group

It also works in 5.1.6.2.

I close this bug as WORKSFORME.
If that isn't correct, please reopen the bug as UNCONFIRMED.
If you want that option deactivated by default in LibO, please open a new bug for that.