Bug 118722 - Index formatting has arbitrary italic/roman font changes in many entries
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: (earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Keywords: bibisected, bisected, regression
Depends on:
Blocks: TableofContents-Indexes
Reported: 2018-07-12 12:24 UTC by Mike Cowlishaw
Modified: 2021-02-10 12:03 UTC

See Also:
Crash report or crash signature:
Regression By:

Index of current 754 draft, showing mixed italic and roman fonts (247.42 KB, application/pdf)
2018-07-12 12:24 UTC, Mike Cowlishaw
Source .odt 1-page extract + generated index (36.64 KB, application/vnd.oasis.opendocument.text)
2018-07-17 13:11 UTC, Mike Cowlishaw

Description Mike Cowlishaw 2018-07-12 12:24:40 UTC
Created attachment 143515 [details]
Index of current 754 draft, showing mixed italic and roman fonts

The IEEE 754 floating-point standard (of which I am Editor) includes an alphabetical index of operations.  While this was correct through late 2017, more recent drafts show font changes in the middle of words in the index, even though (without exception) there are no font changes in the entry in the text (they are all in plain text with no markup within the index words). 

Here is a typical excerpt from the content.xml; the word 'decodeDecimal' only appears once in the .xml file:

  <text:alphabetical-index-mark-start text:id="IMark1055621192"/><text:span text:style-name="T3">decodeDecimal</text:span><text:alphabetical-index-mark-end text:id="IMark1055621192"/>

In the generated Index (created by Tools -> Update -> Update All) this entry appears with 'decodeDec' in italic and 'imal' in roman, in both LibreOffice Writer and the exported PDF (see the two index pages of the attached PDF).  Neither 'decodeDec' nor 'imal' appears as separate text in the .xml, except in the generated index:

  <text:s text:c="2"/>26</text:p><text:p text:style-name="P411"><text:span text:style-name="Bold"><text:span text:style-name="T2">decodeDec</text:span></text:span><text:span text:style-name="Bold">imal</text:span> <text:s text:c="2"/>26</text:p>

Many other entries have similarly weird italic/roman font changes that correspond in no obvious way to the words in the source document.
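
One way to locate affected entries without reading the XML by eye is sketched below. It is only an illustration: it assumes the pattern seen in the excerpt above (a span nested directly inside a span styled "Bold"), and the helper names load_content_xml and find_nested_spans are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF text namespace, as declared in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"


def load_content_xml(odt_path):
    """Return the raw bytes of content.xml from an .odt (a zip archive)."""
    with zipfile.ZipFile(odt_path) as odt:
        return odt.read("content.xml")


def find_nested_spans(xml_text, outer_style="Bold"):
    """List (inner_style, inner_text) for spans nested directly in an outer_style span.

    Assumption: the stray formatting always shows up as a child span of a
    "Bold" span, as in the generated-index excerpt in the description.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for outer in root.iter(SPAN):
        if outer.get(STYLE_ATTR) != outer_style:
            continue
        for inner in outer.findall(SPAN):  # direct children only
            hits.append((inner.get(STYLE_ATTR), inner.text or ""))
    return hits


# The index paragraph from the description, wrapped so the prefix is declared.
SAMPLE = (
    '<root xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:p text:style-name="P411">'
    '<text:span text:style-name="Bold">'
    '<text:span text:style-name="T2">decodeDec</text:span></text:span>'
    '<text:span text:style-name="Bold">imal</text:span> '
    '<text:s text:c="2"/>26</text:p>'
    '</root>'
)

if __name__ == "__main__":
    for style, text in find_nested_spans(SAMPLE):
        print(style, repr(text))
```

On a real document, find_nested_spans(load_content_xml("draft.odt")) would list each automatic style ('Tnnn') together with the word fragment it was applied to; "draft.odt" is a placeholder file name.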
Comment 1 Mike Cowlishaw 2018-07-12 12:27:35 UTC
We are hoping to publish around the end of 2018, but will not be able to close the ballot stage while this index problem persists.
Comment 2 Mike Kaganski 2018-07-12 15:31:08 UTC Comment hidden (obsolete)
Comment 3 Mike Cowlishaw 2018-07-12 15:43:52 UTC Comment hidden (obsolete)
Comment 4 Mike Kaganski 2018-07-12 15:53:20 UTC
(In reply to Mike Cowlishaw from comment #3)
> It's a bit difficult to provide the full source as this is an IEEE standard
> under revision and not approved.  In particular it could not be posted to a
> public URL (but I could make it available to an individual privately purely
> for testing -- let me know if that would help).  

Well - in my case, unfortunately no: I wouldn't be able to work on this ATM (I could only reproduce/confirm), so I would be the wrong person. I suspected something like that - that's why I hoped you could anonymize the file, replacing/removing parts, to make a test document without confidential data but with the problem.

> Re 6.0 -- I thought I had the latest updates, and checked for those before
> posting this!  Just checked again and again got:
>   LibreOffice 5.4 is up to date.
> Should I be doing something different than 'Help -> Check for Updates'?

Just use the Download page from LibreOffice official site: https://www.libreoffice.org/download/download/. Note also the pre-release versions at the bottom of that page.
Comment 5 Mike Cowlishaw 2018-07-12 16:21:22 UTC
On the example source:

  I'd hoped my inclusion of the XML source was sufficient, but if not I can try and make a cut-down version of the source to reproduce from scratch.  Is there a way to (say) delete pages 1-60 and 62-80?   This is a big document, so select and delete would be extremely tedious... 

On later versions:

  Thanks for that!  I just installed on a different machine, loaded the latest draft and rebuilt the Index.  Problem is still there :-(.
Comment 6 Mike Kaganski 2018-07-12 16:32:28 UTC
(In reply to Mike Cowlishaw from comment #5)
> Is there a way to (say) delete pages 1-60 and 62-80?   This is a big document,
> so select and delete would be extremely tedious... 

Well - you still need to select and delete. Just navigate to page 60, put the cursor at the end of it, and use Shift+Ctrl+Home to select everything back to the start. Delete. Then go to page 2 (formerly 62), put the cursor at its start, use Shift+Ctrl+End to select to the end, and delete again. Or put the cursor at the start of the selection, scroll down using the scrollbar, and Shift+click at the end...
Comment 7 Mike Cowlishaw 2018-07-12 16:35:11 UTC Comment hidden (obsolete)
Comment 8 Mike Cowlishaw 2018-07-17 13:11:47 UTC
Created attachment 143594 [details]
Source .odt 1-page extract + generated index

This is one page of the current IEEE 754 draft showing lots of index entries (and some plain text removed).  The second page is the index generated by Update All, showing the font anomalies.
Comment 9 Mike Cowlishaw 2018-07-17 13:12:58 UTC Comment hidden (obsolete)
Comment 10 Timur 2018-07-18 10:37:54 UTC
Repro with 6.2+ and 5.4.3, not with 5.4.1.
Document can be corrected, just update and save with lower version, like
Comment 11 Mike Cowlishaw 2018-07-18 10:54:42 UTC
Saving with an earlier version isn't really an option because the document approved by the committee was/is saved by the current version.  Also, I have low confidence that pagination would be unchanged, so cannot just use the Index from the old-saved version...
Comment 12 Buovjaga 2018-07-23 15:14:04 UTC
Bisected with win 6.0 repo to https://cgit.freedesktop.org/libreoffice/core/commit/?id=141d4427d2d2db6a16133fcf7571798233a99cb0

tdf#99689 allow Subscript in Illustration Index...
... and Index of Tables.

Adding Cc: to Tamás Bunth

Quick recap:
1. Open attachment 143594 [details]
2. Right-click the index on page 2 and Update index

Nothing happens in current versions.
Comment 13 Mike Cowlishaw 2018-12-02 19:31:44 UTC Comment hidden (obsolete)
Comment 14 Buovjaga 2018-12-03 11:35:17 UTC
(In reply to Mike Cowlishaw from comment #13)
> Not sure what "Nothing happens" means here -- is the problem now fixed? 
> I.e., is it safe to try new version of LibreOffice?  (I am currently on a
> 5.x version and using a script to fix the XML content.)
> Thanks -- Mike

"Nothing happens" is the problem. If you update the index in a version without the problem, the undesired formatting goes away.
Comment 15 Mike Cowlishaw 2018-12-03 11:52:51 UTC Comment hidden (obsolete)
Comment 16 Mike Cowlishaw 2019-01-28 10:51:37 UTC Comment hidden (no-value)
Comment 17 Mike Cowlishaw 2019-01-30 09:49:13 UTC Comment hidden (no-value)
Comment 18 Buovjaga 2019-01-30 10:07:27 UTC Comment hidden (no-value)
Comment 19 Mike Cowlishaw 2019-01-30 13:14:25 UTC
Hi, thanks for the reply -- but that old version paginates differently and line-breaks differently.  So an index generated with the old version will not match the document edited with a newer version.

I cannot edit the whole document with the old version because that would re-introduce the artifacts that caused the IEEE committee to instruct me to move to newer versions.

I can, for now, continue to semi-manually update the XML after every update (I have a script that does the actual XML changes, but visual inspection of the XML is necessary to determine the 'Tnnn' that has to be changed to 'Bold').
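
For anyone else hitting the same problem, here is a minimal sketch of what such a fix-up could look like. It is not the script mentioned above. It assumes, as in the excerpt in the Description, that every stray style is a 'Tnnn' span nested directly inside a 'Bold' span with the attributes written exactly as shown; the name retag_nested_spans is made up for illustration.

```python
import re

# Matches a 'Tnnn' span that opens immediately inside a 'Bold' span.
# Assumption: attribute order and spacing are exactly as in the excerpt
# from the generated index; other serialisations would need a real parser.
NESTED = re.compile(
    r'(<text:span text:style-name="Bold">'
    r'<text:span text:style-name=")T\d+(">)'
)


def retag_nested_spans(xml_text):
    """Change the inner 'Tnnn' style to 'Bold'; return (new_text, count)."""
    return NESTED.subn(r"\1Bold\2", xml_text)


# Generated-index fragment and source-text fragment from the description.
INDEX_ENTRY = (
    '<text:span text:style-name="Bold">'
    '<text:span text:style-name="T2">decodeDec</text:span></text:span>'
    '<text:span text:style-name="Bold">imal</text:span>'
)
SOURCE_ENTRY = '<text:span text:style-name="T3">decodeDecimal</text:span>'

if __name__ == "__main__":
    fixed, count = retag_nested_spans(INDEX_ENTRY + SOURCE_ENTRY)
    print(count)
    print(fixed)
```

Working on the raw string rather than a parsed tree leaves the rest of content.xml byte-for-byte unchanged, and the span in the body text (style T3 here) is not touched because it is not nested in a 'Bold' span.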

But this must be a very easy error to diagnose .. some piece of code is inserting a tag pair in the middle of a plain-text word.  And I think the version number where it happened is known? 

I would very much appreciate some indication of when this will be fixed; not just for me anymore (I can continue to 'hack it' with my script) but for the next Editor of this standard .. and of course anyone else who is having the same problem.
Comment 20 QA Administrators 2021-02-05 04:11:18 UTC Comment hidden (obsolete)
Comment 21 Mike Cowlishaw 2021-02-10 12:00:31 UTC
Happy to report that the problem has been fixed in

Comment 22 Mike Cowlishaw 2021-02-10 12:03:10 UTC
Happy to report that this problem is fixed in

Many thanks.