Bug 74056 - EDITING: Index Quirks
Summary: EDITING: Index Quirks
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.3.0.0.alpha0+ Master
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks: TableofContents-Indexes
Reported: 2014-01-25 19:21 UTC by Frank
Modified: 2019-01-26 13:16 UTC

See Also:
Crash report or crash signature:


Description Frank 2014-01-25 19:21:31 UTC
I'm using Writer Version: 4.3.0.0.alpha0+ Build ID: be4035d00f37c492494fa7860955b6d0868c7f77 on Ubuntu 12.04 LTS 64 bit Linux, and this is regarding the Index feature:

I know the table of contents feature seems to work well, as I've used it quite a bit in the past.
But creating an alphabetical index (a first for me, at least using Writer) seems to have a number of rather annoying features. I've provided an experiment you can do at the end, but first let me discuss the issues.

1) When creating an alphabetical index, the Columns feature works as expected, BUT ONLY FOR THE STANDARD (DEFAULT) PAPER SIZE! If I attempt to use columns on a 6" x 9" page, Writer still seems to assume a "standard" paper size and doesn't seem to know that I'm using a 6" x 9" page. Thus no matter how I try to tweak the setup, the right column is almost completely off the page on the right. So I can't use columns for the Index - Bummer, but not the end of the world.

2) If I go through and highlight each entry that I want indexed, everything works great (so far as I can tell), and I have the options for "match case" and "whole words only." But in a long document, using a concordance file certainly seems to make more sense, and it SEEMS TO WORK, but it actually doesn't.

  Issue a) No matter what boxes are checked, Writer goes and marks ALL instances of whatever the concordance entry has: regardless of whether or not it is a whole word or whether the capitalization matches.

  Issue b) If you add another entry to the concordance file and "Update Index/Table," all instances (even incorrect ones) of the new entry are indeed marked within the file and added to the index, but every earlier entry GETS AN ADDITIONAL marker. As I modify the concordance file to add new items and update the index, I find that the earliest entries have as many markers as the number of updates I've done. 
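
One way to see the marker pile-up described in Issue b is to count markers per term before and after each index update. This is only a sketch: it assumes the document has been saved as Flat XML (.fodt) so the markup is plain text, and that point markers are written as <text:alphabetical-index-mark text:string-value="Term"/>. The helper name count_index_marks is made up for illustration.

```shell
# Count alphabetical index markers per term in a Flat XML (.fodt) export.
# Assumption: markers look like
#   <text:alphabetical-index-mark text:string-value="Term"/>
# count_index_marks is a hypothetical helper name.
count_index_marks() {
    # $1 = path to the .fodt file
    grep -o '<text:alphabetical-index-mark text:string-value="[^"]*"' "$1" |
        sed 's/.*text:string-value="\([^"]*\)"/\1/' |   # keep only the term
        sort | uniq -c | sort -rn                       # most frequent first
}
```

Running it after each "Update Index/Table" and comparing the counts should show whether the earlier terms really gain one marker per update.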

Here's an experiment you can do:

Open a new document using whatever standard size is in effect.
At the top of the document, type dt and press F3. This generates the dummy text.
In a separate text editor create a concordance file with the following entries:

Breeze;Breeze;;;0;0
Long;Long;;;0;0
Self;Self;;;0;0
Wrist;Wrist;;;0;0
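
A note on the entry format, offered as an assumption rather than something verified against the code: the Writer help describes concordance lines as "Search term;Alternative entry;1st key;2nd key;Match case;Word only", with an empty or 0 value in the last two fields meaning "No". If that layout applies here, the entries above switch both options off for each term. The sketch below decodes a concordance file under that assumption; describe_concordance is a hypothetical helper name.

```shell
# describe_concordance is a hypothetical helper. The field order is an
# assumption taken from the concordance format in the Writer help:
#   search term;alternative entry;1st key;2nd key;match case;word only
describe_concordance() {
    # $1 = path to the concordance file
    awk -F';' '
        /^#/ || NF < 6 { next }    # skip comment lines and short/blank lines
        {
            # empty or "0" is read as "no"; anything else as "yes"
            mc = ($5 == "" || $5 == "0") ? "no" : "yes"
            wo = ($6 == "" || $6 == "0") ? "no" : "yes"
            printf "%s: match case=%s, word only=%s\n", $1, mc, wo
        }
    ' "$1"
}
```

If the assumption holds, retrying the experiment with entries such as Self;Self;;;1;1 would show whether the per-entry flags are honored.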

Back in the document, create a new index at the bottom of the page, and mark "Case Sensitive." Then choose the concordance file created above.

You'll see that the words "himself" and "along" are marked in the document and included in the index as "Self" and "Long", even though "Self" and "Long" never appear in the text as standalone words. Now add the following entry to the concordance file:

Eat;Eat;;;0;0

Update the index, then look for the new marker. Find it? It actually marked the last three letters of the word "sweat", which doesn't even match the case.
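
The difference between what I see and what I expect can be mimicked on plain text with grep. This is only an illustration of the two matching rules, not of how Writer searches internally; loose_hits and strict_hits are hypothetical helper names.

```shell
# Hypothetical helpers comparing two matching rules on a plain-text file.
# loose_hits:  case-insensitive substring matches (the behaviour reported).
# strict_hits: case-sensitive whole-word matches (the behaviour expected
#              with "match case" and "whole words only").
# Usage: loose_hits TERM FILE   /   strict_hits TERM FILE
loose_hits()  { grep -o -i -- "$1" "$2" | wc -l | tr -d ' '; }
strict_hits() { grep -o -w -- "$1" "$2" | wc -l | tr -d ' '; }
```

On a text that contains "himself", "along" and "sweat" but none of "Self", "Long" or "Eat" as separate words, loose_hits reports a hit for each term while strict_hits reports none.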

In a three hundred plus page document with a surname index this tends to make the index pretty useless, since it's cluttered with erroneous entries which will drive you nuts looking on a referenced page to find something that just isn't there.

Sadly, I've obviously done that, which brings up another issue:

How do I get rid of the index markers? Going through them one by one is far too tedious. I recall that the alternate search and replace had a selection for "index" under properties, but I downloaded it, and the latest version seems to have a serious bug that doesn't let you get to a specific entry in the drop-down list for "properties", so that's out. And I don't know if that would be what I need anyway.

Any help would be appreciated.

Operating System: Ubuntu
Version: 4.3.0.0.alpha0+ Master
Comment 1 Tomaz Vajngerl 2014-03-12 19:09:30 UTC
oh wow.. I actually learned something trying to reproduce this :) 

Confirming..
Comment 2 Frank 2014-03-12 20:16:43 UTC
Hi Tomaz:

Always glad to aid in someone's education !!!

Seriously, if I can contribute to testing any "fixes" to these things (I believe there may be several contributing but not necessarily related bugs involved), let me know and I'll try to help. Obviously, since I reported the issue, I've become familiar with its use.

Also, I posted the following instructions on a forum (I can't remember which one now) to help someone remove all the index markers. They might help whoever is tasked with working on the index issues:

===== QUOTE =====
So, here's how to remove all of the index markers from a Writer document so you can start with a clean slate. To do this, you will need to be running LibreOffice on some flavor of Linux/Unix, or at least on a system that has a command line or some text editor with "sed" capabilities.

1: Make a backup of your Writer document. You know the consequences if something goes amiss.
2: Open the document in Writer, and choose Save As "OpenDocument Text (Flat XML) (fodt)"
   This creates an uncompressed XML version of the document.
   On my system (Ubuntu), I was unable to decompress the odt version, as the OS complained it was malformed.
3: Close the document and exit Writer.
4: Open a command line shell, preferably in the directory containing the fodt file.
5: Run the following command (it is one command; the trailing backslashes continue it across lines):
   sed 's/<text:alphabetical-index-mark text:string-value="\([A-Za-z]*\)"\/>//g' \
   < Old_File_Name_and_Path.fodt \
   > New_File_Name_and_Path.fodt
   Depending on the file size and processor speed, this may take a bit.
   If this gives errors, you're on your own.
6: Close the command line shell.
7: Open the new "cleansed" fodt file with Writer.
8: The file should look the same but without any alphabetical index markers. (The formatting is still there, though)
9: Go to where your alphabetical index is located, right click and select "Update Index/Table"
A: All of the index entries should disappear; if any remain, go find them and manually delete them.
   Apparently, some of the index markers are somehow embedded in others and aren't found by the sed command above.
   I didn't bother to try figuring out how or why that happened. I had several hundred markers, of which only five weren't removed.
B: Now, go back to the index and select Edit Index/Table, then File | Open.
C: Select the original file (assuming you have it where you want it), and let Writer go to it.
D: You now have a "clean" document with no duplicate index entries.
E: LOOK AT IT CAREFULLY, of course, before replacing your original. The document I tried this on was over four hundred pages with lots of tables, graphics and so forth, and I found no problems, but it's up to you to determine if everything is ok.

I hope this helps any others who might be using alphabetic indexes.
===== END QUOTE =====
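
A possible reason a few markers survived the sed command in the quote: the pattern [A-Za-z]* only matches terms made purely of letters, and only markers whose sole attribute is text:string-value. Terms with spaces, digits or punctuation, or markers carrying extra attributes such as keys, would be skipped. That is a guess; I did not check the leftovers. Under that assumption, a looser sketch follows. It assumes each point marker is a self-closing element on one line and that no attribute value contains a literal ">". The names strip_index_marks and count_marks are made up for illustration.

```shell
# strip_index_marks and count_marks are hypothetical helpers.
# Assumptions: every point marker is a self-closing
# <text:alphabetical-index-mark .../> element on a single line, and no
# attribute value contains a literal ">".
strip_index_marks() {
    # $1 = input .fodt, $2 = output .fodt (must be a different file)
    sed 's/<text:alphabetical-index-mark [^>]*\/>//g' "$1" > "$2"
}

count_marks() {
    # Counts point markers and ranged -start/-end markers alike, so a
    # non-zero result after stripping points at ranged markers that
    # strip_index_marks deliberately leaves alone.
    grep -o '<text:alphabetical-index-mark[ -]' "$1" | wc -l | tr -d ' '
}
```

Comparing count_marks on the old and new files gives a quick check of how many markers were removed and how many remain to be deleted by hand.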

Best of Luck,
Frank
Comment 3 QA Administrators 2016-02-21 08:36:26 UTC Comment hidden (obsolete)
Comment 4 Frank 2016-02-21 15:04:13 UTC
I received the auto-generated request to "retest open, confirmed bugs." As the original bug reporter, I felt as if I should respond. Surprisingly enough, the behavior I described in the original bug report seems to have been effectively "hidden" by a new bug!!

My original instructions for duplicating the bug (in the "Description") call for the use of a concordance file, which I duly (re)created. Then (in Writer, and after creating the dummy text) I used the menu options:
     "Insert | Indexes and Tables | Indexes and Tables..."

In the dialog that appears, change "Type" to "Alphabetical Index," then check the box "Concordance file."

Then choose "Open" from the "File" drop down box. Unlike the expected behavior when I originally submitted the bug, no files at all were listed. I assumed I might have been pointing to the wrong directory, but no!

Poking around, I realized that only directories/folders appeared in the display: no actual files of any type were listed no matter how I traversed the file system. (Aside: this suggests to me that this behavior might possibly be operating system dependent, so someone else might want to attempt repeating my instructions on a Windows machine.)

Click on Help, and a web page appears that says:

**QUOTE**
Swriter/modules/swriter/ui/tocdialog/showexample

There is currently no text in this page. You can search for this page title in other pages, or search the related logs, but you do not have permission to create this page.
**END QUOTE**

Even so, the page title would suggest that this page is for "Table of Contents" (which seems to still work) rather than "Index" so that likely wouldn't have had an answer anyway.

So, strictly speaking, I suppose this bug can be closed by the LibreOffice QA Team since it can no longer be duplicated!

FWIW: Here's my current LO setup:
Operating System: Ubuntu 14.04.3
Writer Version: 5.0.5.2; Build ID: 1:5.0.5~rc2-0ubuntu1~trusty1
Comment 5 QA Administrators 2017-03-06 15:29:54 UTC Comment hidden (obsolete)
Comment 6 Roman Kuznetsov 2019-01-26 13:16:33 UTC
Olivier, Xisco, please look at Comment 4