Originally reported here: http://www.openoffice.org/issues/show_bug.cgi?id=27377 See the original bug report for a description.
Cedric, time for a little evaluation: can this be turned into a (relatively) easy hack, judging from the macro in the OOo issue?
A nice starting point for hacking on this would be http://opengrok.go-oo.org/xref/writer/sw/source/ui/index/toxmgr.cxx#UpdateOrInsertTOX I think this could be an almost easy hack.
That's a really important feature (currently, for technical sheets, I see big problems with automatic TOCs), but I doubt that it's an "EasyHack". There are good reasons to define formatting in style templates for the tables and indexes, and we should consider each style item separately to decide whether it should be carried over into the TOC lines. This discussion should be finished before work on the code can start; I believe a draft specification should be written in the wiki.
Created attachment 41135. See comment 3.
(In reply to comment #3) > That's a really important feature (currently for technical sheets I see big > problems with automatic TOC), but I doubt that it's an "EasyHack". > > There are good reasons to define formatting in styles templates for the tables > and indexes, and we should think for each style Item separately whether it > should be taken in the TOC lines. This discussion should have to be finished > before work on code can start, I believe a draft for a specification should be > done in the WIKI Agreed. There was a macro for OOo 2.0 that modified the TOC to recreate the manual formatting, but that macro does not work on 3.x and, above all, it was a nasty hack. A proper solution is much needed. In fact, the ability to not only accept, but also to _ignore_ some formatting (line breaks to create two-line headings come to mind) will be very important.
(In reply to comment #3) > That's a really important feature (currently for technical sheets I see big > problems with automatic TOC), but I doubt that it's an "EasyHack". > > There are good reasons to define formatting in styles templates for the tables > and indexes, and we should think for each style Item separately whether it > should be taken in the TOC lines. This discussion should have to be finished > before work on code can start, I believe a draft for a specification should be > done in the WIKI I have added it to the /Development/Easy_hacks page, under "Slightly more interesting hacks", to reflect its difficulty. I've put forward my thoughts on individual style items there for discussion (if indeed I understood what you meant by style items; I'm not even sure what's meant by "table lists").
http://opengrok.libreoffice.org/xref/writer/sw/source/core/doc/doctxm.cxx#1248 This seems to be where the text of the heading is pulled from the document and put into the TOX (for TOX_OUTLINELEVEL, anyway): SwTxtNode* pTxtNd = rOutlNds[ n ]->GetTxtNode(); <snip> SwTOXPara * pNew = new SwTOXPara( *pTxtNd, sSwTOXElement::TOX_OUTLINELEVEL ); InsertSorted( pNew ); I'm still taking baby steps into the code, but is the problem that SwTxtNodes lose their formatting? Is the solution to get an SwCntntNode instead and perform suitable style cleanup manually? Is this anywhere near the right track?
(In reply to comment #7) > http://opengrok.libreoffice.org/xref/writer/sw/source/core/doc/doctxm.cxx#1248 > > This seems to be where the text of the heading is pulled from the document and > put into the TOX (for TOX_OUTLINELEVEL, anyway): > > SwTxtNode* pTxtNd = rOutlNds[ n ]->GetTxtNode(); > <snip> > SwTOXPara * pNew = new SwTOXPara( *pTxtNd, sSwTOXElement::TOX_OUTLINELEVEL ); > InsertSorted( pNew ); > > I'm still taking baby steps into the code, but is the problem that SwTxtNodes > lose their formatting? Is the solution to get an SwCntntNode instead and perform > suitable style cleanup manually? Is this anywhere near the right track? You got to the TOX code, which is great... but I don't think the nodes lose their formatting. You may want to have a look at this method, as it generates the whole TOX entry: http://opengrok.libreoffice.org/xref/writer/sw/source/core/doc/doctxm.cxx#1599 I had a quick look, and it seems the text is simply copied from the source text node to the TOC node... without the formatting attributes.
I'm not entirely convinced by this. Would you really want text from a <heading 1> in the TOC in 19-point bold? To make this universally useful, each selectable style would need to carry two formats: one for use inline, and one for cross-references (e.g. TOC, index, cross-references). The second could be an "as in-line but font=12,nobold" sort of thing, or a complete definition.
(In reply to comment #9) > I'm not entirely convinced by this. Would you really want text from a <heading > 1> in the TOC in 19-point bold? > > To make this universally useful, each selectable style would need to carry two > formats: one for use inline, and one for cross-references (e.g. TOC, index, > cross-references). The second could be an "as in-line but font=12,nobold" sort of > thing, or a complete definition. No, I just want to preserve subscripts and superscripts, for example, and possibly italics and bold applied on top of the text, i.e. everything that is NOT part of the heading paragraph style.
Comments from the EasyHack page: "Character formatting not retained in entries of TOC, table lists, etc." Background: all available in the following bug reports: LibreOffice bug 30732, OpenOffice.org bug 27377. This issue has been around since 2004. It was suggested that discussion should move here to hash out requirements; please do so here, with commentary. Paragraph styles should not be respected (obvious, really); character styles should be respected; manual formatting should be respected. This attachment to the OpenOffice.org bug shows that the problem exists in contexts other than indexes/tables of contents. Therefore, maybe the link given in this comment on the LO bug isn't the root of the problem after all? What other complications are there? Skills: C++.
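The three rules above (paragraph styles ignored, character styles respected, manual formatting respected) amount to merging attribute sets. Here is a minimal C++ sketch of that rule, under stated assumptions: `AttrSet`, `AttrsForTocEntry` and the attribute names are hypothetical and are not part of Writer's API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: attribute name (e.g. "italic") -> value.
// Not LibreOffice's SfxItemSet; for illustration only.
using AttrSet = std::map<std::string, std::string>;

// Attributes to carry into a TOC entry for one run of heading text.
// The paragraph style is ignored (the "Contents N" style applies instead),
// the character style is respected, and direct (manual) formatting
// overrides the character style where both set the same attribute.
AttrSet AttrsForTocEntry(const AttrSet& /*paragraphStyle*/,
                         const AttrSet& characterStyle,
                         const AttrSet& directFormatting) {
    AttrSet result = characterStyle;
    for (const auto& [name, value] : directFormatting) {
        result[name] = value;  // manual formatting wins
    }
    return result;
}
```

With this rule, a 19-point bold "Heading 1" contributes nothing to the entry, while an italic character style or a manually applied subscript would carry over.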
[This is an automated message.] This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it started right out as NEW without ever being explicitly confirmed. The bug is changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases. Details on how to test the 3.5.0 beta1 can be found at: http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1 more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
An EasyHack should have been checked by developers and thus is confirmed regardless of age. Moving back to NEW from NEEDINFO again. Sorry for the hassle.
Lots of votes for this bug over at Apache, and for good reason. At the office I get complaints from co-workers about this as well. In our case we mainly use subscripted symbols in headings. These subscripts show up unformatted in the TOC. So, from our point of view, it would be fine to just retain any character formatting in the TOC.
(In reply to comment #5) > In fact, the ability to not only accept, but also to _ignore_ some > formatting (line breaks to create two-line headings come to mind) > will be very important. I am not sure I agree with this; however, it does highlight a difficult aspect of the issue. Any multi-line heading containing characters with Unicode Line Breaking Algorithm properties, such as Mandatory Break (BK) http://unicode.org/reports/tr14/#BK, Non-breaking (GL) http://unicode.org/reports/tr14/#GL, or Word Joiner (WJ) http://unicode.org/reports/tr14/#WJ, should have those properties respected (mirrored) in the TOC. The difficulty, as I understand it, is that the TOC is simply a form of x-ref, and some of these properties (e.g., BK) would not be appropriate in an in-text x-ref. (In reply to comment #11) > Manual formatting should be respected This is essentially the main problem I have tried to illustrate. The aspect of respecting character styles (superscript, italic, etc.) would seem less of an issue.
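As a rough illustration of the three classes named above, here is a C++ sketch that recognizes a few representative code points of BK, GL and WJ and applies one possible TOC policy. The code point lists are a hand-picked subset, not the full UAX #14 data. The policy of turning mandatory breaks into spaces while keeping GL/WJ characters is an assumption and was not decided in this bug. `Classify` and `MapForToc` are hypothetical names.

```cpp
#include <cassert>

// Three of the UAX #14 line-breaking classes discussed above.
enum class BreakClass { BK, GL, WJ, Other };

// Hand-picked subset of code points per class (not the full LineBreak.txt data).
BreakClass Classify(char32_t c) {
    switch (c) {
        case 0x000B:  // LINE TABULATION
        case 0x000C:  // FORM FEED
        case 0x2028:  // LINE SEPARATOR
        case 0x2029:  // PARAGRAPH SEPARATOR
            return BreakClass::BK;
        case 0x00A0:  // NO-BREAK SPACE
        case 0x2007:  // FIGURE SPACE
        case 0x2011:  // NON-BREAKING HYPHEN
        case 0x202F:  // NARROW NO-BREAK SPACE
            return BreakClass::GL;
        case 0x2060:  // WORD JOINER
        case 0xFEFF:  // ZERO WIDTH NO-BREAK SPACE
            return BreakClass::WJ;
        default:
            return BreakClass::Other;
    }
}

// Assumed policy: a mandatory break in a heading becomes a plain space in the
// TOC entry, while GL/WJ characters are copied so the entry cannot break
// where the heading cannot.
char32_t MapForToc(char32_t c) {
    return Classify(c) == BreakClass::BK ? U' ' : c;
}
```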
Deleted "Easyhack" from summary.
Version 3.6.2.2: Open the style "Contents 2" -> Modify -> Tabs: Type is Right. Update Index/Table; Paragraph -> Tabs: Type is Left! It must be Right. Apply the style "Contents 2" again; Paragraph -> Tabs: Type is Right. Update Index/Table: the alignment is Left again...
As can be seen, this error/bug/misconception was found early in 2004. It's very disappointing that in 2013, with many votes for it, especially from people who write technical papers, there is still no real effort being put into it. It is obvious that if we have to write H<sub>2</sub>SO<sub>4</sub> as the sulfuric acid formula, "H2SO4" simply doesn't fit in a technical paper. I think the point is just to keep superscript, subscript, underline, italic and bold character formatting. Or am I missing the point?
Adding the LibreOffice developer list as CC to unresolved EasyHacks for better visibility. See e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details.
(In reply to comment #18) > I think the point is just to keep superscript, subscript, underline, italic > and bold character formatting. Or am I missing the point? I think that's nearly true, but I would generalise to keep /all/ immediate formatting applied over and above the original text's intrinsic style. So perhaps the casting of a single character into a different font for some reason might be included too, or perhaps the modification of text colour for some bizarre reason.
Restricted my LibreOffice hacking area
*** Bug 75021 has been marked as a duplicate of this bug. ***
I am not 100% clear on which formats should be retained. I am currently retaining superscript and subscript, because this is the most important thing for anyone working in the sciences and a major problem for anyone in this field trying to write a text with LibreOffice. If more complex logic is required, I would suggest tracking this in a separate bug.
(In reply to comment #23) > I am not 100% clear on which formats should be retained. I am currently > retaining superscript and subscript, because this is the most important thing for > anyone working in the sciences and a major problem for anyone in this field > trying to write a text with LibreOffice. > > If more complex logic is required, I would suggest tracking this in a > separate bug. Please also retain italics. This is useful when preparing a TOC of legal precedents cited by a brief, where names of cases should be italicized.
*** Bug 41111 has been marked as a duplicate of this bug. ***
(In reply to comment #23) > I am not 100% clear on which formats should be retained. I am currently > retaining superscript and subscript, because this is the most important thing for > anyone working in the sciences and a major problem for anyone in this field > trying to write a text with LibreOffice. > > If more complex logic is required, I would suggest tracking this in a > separate bug. Yes, as Daniel suggests, italics are important too if your headings contain words in foreign languages and Latin phrases (e.g. "Indonesia's policy of /Konfrontasi/"), document titles (e.g. "Sports in /Pravda/ (1960-1965)"), or you just need some kind of emphasis (e.g. "On why we /need/ italics"). And in lists of tables, illustrations, etc., this is surely much more common. I'm sure the same must be true in the sciences.
Tobias Lippert committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=9088a4c2d18f59c22fceb81829441b704603415d fdo#30732 Retain selected character attributes for table of contents The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Confirmed fixed for subscript, superscript and italics. "selected character attributes" means no bold, underline, font changes, font colours and Unicode Line Breaks, right? This will probably not satisfy Owen Genat (see comment 15), but it will do for my use cases (engineering / science). Nice work, Tobias! I vote for VERIFIED FIXED. Tested on OpenSuSE 12.3 (64-bit) LOdev version: 4.4.0.0.alpha0+ Build ID: 8b499cea76577b4221fccb17703aa9e86b625e90
(In reply to comment #28) > Confirmed fixed for subscript, superscript and italics. Confirmed for these forms of direct formatting for Table of Contents entries only. These forms are not yet supported: - Character styles using these characteristics (e.g., the pre-defined Emphasis style or custom styles using super/subscript) in ToC entries. - Illustration Index or Table Index entries (using direct formatting or a character style). - Footnote anchors (small superscripted identifier) in Table of Contents, Illustration Index, or Table Index entries. - Cross-references to any Heading, Caption, Bookmark, or Reference mark that include these forms of direct formatting or character style. > "selected character attributes" means no bold, underline, font changes, font > colours and Unicode Line Breaks, right? This may be a reference to not only a limited sub-set of characteristics, but also those applied only via direct formatting. > This will probably not satisfy Owen Genat (see comment 15) :^) That comment was mainly to point out the intricacies involved (as I saw them), particularly in relation to line breaking situations (which are likely out of scope). Italic+superscript+subscript is a good start, but I do feel it would be good if the points listed above as unsupported were included. Both the Apache issue and this bug cite other forms of index. Character styles / cross-references would seem an unfortunate omission. We do now however have a workaround. > Nice work, Tobias! I vote for VERIFIED FIXED. Well done from me also, although I am more hesitant on calling it fixed. Tested under Crunchbang 11 x86_64 running v4.4.0.0.alpha0+ Build ID: 3fdd4f069d5436cf39708004af7fda8175fbc4c2
@Stephan van den Akker > no bold, underline, font changes, font colours Correct. > and Unicode Line Breaks, right? I have not changed the logic for handling whitespace. The source code has a check for '\n' in it (ToxWhitespaceStripper.cxx:25). I have verified that line breaks entered with Shift+Enter do not appear in the Table of Contents. However, if there is a method that operates on sal_Unicode and detects whitespace, it should be used here instead. I could not find one.
I'm not sure whether the fix already covers this, but I would really love it if the TOC could transfer *highlighting* from headings to the index.
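For comparison, a more Unicode-aware check along the lines discussed above might look like the following sketch. It works on UTF-16 code units (sal_Unicode is a 16-bit type), collapses each run of whitespace, including '\n' and U+2028/U+2029, into one space, and trims both ends. This is not the code in ToxWhitespaceStripper.cxx. `IsTocWhitespace` and `StripTocWhitespace` are hypothetical names, and the chosen character set is an assumption. The no-break space is deliberately left alone.

```cpp
#include <cassert>
#include <string>

// Assumed set of characters treated as collapsible whitespace in a TOC entry.
bool IsTocWhitespace(char16_t c) {
    switch (c) {
        case u' ':
        case u'\t':
        case u'\n':   // line break (e.g. Shift+Enter)
        case u'\r':
        case 0x2028:  // LINE SEPARATOR
        case 0x2029:  // PARAGRAPH SEPARATOR
            return true;
        default:      // U+00A0 NO-BREAK SPACE is intentionally kept
            return false;
    }
}

// Collapses each run of whitespace to a single space and trims both ends.
std::u16string StripTocWhitespace(const std::u16string& in) {
    std::u16string out;
    bool pendingSpace = false;
    for (char16_t c : in) {
        if (IsTocWhitespace(c)) {
            pendingSpace = true;   // defer: only emit between two words
            continue;
        }
        if (pendingSpace && !out.empty()) {
            out.push_back(u' ');
        }
        pendingSpace = false;
        out.push_back(c);
    }
    return out;  // trailing whitespace is dropped because it is never flushed
}
```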
Hello Harry, I have explicitly included only a few selected character formats. There might be users who would rather not have the highlights in the table of contents. Unfortunately, I do not know who decides which formats should be applied in the table. If you find out, and get a positive feedback, I can help to add the functionality. :-) Tobias
(In reply to Tobias Lippert from comment #32) > Hello Harry, > > I have explicitly included only a few selected character formats. There > might be users who would rather not have the highlights in the table of > contents. > > Unfortunately, I do not know who decides which formats should be applied in > the table. If you find out, and get a positive feedback, I can help to add > the functionality. :-) > > Tobias Hi Tobias, Thanks for your response! Yeah I understand that transferring the highlighting wouldn't be ideal for everyone - I thought that as it's the default behaviour on MS Word it wouldn't be too controversial, but it's definitely possible that people might get up in arms about it. I'm not really involved in the LO community so wouldn't know how to gauge opinions on this... I know it would probably be a lot more work, but I guess the ideal situation would be for each sort of character formatting to be able to be turned on and off in the settings of the TOC index. Or, alternatively, you could have a check box to turn on and off the transfer of *all* formatting. Just a thought ;)
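The per-format toggles suggested above could be modelled as a bit set. The default would match what the fix retains (italics, superscript, subscript), and a single switch would transfer all formatting or none. This is a hypothetical sketch: `CharFormat`, `TocFormatOptions` and their members are illustrative names, not existing LibreOffice options.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical kinds of character formatting a TOC could carry over.
enum class CharFormat : std::uint32_t {
    Italic      = 1u << 0,
    Superscript = 1u << 1,
    Subscript   = 1u << 2,
    Bold        = 1u << 3,
    Underline   = 1u << 4,
    Highlight   = 1u << 5,
};

constexpr std::uint32_t kAllFormats = (1u << 6) - 1;  // one bit per enumerator above

class TocFormatOptions {
public:
    // Default: italics plus superscript/subscript, as retained by the fix.
    TocFormatOptions()
        : mask_(Bit(CharFormat::Italic) | Bit(CharFormat::Superscript) |
                Bit(CharFormat::Subscript)) {}

    // Per-format on/off toggle.
    void Set(CharFormat f, bool keep) {
        mask_ = keep ? (mask_ | Bit(f)) : (mask_ & ~Bit(f));
    }

    bool Keeps(CharFormat f) const { return (mask_ & Bit(f)) != 0; }

    // The "single check box" alternative: transfer all formatting or none.
    void SetAll(bool keep) { mask_ = keep ? kAllFormats : 0u; }

private:
    static std::uint32_t Bit(CharFormat f) { return static_cast<std::uint32_t>(f); }
    std::uint32_t mask_;
};
```

Keeping the default narrow preserves the current behaviour for users who do not want highlighting in their TOC.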
*** Bug 88046 has been marked as a duplicate of this bug. ***
Migrating Whiteboard tags to Keywords: (easyHack difficultyInteresting skillCpp) [NinjaEdit]
A polite ping: are you still working on this?
Superscripts and subscripts now work in recent versions.
(In reply to Frederic Parrenin from comment #37) > Superscripts and subscripts now work in recent versions. Just in TOCs, not in fields. The TOC now works wonderfully (thanks!!!), but if you insert a cross-reference or use a chapter field in your headers or footers, the problem persists (tested on 5.1). Do I need to file a new issue for fields, or can we continue to use this one?
Hello - I thought this was only about the TOC, and that this issue was fixed. @RGB Can you provide a simple test file? (Just to see if we mean the same thing.) I will check if my fix can easily be ported to other fields.
Created attachment 122738: Test file with fields that do not keep formatting (In reply to Tobias Lippert from comment #39) > Hello - I thought this was only about the TOC, and that this issue was fixed. > @RGB Can you provide a simple test file? (Just to see if we mean the same > thing.) I will check if my fix can easily be ported to other fields. Sure, here it is. It contains a TOC, a formatted heading and two fields: a cross-reference to the heading and (in the page footer) a Chapter field.
@Tobias, are you still working on this bug? (If not, please unassign it.)
Unassigned.
(In reply to Tobias Lippert from comment #42) > Unassigned. Setting status to NEW and unassigning Tobias.
Closing this bug, since it works in the TOC. It still seems to be an issue for fields, but there are no code pointers and no mentor.
I just wanted to add that Small Caps formatting in a heading does not carry over to the TOC. Should I open a new issue for this, or should this one be reopened?
Notes for unit test writers: Tests for subscript and superscript were added in 1ca2a2119ad3e910f848344d51ba9ec173880715, so the only remaining thing to test is italics. Revert has to be done manually.