Bug 167851 - Writer: inconsistent field value capture for headers and footers
Summary: Writer: inconsistent field value capture for headers and footers
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 25.2.5.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Writer-Header-Footer
Reported: 2025-08-07 15:47 UTC by ajlittoz
Modified: 2025-08-26 17:18 UTC

See Also:
Crash report or crash signature:


Attachments
Incorrect value captured for header (27.37 KB, application/vnd.oasis.opendocument.text)
2025-08-07 15:47 UTC, ajlittoz
Rendering on my system (27.53 KB, application/pdf)
2025-08-08 08:37 UTC, Mike Kaganski
Sample file using Liberation Serif (29.02 KB, application/vnd.oasis.opendocument.text)
2025-08-08 09:38 UTC, ajlittoz
Rendering on my Fedora system (22.13 KB, application/pdf)
2025-08-08 09:43 UTC, ajlittoz

Description ajlittoz 2025-08-07 15:47:53 UTC
Created attachment 202230 [details]
Incorrect value captured for header

Showing the first and last items of a page, such as headings, in the page header is common typographical practice.

The "first value" is the one in effect for the "object" at the very top of the page, to the left of the first character; the "last value" is the one in effect at the very bottom, to the right of the last character.

The "object" is usually a variable which can easily be changed by the author. As an example, think of sentence or verse numbering. A *Set variable* field is inserted where this number should appear.

The header comes first on a page. Consequently, we expect fields inside the header to show the values in effect at the beginning of the page. Similarly, the footer comes last, and we expect fields there to show the values in effect at the bottom of the page.

The attached sample document exhibits an undocumented behaviour of the capture process.

When the first paragraph (possibly clipped at the page break if it spans several pages) contains at most a single occurrence of the variable, the result is as expected.

However, if this paragraph (or the first to contain occurrences of the variable) has several occurrences of the same variable, the value of the _LAST_ occurrence is reported in the header. This can be explained if the full paragraph is scanned instead of taking into account only the state at first character.

This causes two errors:
- if only one variable occurs in the middle of the paragraph, we get this value, which is not the value in effect at start of paragraph. We're off by one (or worse if the variable is reset).
- if there are several variable occurrences, we get the last one, which is obviously different from the value in effect at start of paragraph.
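
The two errors can be reproduced with a toy model. This is only a sketch of the behaviour described above, not Writer's actual code: `SetVar`, `observed_header` and the token-list representation of a paragraph are hypothetical, and the numbers are purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SetVar:
    """Hypothetical stand-in for a *Set variable* field."""
    value: int


# A paragraph is modelled as a sequence of plain text runs and SetVar fields.
Paragraph = List[Union[str, SetVar]]


def observed_header(page: List[Paragraph], carried_in: int) -> int:
    """Assumed model of the reported behaviour: scan the first paragraph
    that contains the variable and return its LAST occurrence."""
    for paragraph in page:
        values = [item.value for item in paragraph if isinstance(item, SetVar)]
        if values:
            return values[-1]
    return carried_in  # no occurrence on this page: keep the previous value


# Error 1: a single occurrence in the middle of the first paragraph.
single = [["Some text ", SetVar(5), " more text."]]
# Error 2: several occurrences in the first paragraph.
several = [["Text ", SetVar(8), " text ", SetVar(9), " text ", SetVar(12)]]

# The value in effect at the start of the page is 4, then 7, in these cases.
print(observed_header(single, carried_in=4))
print(observed_header(several, carried_in=7))
```

In both cases the model returns a value that is set later in the paragraph rather than the value carried in from the previous page.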

Assuming the same algorithm is used for the footer at bottom of page in the last paragraph, it always returns the correct value because we expect this latest value.

It is likely that the same algorithm is used for headings because captures are paragraph-based: heading text is the full paragraph text; heading number comes from the associated list style which can only be applied to a paragraph.

The case of variables is different. Capture should operate at character level because you can have several variable insertions in sequence inside a single paragraph (sentence numbering).

Meanwhile, I could not find any workaround.
Comment 1 Mike Kaganski 2025-08-08 08:13:47 UTC
Clarification for expected / actual observations in attachment 202230 [details]:

All pages have correct footers.

Page 1 shows the expected value in header: 0.

Page 2 shows the expected value in header: 1.

Page 3 shows the expected value in header: 2.

Page 4 shows unexpected value for header: 5 (author expectation is 4). This might need discussion; ajlittoz explains that as "if only one variable occurs in the middle of the paragraph, we get this value, which is not the value in effect at start of paragraph". I don't have a personal view on this - i.e., I can see both the "Only use field, when it's the first element in the first paragraph" idea, as well as "use the first occurrence in the first paragraph". Technically, the latter is easier.

Page 5 shows unexpected value in header: 12 (author expectation is 7). Taking into account the above, it may possibly also be 8.

Page 6 shows the expected value in header: 12.

Setting to NEEDINFO: please provide arguments for page 4 expectation.
Comment 2 Mike Kaganski 2025-08-08 08:29:05 UTC
Also: the bugdoc uses a non-standard font. Its layout depends on the fonts substituted on the system. Please use only standard fonts (Liberation Sans / Serif), when preparing bugdocs, unless the used font is required for reproduction.
Comment 3 Mike Kaganski 2025-08-08 08:37:40 UTC
Created attachment 202237 [details]
Rendering on my system

FTR: this is how I saw the document when I prepared comment 1. As I mentioned, it is good to use the standard fonts; it is also good to start with an exact description of the visible problem, and only then proceed to discussion (so the content of my comment 1 would better have been at the beginning of comment 0); and it's good to provide the rendering on the author's system, just in case.
Comment 4 ajlittoz 2025-08-08 09:38:37 UTC
Created attachment 202244 [details]
Sample file using Liberation Serif

Apologies for the non-standard font. I prepared the sample in a rush using my default template.
Comment 5 ajlittoz 2025-08-08 09:43:51 UTC
Created attachment 202247 [details]
Rendering on my Fedora system

Page 1: header 0 (expected: variable not yet defined), footer 1 (expected)

Page 2: header & footer 1 (expected, no change in variable)

Page 3: header 2, footer 5 (expected, simple case, only one variable reference per paragraph)

Page 4: header 11 (expected 6, i.e. first variable value), footer 11 (expected)

Page 5: header & footer 12 (expected, split paragraph case)
Comment 6 ajlittoz 2025-08-08 10:03:51 UTC
(In reply to Mike Kaganski from comment #1)
> Clarification for expected / actual observations:
>
> Page 4 shows unexpected value for header: 5 (author expectation is 4). This
> might need discussion; ajlittoz explains that as "if only one variable
> occurs in the middle of the paragraph, we get this value, which is not the
> value in effect at start of paragraph". I don't have a personal view on this
> - i.e., I can see both the "Only use field, when it's the first element in
> the first paragraph" idea, as well as "use the first occurrence in the first
> paragraph". Technically, the latter is easier.
>
> Setting to NEEDINFO: please provide arguments for page 4 expectation.

My expectation for header information retrieval is that the header reports the value defined by the context at the beginning of the page.

Since "beginning of page" is ambiguous, I'll compare with the heading case, available through the _Chapter_ field. The first paragraph is taken into account. But is the position to the left of the first character inside the first paragraph or not? If we consider it outside the paragraph, we get the same context as the one at the bottom of the previous page, i.e. the heading does not exist yet.

So "first" position of page must definitely be considered inside the paragraph. However, this is not sufficient. Scanning the paragraph is necessary to get the number induced by the list style.

So, in the case of a variable, we must also access the paragraph contents. If the variable is located at the very beginning of the paragraph, we're done.

However, if the variable reference is located further inside the paragraph (there is untagged text before the variable), there are two possibilities.

A) Either author really wants the state at beginning of paragraph, then we must retrieve the state of the variable before its change, i.e. same value as at bottom of previous page.

This case can be considered equivalent to the heading case. Suppose we have a short, say 2-3 lines, narrative paragraph, then a heading. The present implementation returns the heading number controlling the narrative paragraph (according to the "first paragraph only scan" principle), though many authors would prefer to show the first "explicit" heading of the page, i.e. the one set by the second paragraph. This is presently ruled out, likely for performance reasons.

B) Alternatively, with a paragraph scan (necessary at bottom of page to retrieve the LAST variable value), we can get the first variable change, be it in first position or not.

I think there is no general agreement on the choice between A and B. Even the same author may prefer one or the other depending on circumstances. For example, if the text before the variable change is short, say up to a line and a half, B would be chosen.

But if the paragraph is rather long, with variable change near the end of the paragraph, option A is preferred because most of the paragraph is tagged with the value coming from previous page.

For consistency with the heading case, I suggest retaining option A. Offering the user a choice between A and B would be ideal, but it is more urgent to fix the bug.

Anyway, IMHO, different algorithms are needed for header and footer instead of a single one as at present.
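
To make the difference between options A and B and the footer rule concrete, here is a minimal sketch. It assumes a paragraph can be modelled as a list of text runs and variable fields; `SetVar`, `header_option_a`, `header_option_b` and `footer_value` are hypothetical names, not Writer code, and only a single paragraph is considered.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SetVar:
    """Hypothetical stand-in for a *Set variable* field."""
    value: int


Paragraph = List[Union[str, SetVar]]


def header_option_a(first_paragraph: Paragraph, carried_in: int) -> int:
    """Option A: state at the first character of the page. A field counts
    only if it is the very first item; otherwise keep the carried-in value."""
    if first_paragraph and isinstance(first_paragraph[0], SetVar):
        return first_paragraph[0].value
    return carried_in


def header_option_b(first_paragraph: Paragraph, carried_in: int) -> int:
    """Option B: first variable change in the paragraph, wherever it is."""
    for item in first_paragraph:
        if isinstance(item, SetVar):
            return item.value
    return carried_in


def footer_value(last_paragraph: Paragraph, carried_in: int) -> int:
    """Footer: last variable change in the paragraph."""
    value = carried_in
    for item in last_paragraph:
        if isinstance(item, SetVar):
            value = item.value
    return value


# Illustrative paragraph: text first, then two changes; 7 is carried in.
paragraph = ["Text ", SetVar(8), " text ", SetVar(12), " text."]
print(header_option_a(paragraph, carried_in=7))
print(header_option_b(paragraph, carried_in=7))
print(footer_value(paragraph, carried_in=7))
```

With the variable in the very first position, A and B agree; they differ only when untagged text precedes the first change, which is exactly the case under discussion.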