Bug 65323 - Objects anchored as characters aren't treated as characters
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.0.2.2 release
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
Duplicates: 105770
Depends on:
Blocks: OLE-Objects
Reported: 2013-06-04 01:12 UTC by Mike Kaganski
Modified: 2020-03-12 07:30 UTC (History)

Description Mike Kaganski 2013-06-04 01:12:31 UTC
When an object is anchored as character, it should behave like it is a character. Currently, there are a number of issues:

1. If it is enclosed in parentheses, then it may be wrapped to the next line _without_ the parenthesis, or the parenthesis without the object (no space between parenthesis and the object).

To check, create a new document, type an opening and a closing parenthesis, insert a formula object with arbitrary contents between them, then place the caret before the opening parenthesis and keep typing until the formula wraps to the next line. Note that the closing parenthesis wraps first, alone and without the formula; then the formula wraps without the opening parenthesis.

2. If it is preceded or followed by a punctuation character, then syntax check shows improperly placed punctuation (as if there is no object; like needless space before dot or comma).

The latter is of low importance, while the former may lead to wrongly formatted document. Thus marking as Normal.
Comment 1 Mike Kaganski 2013-08-12 11:56:28 UTC
Another manifestation of this problem:
if I search for trailing spaces using this regex:
[[:space:]]$
then it finds all lines that have objects anchored as characters after a space in the end of a line:

[something][space][obj-anchored-as-character][eol]

This makes it impossible, e.g., to trim trailing spaces.
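A minimal Python sketch of the result I would expect. It assumes the anchored object is modelled as a single non-space placeholder character (U+FFFC is used here purely for illustration; how Writer represents the object internally is not stated in this report), and it uses Python's `\s` as a stand-in for `[[:space:]]`:

```python
import re

OBJ = "\ufffc"  # hypothetical stand-in for an object anchored as character

def has_trailing_space(line: str) -> bool:
    """Python analogue of searching the line for [[:space:]]$."""
    return re.search(r"\s$", line) is not None

with_object = "something " + OBJ  # [something][space][obj-anchored-as-character][eol]
real_trailing = "something "      # [something][space][eol]

print(has_trailing_space(with_object), has_trailing_space(real_trailing))
```

If the object counts as a character, only the second line should be reported as having a trailing space.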
Comment 2 Dominique Boutry 2013-10-11 16:55:07 UTC
It seems to me that there is only one bug: the one described in comment 1.

Description:
- When you say "When an object is anchored as character, it should behave like it is a character", do you mean "an alphabetical character", i.e. something that does not cause a word break?
- OpenOffice apparently endorses that an object behaves like a word-breaking character rather than an alphabetic character, so point 1 is not a bug.
- Both solutions are defensible; however, if you think that the string "(<formula>)" must be processed like a word (for the purpose of wrapping), so that "anchored as character" must now be understood as "anchored as alphabetical character", I'm afraid that it raises new bug issues like "my sentence <sentence_body><large_image_object> no longer wraps correctly when <sentence_body> ends with a letter: the wrapped image always pulls the last word with it..."
- In description point 2: could "... improperly placed punctuation (as if there is no object..." be replaced by "properly placed punctuation (because for the sake of wrapping the object is processed as a non-letter..."?
- May I suggest that you include the parenthesis in the formula?

Comment 1:
- I confirm that "<text><blank><object><paragraph_mark>" matches "[[:space:]]$".
- This fact can't be summarized as "... there is no object..." because "<space><object><space>" doesn't match "<space><space>" (either with or without the Regular Expression box checked).
- Also, the text "<space><object><space>" doesn't match "[[:space:]][[:space:]]" but it does match "[[:space:]].[[:space:]]".
- So the bug is: inconsistencies in the way the pseudo-character representing an object anchored as character is handled in different places (word wrapping, searching, etc.). A consistent behaviour should be proposed, as close as possible to the current behaviour.
Comment 3 Mike Kaganski 2013-10-12 13:18:24 UTC
(In reply to comment #2)
> - OpenOffice apparently endorses that an object behaves like a word-breaking
> character rather than an alphabetic character, so point 1 is not a bug.
> - Both solutions are defensible
You are absolutely right! I should have thought of this myself.

> however, if you think that the string
> "(<formula>)" must be processed like a word (for the purpose of wrapping), so
> that "anchored as character" must now be understood as "anchored as
> alphabetical character", I'm afraid that it raises new bug issues like "my
> sentence <sentence_body><large_image_object> no longer wraps correctly
> when <sentence_body> ends with a letter: the wrapped image always pulls the
> last word with it..."
You are right; however, the current state allows only one thing: unconditionally wrapping the object. The proposed change lets one control this: simply add spaces around objects that need wrapping, and everything is OK; otherwise, if you need the object to stay with the leading/trailing text, omit the space.
This is likely to break the layout of some existing documents. This may be mitigated either by automatically inserting spaces (if absent) around such objects when importing documents created by older versions (this requires knowing the version of the creator software), or by introducing a new option that the user can select document-wide or per object.
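The proposed rule can be sketched with a toy greedy wrapper in Python. This is only an illustration under simplifying assumptions: the object is a hypothetical one-cell placeholder (U+FFFC), every character has width 1, and breaks are allowed only at spaces. It is not how Writer's layout actually works.

```python
OBJ = "\ufffc"  # hypothetical one-cell placeholder for an anchored object

def wrap(text: str, width: int) -> list:
    """Greedy wrap that breaks only at spaces; OBJ is just another word character."""
    lines, current = [], ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if current and len(candidate) > width:
            lines.append(current)  # line is full: start a new one with this word
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

# No space around the object: "(<obj>)" moves to the next line as one unit.
print(wrap("aaa bbb (" + OBJ + ") ccc", 9))
```

With spaces inserted around the object, it becomes a word of its own and can wrap independently, which is the control described above.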

> - May I suggest that you include the parenthesis in the formula?
No :) Actually, this is a wrong approach. Consider this text:
<some text> (for example, <formula object>).
Here, if I put the closing parenthesis into the formula, I run into a number of issues:
1. I must use the syntax "left none ... right )" inside the formula to enter only one parenthesis, and it is the scaling parenthesis, so it will likely not match the size of the opening parenthesis written in plain text;
2. Even if I manage to enter one closing parenthesis of the correct size (by enclosing it in quotes in the formula), it will likely have a different shape (because of the different font);
3. Even if I define the same font for the formula (which may be undesirable, to keep visual distinction of formulas), I will depend on correct vertical alignment of the parenthesis (which is not always accurate), so the parenthesis will likely be misaligned.
4. The resulting text will fire warnings from spell checker.
And maybe there are some other issues I haven't thought of.

> - I confirm that "<text><blank><object><paragraph_mark>" matches
> "[[:space:]]$".
> - This fact can't be summarized as "... there is no object..." because
> "<space><object><space>" doesn't match "<space><space>" (either with or
> without the Regular Expression box checked).
> - Also, the text "<space><object><space>" doesn't match
> "[[:space:]][[:space:]]" but it does match "[[:space:]].[[:space:]]".
> - So the bug is: inconsistencies in the way the pseudo-character
> representing an object anchored as character is handled in different places
> (word wrapping, searching, etc.). A consistent behaviour should be proposed,
> as close as possible to the current behaviour.
I agree. I think that search and replace should treat such an object as a character that can be matched by regexes like ".", "[^[:space:]]", "[^[:alpha:]]", "[^[:digit:]]", "[[:print:]]" (because it's a printable object), "[^[:cntrl:]]", "[^[:lower:]]", "[^[:upper:]]". If so, lines like <text><space><object> will not match the regex [[:space:]]$
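A small Python sketch of that behaviour, under stated assumptions: the object is modelled as a hypothetical placeholder character (U+FFFC), and the patterns are rough Python approximations of a few of the POSIX-style classes listed above (Python's `re` module does not support `[[:space:]]` and friends directly).

```python
import re

OBJ = "\ufffc"  # hypothetical placeholder for an object anchored as character

# Rough Python equivalents of some of the classes named above.
PROPOSED = {
    ".": r".",
    "[^[:space:]]": r"\S",
    "[^[:alpha:]]": r"[\W\d_]",
    "[^[:digit:]]": r"\D",
}

def matching_classes(ch: str) -> list:
    """Return the names of the proposed classes that match the single character ch."""
    return [name for name, pattern in PROPOSED.items() if re.fullmatch(pattern, ch)]

def is_trailing_space(line: str) -> bool:
    """Python analogue of searching the line for [[:space:]]$."""
    return re.search(r"\s$", line) is not None

print(matching_classes(OBJ))
print(is_trailing_space("text " + OBJ))
```

Under this model the placeholder matches every listed class, and <text><space><object> is no longer reported as ending in a space.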

I think that this issue should be split into two (one bug report about regexes and one RFE).
Comment 4 Dominique Boutry 2013-10-14 07:26:04 UTC
I'm trying the sequence "<text><space>(<nBS><Formula><nBS>)<endOfParagraph>"
where <nBS>=<nonBreakingSpace>

The second <nBS> seems to fail; I manage to get:
"<longText><space>(<nBS><Formula><nBS><EOL>
)<endOfParagraph>"

However, the following work well:
- "<longText><space>(<nBS>xxxx<nBS>)<endOfParagraph>" (<EOL> possible before "(" and after ")")
- "<longText><space>(<nBS>++<nBS>)<endOfParagraph>" (<EOL> possible before "(", between the "+" and after ")")
- "<longText><space>(<nBS><nBS>)<endOfParagraph>" (<EOL> possible before "(" and after ")")

Could this be further evidence that <formulaAnchoredAsCharacter> is wrongly classified for line breaking? If that point were corrected, would the use of <nBS> satisfy your need?
Comment 5 sergio.callegari 2014-05-12 16:17:48 UTC
Same issue here with 4.2.4 on Linux.

Writing a scientific document, I have formulas that are graphic objects anchored as characters (made with the TexMaths extension).

Unfortunately, these objects anchored as characters are mistreated. Whenever a comma follows one of these objects, the comma can often end up alone after a line break.

This is like

text text text text formula
, text text text

Marking the bug as 'new' from 'unconfirmed' since it is now seen by more than a single user.

Switching from platform Windows to platform all, since I am seeing this on Linux and I am pretty sure that the issue is cross-platform.
Comment 6 sergio.callegari 2014-05-12 16:28:51 UTC
An addition to Dominique Boutry:

I strongly recommend one of the following alternatives:

1) treating objects anchored as characters as alphabetical characters

Rationale: this would be much better than the current situation. In the current situation there are no workarounds for the broken behavior:

text text text <formula>
, text text text

Putting the comma in the formula is not OK, since the comma would be in a math font rather than in the correct text font.

Conversely, the issue of

text<image>

not breaking can be worked around by simply inserting an invisible (thin) space between text and image.

2) introducing a no-break flag (or a "treat as alphabetical character" flag) for objects anchored as character. This may not be fully conformant with the ODF spec, but it would simply be ignored by other ODF apps that do not understand it, so it would not cause any significant breakage.

3) introducing a 'glue' formatting mark, like a non-breaking space that takes no space.

Personally, I prefer 1, which I think is the solution causing the least trouble. In any case, I believe that a solution needs to be found, because as it stands the behavior makes LibO unsuitable for scientific documents. Technical documents, where small graphics are often used to represent keys on a keyboard or operator actions, are likely another case for which the current behavior makes LibO unsuitable.
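To illustrate alternative 3, here is a rough Python sketch of how a 'glue' mark could suppress break opportunities. Everything in it is an assumption made for illustration: the object is a hypothetical placeholder (U+FFFC), the glue is represented by the Unicode WORD JOINER (U+2060), and the break rule (a break is allowed after a space and on either side of an object) only mimics the behavior reported in this bug.

```python
OBJ = "\ufffc"   # hypothetical placeholder for an anchored object
GLUE = "\u2060"  # Unicode WORD JOINER, standing in for the proposed 'glue' mark

def break_positions(text: str) -> list:
    """Indices i where a line break may fall between text[i-1] and text[i]."""
    positions = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if GLUE in (prev, nxt):
            continue  # glue forbids a break on either side of it
        if prev == " " or OBJ in (prev, nxt):
            positions.append(i)  # break after a space or next to an object
    return positions

print(break_positions("ab " + OBJ + ", cd"))         # comma can be split from the object
print(break_positions("ab " + OBJ + GLUE + ", cd"))  # glue keeps the comma attached
```

In the first string a break is possible between the object and the comma; in the second it is not.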
Comment 7 sergio.callegari 2014-05-12 16:31:35 UTC
By the way, I have just noticed that the sequence
<nonbreaking space><comma> can get a linebreak between the nonbreaking space and the comma.
Comment 8 sergio.callegari 2015-03-11 18:04:37 UTC Comment hidden (obsolete)
Comment 9 tommy27 2016-04-16 07:23:06 UTC Comment hidden (obsolete)
Comment 10 sergio.callegari 2016-04-18 10:22:02 UTC
The issue is still present in LibO 5.1.2. Makes LibO unsuitable for writing scientific papers or documents.
Comment 11 Regina Henschel 2017-02-05 13:31:13 UTC
*** Bug 105770 has been marked as a duplicate of this bug. ***
Comment 12 QA Administrators 2018-02-06 03:28:00 UTC Comment hidden (obsolete)
Comment 13 Mike Kaganski 2019-02-19 11:21:41 UTC
Still repro with Version: 6.2.1.1 (x64)
Build ID: 757c58e8cb70b2982843211a54750fb3cd79acd5
CPU threads: 12; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded