Bug 41652 - "NO-BREAK SPACE" (U+00A0) interpreted as fixed-width space
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: high normal
Assignee: Not Assigned
URL:
Whiteboard: target:24.2.0 inReleaseNotes:24.2
Keywords:
Duplicates: 49674
Depends on:
Blocks: Font-Rendering Formatting-Mark Arabic-and-Farsi Authors Word-Line-Break
Reported: 2011-10-10 08:45 UTC by dohnp5a1
Modified: 2024-09-10 17:31 UTC (History)
CC List: 30 users

See Also:
Crash report or crash signature:


Attachments
The preposition “s” may not stand in the end of line in Czech, thus no-break space is used after that (116.90 KB, image/png)
2011-10-10 08:45 UTC, dohnp5a1
non breaking space interpreted as fixed width space (15.26 KB, application/vnd.oasis.opendocument.text)
2016-02-24 17:22 UTC, Stanislav Horacek
no gray NBSP when Formatting Marks is Off, but Field Shadings in On. (145.80 KB, image/png)
2023-05-18 09:38 UTC, Kamil Landa
Text overflow with variable-width NBSP (117.78 KB, application/vnd.oasis.opendocument.text)
2023-05-18 11:50 UTC, Kamil Landa
Joined arabic words with NBSP (20.43 KB, image/png)
2023-05-18 11:53 UTC, Kamil Landa
Variable width NBSP caught in a rare moment of working properly (21.43 KB, image/jpeg)
2023-05-18 12:05 UTC, dolezvo1
NBSP example ODT (11.22 KB, application/vnd.oasis.opendocument.text)
2024-09-10 15:39 UTC, Piotr Osada
NBSP fix-flexible rendering in LO24.2.0.3 (18.98 KB, image/png)
2024-09-10 15:45 UTC, Piotr Osada

Description dohnp5a1 2011-10-10 08:45:35 UTC
Created attachment 52184 [details]
The preposition “s” may not stand in the end of line in Czech, thus no-break space is used after that

The character “NO-BREAK SPACE” (U+00A0) is incorrectly interpreted as a
“fixed width no-break space”. The fixed width is a hindrance here: to follow
the typographical rules and produce a nice document with block justification,
a normal, flexible non-breaking space is needed after some characters
(in LaTeX, “~” has this function). Currently the space produced by U+00A0 is narrower than the other spaces on the same line, which is ugly.
Comment 1 Björn Michaelsen 2011-12-23 12:35:12 UTC Comment hidden (obsolete)
Comment 2 dohnp5a1 2011-12-23 14:26:08 UTC
Still present in 3.5.0.
Comment 3 Roman Eisele 2012-04-17 03:03:21 UTC
If I remember correctly, this is a very old problem already present in OOo 3.x and maybe even older (therefore I changed the 'Version' picker back to the oldest available version). It is also not limited to Czech typography, but the Czech sample shows the severity of the problem especially well.

Writer interprets U+00A0 like a 'fixed width no-break space', but users with some experience in typesetting or DTP applications (TeX with all variants '~', QuarkXPress and others) expect that U+00A0 should behave completely like an ordinary space character (U+0020), i.e. it should be extended and compressed in justified paragraphs like an ordinary space character; it just should not break at the end of a line ...
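
To illustrate the difference, here is a minimal Python sketch of how the slack in a justified line could be distributed. It is not LibreOffice code: the function name, the `flexible_nbsp` flag and the one-unit-per-character width model are all assumptions made for the example.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def justified_space_widths(line, target_width, char_width=1.0, flexible_nbsp=True):
    """Return {index: width} for every space-like character in `line`.

    Hypothetical model: every character is `char_width` wide, and the
    leftover width is shared equally among the stretchable spaces.
    With flexible_nbsp=False, U+00A0 keeps its natural width (the
    behaviour reported in this bug).
    """
    slack = target_width - len(line) * char_width
    stretchable = [
        i for i, ch in enumerate(line)
        if ch == " " or (flexible_nbsp and ch == NBSP)
    ]
    extra = slack / len(stretchable) if stretchable else 0.0
    return {
        i: char_width + (extra if i in stretchable else 0.0)
        for i, ch in enumerate(line)
        if ch in (" ", NBSP)
    }


line = "a b\u00a0c d"  # 7 characters, justified to a width of 10
print(justified_space_widths(line, 10, flexible_nbsp=False))  # NBSP stays at 1.0
print(justified_space_widths(line, 10, flexible_nbsp=True))   # all spaces equal
```

With the fixed-width treatment the two ordinary spaces absorb all the slack and the U+00A0 looks narrower; with the flexible treatment all three spaces end up the same width.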

I have never complained about this problem because I assumed it is just a clash of expectations: (many) word processor users are used to the current behaviour, i.e. to the fact that U+00A0 behaves like a “fixed width no-break space”, while users with some experience in typesetting (like me and, obviously, dohnp5a1) would expect the other behaviour, i.e. that U+00A0 should be extended and compressed like an ordinary space. This is why I don’t use U+00A0 in Writer at all; it is just unusable in justified paragraphs for me.

But the Czech sample shows that this IS a real problem, of course, and now I know I'm not the only user who is not satisfied with the state of affairs.

So maybe we should discuss if the traditional behaviour should be changed, probably with a new compatibility setting to keep the traditional behaviour for old documents ...?!
Comment 4 Simo Kaupinmäki 2012-09-14 20:29:10 UTC
*** Bug 49674 has been marked as a duplicate of this bug. ***
Comment 5 stfhell 2012-10-18 14:26:25 UTC
I think this should be classified as an enhancement rather than a bug. The current behaviour is in fact ancient word processing practice, predating Unicode standards. U+00A0 became the successor of the old "hard space" defined for use with ASCII codesets, and changing the treatment of U+00A0 would break countless documents which purposely use hard spaces as _fixed-width_ non-breakable spaces (with abbreviations like "Dr Freud", "i. e." or punctuation like "« Bonjour! »", "5 %" etc.). It would also not be compatible with current MS Word practice.

However, distinguishing between different forms of white space is a typographical need and should be addressed somehow. DTP software like InDesign has all sorts of spaces: em space, en space, nonbreaking space, nonbreaking space fixed width, third/quarter/sixth/hair/thin space (1/3, 1/4, 1/6, 1/24, 1/8 em space), figure space, punctuation space. LibreOffice has "space" and "hard space" (and of course Unicode spaces like U+202F and U+2009, which it handles better than MS Word).

Jan_J (bug 49674 comment 2) proposed to use the Unicode word joiner U+2060 with a normal space to get a non-fixed-width non-breakable space. But U+2060 is a zero width non-breaking space inhibiting line breaks at both sides which is "intended for disambiguation of functions for byte order mark" (Unicode 6.2). That does not sound like a good candidate for such a space (and one would need the triple U+2060 + U+0020 + U+2060, wouldn't one?).
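
For reference, the basic properties of the characters under discussion can be inspected with Python's standard `unicodedata` module. This is only a sketch: the helper name `describe` is made up for the example, and the module exposes names, general categories and decomposition mappings, not the line-breaking classes themselves.

```python
import unicodedata


def describe(ch):
    """Return (name, general category, decomposition mapping) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )


# SPACE, NO-BREAK SPACE, WORD JOINER, NARROW NO-BREAK SPACE
for ch in ("\u0020", "\u00a0", "\u2060", "\u202f"):
    name, category, decomposition = describe(ch)
    print(f"U+{ord(ch):04X}  {name:<22} {category}  {decomposition or '-'}")

# The triple discussed above: WORD JOINER + SPACE + WORD JOINER.
wj_sp_wj = "\u2060\u0020\u2060"
print([f"U+{ord(ch):04X}" for ch in wj_sp_wj])
```

U+2060 is reported as a format character (category `Cf`) rather than a space separator (`Zs`), which matches the point that it is not a space in its own right.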

Users definitely also need non-breakable _fixed-width_ spaces, and if LO redefined U+00A0 as of non-fixed-width (in accordance with Unicode) - what character should be used for the classical "hard space"? MS Word displays "box characters" for symbols not defined in the active font, which should be kept in mind. (I know it cannot handle U+2009, but I haven't tested U+202F.)

A practical solution would probably be to let the user decide on a per-document basis how to interpret U+00A0: fixed width or proportional? That is, to add a configuration option under "Writer/Compatibility". But even then one should _still_ be able to use all necessary kinds of spaces at least in ODT; they may need to be converted for DOC/DOCX export, however, because of MS Word limitations.
Comment 6 Roman Eisele 2012-10-18 16:15:48 UTC
@ stfhell:

Thank you for your careful description of the situation and of possible options!
You are right, this issue is better regarded as an enhancement request; therefore I change the Importance field accordingly.
Comment 7 Simo Kaupinmäki 2012-10-19 21:26:55 UTC
It is easy to agree with Stfhell's notion that the intervening space in expressions such as "Dr Freud" and "5 %" should be non-breaking, but I can't quite see the reasoning behind it having to be of fixed width too. By similar logic, shouldn't the spaces in "Sigmund Freud" and "five per cent" have fixed width as well? I find it rather inconsistent that a non-breaking space, which in non-justified text looks exactly like an average space, may stand out as narrower than average if the text is justified. Can you point out an authoritative source that actually recommends this? (Note that even in justified text, the difference will only be discernible on some of the lines, and in carefully typeset publications it should ideally not be discernible at all because the variation between lines is minimized by using hyphenation.)

The French spacing applied in connection with certain punctuation is a little different matter, as U+00A0 is mostly considered too wide for this purpose in professional-level typography as far as I know. A more appropriate character should be the narrow no-break space U+202F (though technical support for it may still be lacking in some environments; for a detailed, though not necessarily quite up-to-date discussion, see http://stackoverflow.com/questions/595365/how-to-render-narrow-non-breaking-spaces-in-html-for-windows).

As regards abbreviations such as "i.e." the standard way to write these seems to be without any space, at least as far as English is concerned:

http://www.merriam-webster.com/dictionary/i.e.
http://oxforddictionaries.com/definition/english/i.e.

So, in principle what my point boils down to is this: Is there actually a legitimate need for a fixed-width no-break space that is _only_randomly_ distinguishable from a normal space in justified text? Sure, many people have learned to expect that U+00A0 behaves like that, but from a professional typographer's perspective this expectation may be misguided, and it is clearly contradicted by the Unicode standard. (It may also be worth noting that Firefox nowadays seems compliant with Unicode in its rendering of U+00A0.)

That said, the approach suggested by Stfhell might indeed offer a practical compromise, catering both for the Unicode-compliant view and the MS Word-compliant view.
Comment 8 Simo Kaupinmäki 2012-10-20 09:04:26 UTC
Oh, the two dictionary links included in my previous comment should incorporate the final period, which apparently has been interpreted as sentence-ending punctuation by the Bugzilla system.
Comment 9 stfhell 2012-10-21 11:49:19 UTC
(In reply to comment #7)
> It is easy to agree with Stfhell's notion that the intervening space in
> expressions such as "Dr Freud" and "5 %" should be non-breaking, but I can't
> quite see the reasoning behind it having to be of fixed width too. By
> similar logic, shouldn't the spaces in "Sigmund Freud" and "five per cent"
> have fixed width as well? I find it rather inconsistent that a non-breaking
> space, which in non-justified text looks exactly like an average space, may
> stand out as narrower than average if the text is justified. Can you point
> out an authoritative source that actually recommends this?

Typesetting conventions are conventions, not ISO standards, and they vary with language and time and personal taste. I can direct you to the orthographic German "Duden" (following DIN 5008 for letter-writing): With office documents and e-mails use a space after abbreviation dots (z. B., u. a. m.), but not in dates (05.07.06); in word processing use a small fixed-width space in both abbreviations and dates. (What merriam-webster.com and oxforddictionaries.com do is compatible with _English_ typesetting practise and with common writer's practise, because it's the easiest way to prohibit a line break.)

Spaces before/after/around symbols like $ % & / « » vary a lot, but in typesetting handbooks you usually find recommendations like 1/6 or 1/8 or 0 em quad. A full and proportional space would be regarded as unprofessional typesetting in Germany. In typesetting systems, users have fixed-width spaces of all sizes (including the normal inter-word size of about 1/4 quad) for all kinds of usages (space between chapter number and title; aligning numbers like "347" and "_47" vertically; insert a space at paragraph end to avoid the last line being fully justified). They are "tools" for laying out text, not necessarily a way to encode text as information - typesetters use such things as double 1/4 quad spaces.

So fixed-width variants of normal space size do have a use (and Unicode defines them: U+2002, U+2004, U+2005 etc.). The important point is not that the fixed-width space should be distinguishable in all cases, but that it should not be extensible with proportional spacing. In good typography such spaces should in most cases be smaller than the regular space (as you say).

And, of course, you are right in that U+00A0 is _not_ defined as fixed-width. And Microsoft knows that:
http://www.microsoft.com/typography/developers/fdsspec/spaces.htm
But designing fonts and designing word processors are different things for Microsoft. Offering Word users a submenu with various types of spaces would be overkill for most users, and Microsoft has decided to offer them the fixed-width normal space as a single "compromise" alternative. Whether from the need to be downward-compatible with pre-Unicode documents, from misinterpretation of the Unicode standards or from conscious design principles. (Word processors are in fact used as modern typewriters, people don't want to fiddle with half a dozen spaces, and many don't even bother with hard spaces.)

In a world where only recent versions of Firefox render U+00A0 correctly, where Adobe epub-reader software cannot render a soft hyphen correctly and the most commonly used word processor renders all spaces apart from U+0020 and U+00A0 as boxes if the font doesn't define them (LibreOffice uses the glyphs from a substitution font), you cannot just follow Unicode standards blindly without regard to compatibility issues.

But of course there is other software than MS Word. InDesign imports Unicode spaces well from DOC files, and LibreOffice shouldn't let itself be limited by a word processor with modest formatting capabilities. (In InDesign, imported U+00A0 are rendered correctly. Thin spaces are fixed-width, as far as I know, in line with common typesetting practise.) But it should be a conscious decision of the user to depart from Word conventions on a per-document basis. The problem is: What space could be used for fixed-width spaces (for which there is also a definite need) if you tick that future LO box "Treat hard space as proportional"?
Comment 10 Simo Kaupinmäki 2012-10-26 20:03:18 UTC
(In reply to comment #9)
> I can direct you to the orthographic German "Duden" (following DIN 5008
> for letter-writing): With office documents and e-mails use a space after
> abbreviation dots (z. B., u. a. m.), but not in dates (05.07.06); in word
> processing use a small fixed-width space in both abbreviations and dates.

Thank you for the reference. I got hold of a copy of the 2009 edition of "Duden: die deutsche Rechtschreibung", which defines hard spaces ("Festabstände") as fixed-width, mostly smaller ("meist kleinere") spaces that prohibit line breaks (the same definition is already included in the 2000 edition and available online: http://www.egb-buende.de/tools/EDV_Fuehrerschein_NRW/03_Grundlagen_Textverarbeitung/textverarbeitung_duden1.pdf). 

So, this definition primarily seems to concern non-breakable _thin_ spaces, though the modifier "meist" leaves room for some interpretation. Furthermore, in the context of specific examples (e.g., of the use of the percent sign) it is repeatedly said that a "smaller space" should be used that is explicitly described as both hard and protected ("geschützter"; the 2000 edition isn't quite as explicit on these points). On the other hand, according to the 2009 edition, the official standard DIN 5008 speaks of a full ("ganzer") space, which apparently need be neither fixed-width nor non-breakable. My reading of all this is that almost any space will do, but a non-breakable thin space is preferred. And this was basically what I was talking about: normally you'd want either a variable-width space or a _thin_ fixed-width space, not a fixed-width space that sometimes looks like a normal space and sometimes not.

> They are "tools" for laying out text, not necessarily a way to encode text
> as information - typesetters use such things as double 1/4 quad spaces.
> 
> So fixed-width variants of normal space size do have a use (and Unicode
> defines them: U+2002, U+2004, U+2005 etc.). The important point is not that
> the fixed-width space should be distinguishable in all cases, but that it
> should not be extensible with proportional spacing. In good typography such
> spaces should in most cases be smaller than the regular space (as you say).

Now I'm a little confused. Are you talking of the regular no-break space (U+00A0) or the _narrow_ no-break space (U+202F) here?

What I said was basically that in the ideal case there should hardly be any distinguishable difference between U+00A0 and a normal space, even if U+00A0 was treated as a fixed-width space. If a fixed-width space is not distinguishable from a normal space, it does not matter in practice whether it is of fixed width or not. A different matter is that often U+00A0 is just used as the poor man's narrow no-break space, relying on it being treated as a fixed-width space in justified text. I can see the reasoning, but this usage is not in alignment with the best practices of traditional typography as far as I can see.

Granted, Unicode defines a set of fixed-width spaces, the majority of which are, as formulated on the Microsoft page you referred to, characters corresponding to traditional typographic _space_values_ that have indeed been applied in manual typesetting. Historically, for each space between words on a line, an identical space value (typically corresponding to U+2004 or U+2005) would have been applied. For each space on other lines, a slightly different value was applicable when necessary to get all the lines justified. After punctuation, a larger-than-average value would often have been preferred, or in some special cases, a thin space. 

For more details, see paragraphs 239–254 explaining technical terms in the 1st edition of the Chicago Manual of Style (published in 1906): 

http://www.chicagomanualofstyle.org/facsimile/CMSfacsimile_terms.pdf

This is the historical background to the Unicode fixed-width spaces, and one might want to argue that many of these characters are of little practical use in the age of digital typography. Notably, most of the Unicode fixed-width spaces are _breakable_ and have no non-breakable counterparts (breakability was not a concern in manual typesetting, as each line was typeset as a separate unit). This can be seen as a deficiency in the Unicode character repertoire, or alternatively perhaps as an implicit stand that the kind of fine adjustment they were originally intended for should rather rely on different means in modern typesetting systems. Using the Unicode fixed-width spaces for manual justifying in digital typesetting would be awkward and anachronistic.

Be that as it may, there is not much LibreOffice can do to change the overall situation. If a document is first edited in LibreOffice and then opened in another application (possibly after being exported into a specific format), U+00A0 will be rendered either as a variable-width space (in Firefox) or as a fixed-width space (in MS Word). LibreOffice only has control over its own rendering of the character (and how it will be printed in some non-editable formats, such as PDF). Additionally, LibreOffice might want to offer an easy shortcut for entering U+202F in order to cater for finer typography, but again, there can be no guarantee that it will be rendered correctly in other applications.
Comment 11 stfhell 2012-10-31 12:02:11 UTC
(re Comment #10) I don't think it makes much sense to discuss the merits of various spaces or typography issues, Simo, especially on LO Bugzilla. The characters exist, people can use them, and LO should handle them as well as possible. The details are often just a matter of taste, or the willingness to distinguish among a dozen kinds of space characters...

I think Roman's proposal (Comment #3) to let the user configure in Options/Writer/Compatibility how the classical "hard space" (encoded as U+00A0) should be handled (fixed-width as in Word or proportional as Unicode says) is a very practical solution. The Compatibility menu gives users the choice to set an option just for the current file or use it as a default.

If you decide to go with Unicode standards and configure a proportional U+00A0, you can use the characters that Unicode has defined as fixed-width spaces: U+2000 to U+200A, U+202F, U+205F.
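
As a rough check of what that set contains, the following Python sketch lists the code points named above and flags the ones whose decomposition mapping carries the `<noBreak>` tag. The `<noBreak>` tag is used here only as a stand-in for non-breaking behaviour; the authoritative source is the Line_Break property (UAX #14), which the standard library does not expose. The names `FIXED_WIDTH_SPACES` and `is_no_break` are made up for the example.

```python
import unicodedata

# U+2000..U+200A plus U+202F and U+205F, as listed in the comment above.
FIXED_WIDTH_SPACES = [chr(cp) for cp in range(0x2000, 0x200B)] + ["\u202f", "\u205f"]


def is_no_break(ch):
    """True if the character's decomposition mapping is tagged <noBreak>."""
    return unicodedata.decomposition(ch).startswith("<noBreak>")


for ch in FIXED_WIDTH_SPACES:
    marker = "no-break" if is_no_break(ch) else "breakable"
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<26} {marker}")

non_breaking = [f"U+{ord(ch):04X}" for ch in FIXED_WIDTH_SPACES if is_no_break(ch)]
print(non_breaking)
```

By this measure only FIGURE SPACE (U+2007) and NARROW NO-BREAK SPACE (U+202F) are non-breaking, so they are the obvious candidates if a fixed-width replacement for the classical hard space is needed.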

The problem here is interoperability with MS Word, because Word, as said, displays all characters not defined in the font as "box characters". But probably this is becoming less of a problem. I had a look at some fonts: For Windows 7, Microsoft supplies fonts that actually define the various Unicode spaces. So it would be users of older Windows versions or users of the many fonts with a more restricted character set that would see the "box character". You can, as a workaround, format all fixed-width spaces in Verdana, Times New Roman or some other Unicode font to avoid that (somewhat, at least).

You can use fileformat.info to check some font glyph sets:
http://www.fileformat.info/info/unicode/char/202f/fontsupport.htm
(for U+202F), or you can use the character map application of the OS.

I have no idea how comprehensive the fonts that come with MacOS are, or if Word for Mac has the same "box character" issue as Word for Win. Would be good to know...
Comment 12 stfhell 2012-10-31 13:49:43 UTC
(In reply to comment #10)
> Using the Unicode
> fixed-width spaces for manual justifying in digital typesetting would be
> awkward and anachronistic.

I consider a good support for Unicode spaces as something essential for a word processor with advanced layout capabilities. Software like Word or LibreOffice and even InDesign is in many respects "anachronistic" in your sense of the word (typewriter-like or lead-typesetting-like), there is no other way to define the spacing you want but in the form of glyphs. With XSL transformations of XML documents (or TeX) you can have a stylesheet (instead of the document) define the spaces in a template, thus achieving a uniform handling of for example thin spaces around « Bonjour! » - but even then you need to define the spaces as Unicode characters in the template. You just needn't encode them in your document. With word processors (and DTP software), you have to set all the spaces yourself in the document text. It is somewhat anachronistic and very error-prone typesetting - but a straightforward and simple concept for users.
Comment 13 Simo Kaupinmäki 2012-11-02 19:03:15 UTC
(In reply to comment #12)
We all more or less seem to agree that the rendering of U+00A0 as a fixed-width space is basically a bug. Therefore it is a somewhat perverted situation that the bug cannot simply be fixed without paying attention to how the incorrect behaviour can also be preserved. Is it actually worth the trouble? I am not saying that it is a complete waste of time and effort, but this is a question that deserves to be asked too.

It was not me who brought up that there are a variety of fixed-width spaces in Unicode. Nevertheless, as we are discussing whether or not U+00A0 should continue to be rendered as a fixed-width space, at least optionally, we should try to understand the background to and reasoning behind these standardized fixed-width spaces (and why U+00A0 is not one of them). Sure, some of them are still relevant today, but unquestionably some are redundant (U+2000 and U+2001 are canonically equivalent to U+2002 and U+2003 respectively). And then there are some the relevance of which can be questioned as far as modern typesetting practices are concerned.
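
The canonical equivalence mentioned here can be checked with Python's standard `unicodedata.normalize`; this is just a sketch, and the helper name `nfc` is an assumption for the example. It also shows that U+00A0 is only compatibility-equivalent to U+0020, so NFC leaves it alone while NFKC folds it into an ordinary space and loses the no-break property.

```python
import unicodedata


def nfc(text):
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# EN QUAD and EM QUAD normalize to EN SPACE and EM SPACE respectively.
for quad in ("\u2000", "\u2001"):
    print(f"U+{ord(quad):04X} -> U+{ord(nfc(quad)):04X}")

nbsp = "\u00a0"
print(nfc(nbsp) == nbsp)                           # canonical forms keep U+00A0
print(unicodedata.normalize("NFKC", nbsp) == " ")  # compatibility form folds it to U+0020
```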

However, nobody has suggested that the redundant or possibly archaic Unicode characters need not be handled correctly. That is not the issue here. There are many redundant, archaic and even deprecated characters in Unicode, for which the main motivation is historical. People who want to use these characters for whatever reason in their documents should certainly have the option to do so, even if it may not always be the most elegant or technically reliable choice in digital typesetting.

> > Using the Unicode
> > fixed-width spaces for manual justifying in digital typesetting would be
> > awkward and anachronistic.
 
> there is no
> other way to define the spacing you want but in the form of glyphs.

I'm afraid you may have missed my point, so I'll try to clarify. The example was about how text used to be _justified_ manually. Historically, to achieve this effect you would have applied specific space values between words on each line. This was in fact one of the main uses for the various space values in manual typesetting. Today, however, if you want spacing on each line to be even, you simply specify that the text should be justified and let your application software automatically adjust the width of spaces accordingly. There is no point in using various fixed-width spaces for this purpose anymore.

Historically, the first line of a paragraph would have been indented about one em space. Today, rather than inserting U+2003 (or U+2001), you can specify a fixed indentation value that will automatically be applied at the beginning of each paragraph. There is no need for a specific glyph there anymore.

Historically, certain punctuation, such as a sentence-ending period, would have been followed by an em space or a couple of three-per-em spaces. Today this is often regarded as old-fashioned, but people who still want to follow the tradition simply tend to type two (or even three?) regular spaces after the period. Sure, a purist could insert a U+2003 or a couple of U+2004s instead, but I fail to see how this would make any significant improvement from a typographical point of view.

Yes, you can continue to use all the standard fixed-width spaces if you want to, but this is what you choose to do and it does not make the _software_ anachronistic. When using a word processor, my father, who is in his late seventies, still tends to break lines manually by tapping the return key at the end of each line, and he may also hyphenate the last word on a line by inserting a regular hyphen (U+002D) before the line break. This is because his paradigm of typing is of a different era. He does not take full advantage of the modern technology (in fact he still prefers a mechanical typewriter occasionally) – and that's fine, since he is retired and mostly writes for his own pleasure every now and then. But for the rest of us it is good to know that today there are more elegant methods of typing a piece of text.

As regards French spacing in « Bonjour ! », inserting a non-breakable thin space before or after the punctuation marks may be a practical solution at the present time, but there are in fact alternative methods for this too. Smart font technology, as exemplified by "Linux Libertine G" and "Linux Biolinum G" fonts (already bundled with the recent versions of LibreOffice, even on Windows), allows automatic application of French spacing where deemed appropriate. There is no need to insert a specific space character, as LibreOffice is able to recognize the guillemets, exclamation marks, question marks etc. and take care of proper spacing by making use of additional instructions incorporated in the font itself. Unfortunately the technology is far from being universally supported (and there are several alternative ways to incorporate similar features), but with the right combination of font technology and application software it is quite functional.

For more information on this technology, see:
http://numbertext.org/linux/
http://scripts.sil.org/cms/scripts/page.php?site_id=projects&item_id=graphite_home&_sc=1

This kind of smart font technology could also offer an elegant way to make available the optional fixed-width U+00A0, were this feature considered important enough to be incorporated in a font. The feature would then be available in any application software able to support the font technology, and being an optional feature of the font it could be applied to any portion of a text, rather than to the document as a whole. On the downside, it would only be available with some specific fonts. This might also be criticized as an excessively sophisticated approach to implement a feature that is basically non-standard.
Comment 14 Roman Eisele 2012-11-04 19:00:45 UTC
(In reply to comment #13)
> We all more or less seem to agree that the rendering of U+00A0 as a
> fixed-width space is basically a bug.
Yes. But a special kind of bug: a bug which has been sanctioned by the fact that it has been in Microsoft’s applications for ages, and is therefore considered some kind of “industry standard” by many people :-(

> Therefore it is a somewhat perverted
> situation that the bug cannot simply be fixed without paying attention to
> how the incorrect behaviour can also be preserved. Is it actually worth the
> trouble?
IMHO we can not just fix this bug by rendering U+00A0 as a proportional space, because many people, probably: most (!) people will consider the new, proportional rendering as a bug and cry: “You don’t render my .doc files correctly anymore, fix this!” ...

Therefore the simplest solution which improves typography in LibreOffice without breaking the (wrong) assumptions of many people is (as already suggested in comment #3) to let the user configure in Options/Writer/Compatibility how the classical “hard space” (U+00A0) should be rendered: fixed-width as in Word or proportional (as Unicode and many (most?!) textbooks about typography suggest). This option should work on a per-document basis, of course. So all existing documents will look as before when we open them, but can be changed to the new, better rendering by just switching that option; and the same option will also control the behaviour of newly created documents.

Adding that option should not be too difficult; I remember that similar additional compatibility options have been added in the past with relatively few lines of code ...


Sorry for just repeating my initial suggestion, but IMHO all the other, more advanced stuff -- e.g. bigger spaces after a sentence-ending period -- are different items, for which we should file special enhancement requests, if necessary. The present bug report is already far too long and complicated -- no developer who is just looking for some work to be done will easily understand the many things we have already been discussing in this single bug report ;-)
Comment 15 Roman Eisele 2012-11-28 19:57:56 UTC
(In reply to comment #5)
> Jan_J (bug 49674 comment 2) proposed to use the Unicode word joiner U+2060
> with a normal space to get a non-fixed-width non-breakable space. But U+2060
> is a zero width non-breaking space inhibiting line breaks at both sides
> which is "intended for disambiguation of functions for byte order mark"
> (Unicode 6.2). That does not sound like a good candidate for such a space
> (and one would need the triple U+2060 + U+0020 + U+2060, wouldn't one?).

For Jan_J’s request (and related problems of the line-breaking algorithm), there is now bug 57652 - “Wrong treatment of Word Joiner (U+2060) in line breaking algorithm”.

However, I think that one solution does not necessarily invalidate the other. I.e., if bug 57652 were fixed, and WJ + SP (+ WJ) acted like an elastic NBSP, we could still discuss whether the behaviour of U+00A0 should be changed, too (or better: whether an option to choose the behaviour of U+00A0, elastic or fixed, should be added). So the present bug report is still a valid enhancement request and essentially independent of bug 57652.
Comment 16 QA Administrators 2015-07-18 17:43:13 UTC Comment hidden (obsolete)
Comment 17 dohnp5a1 2015-07-18 21:39:43 UTC
The bug is still present in LibreOffice 4.4.4.3, on Ubuntu 14.04 LTS, its behavior did not undergo any changes.
Comment 18 Chris Sherlock 2016-02-23 14:05:24 UTC
Question: is this occurring in .ODT *and* .DOCX files?

Any chance of uploading a test document?
Comment 19 Stanislav Horacek 2016-02-24 17:22:21 UTC
Created attachment 122957 [details]
non breaking space interpreted as fixed width space

Yes, this happens for all formats: ODT, DOCX and DOC.

A sample document (with text of the attached PNG) attached.
Comment 20 jkl 2016-06-18 21:03:01 UTC
(In reply to dohnp5a1 from comment #0)
> Created attachment 52184 [details]
> The preposition “s” may not stand in the end of line in Czech, thus no-break
> space is used after that

It's similar in Polish editing best practices; the correct behaviour would really set LO apart positively.
Comment 21 Zenaan Harkness 2016-08-29 16:25:18 UTC
Anecdote from a 100-year old legal book in Australia (our "Annotated Constitution"):
 - nbsp between name components such as "Mr. Smith" contain a narrow non-breaking space
 - whereas nbsp between place names such as "Port Agusta" are normal or proportional-width spaces (most of the book is justified)

I.e. to properly duplicate this book requires both types of spaces.
Comment 22 rysson 2016-08-30 11:45:21 UTC
Hi.
Web browsers use U+00A0 as non-breaking not-fixed-width space too.
Also MS Office changed the behavior:

With the introduction of Word 2013, MS changed the behaviour of the ASCII 160 non-breaking space. It now conforms to the CSS space rules. This allows the space to expand/contract with justification so that all spaces on a line have the same width; the ASCII 160 behaviour could look odd with its fixed-width non-breaking spaces in such cases. For fixed-width non-breaking spaces you can use one of the other non-breaking space characters (eg Narrow No-Break Space: 202F,Alt-x).

Source:
http://answers.microsoft.com/en-us/office/forum/office_2013_release-word/distance-between-two-words-with-nonbreaking-space/3f6b3d0d-9ab7-422f-8381-84c9ef06c7cb?auth=1
Comment 23 vhaisman@gmail.com 2016-10-11 17:12:45 UTC
This is biting me as well. It makes LibreOffice Writer basically unusable for any serious Czech language text.
Comment 24 brechacik 2016-10-27 16:51:05 UTC
This concerns Slovak texts as well. It prevents me using LibreOffice for book publishing.
Comment 25 [REDACTED] 2017-08-28 11:32:48 UTC
(In reply to rysson from comment #22)
> Hi.
> Web browsers use U+00A0 as non-breaking not-fixed-width space too.
> Also MS Office changed the behavior:
> 
> With the introduction of Word 2013, MS changed the behaviour of the ASCII
> 160 non-breaking space. It now conforms to the CSS space rules. This allows
> the space to expand/contract with justification so that all spaces on a line
> have the same width; the ASCII 160 behaviour could look odd with its
> fixed-width non-breaking spaces in such cases. For fixed-width non-breaking
> spaces you can use one of the other non-breaking space characters (eg Narrow
> No-Break Space: 202F,Alt-x).
> 
> Source:
> http://answers.microsoft.com/en-us/office/forum/office_2013_release-word/
> distance-between-two-words-with-nonbreaking-space/3f6b3d0d-9ab7-422f-8381-
> 84c9ef06c7cb?auth=1

Well, they seem to have changed it again, because in Word 2016 the non‐breaking space has a fixed width again. Word stopped committing to modern standards for whatever reason (probably the whining of long‐term users).

Source: https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2016/nonbreakable-space-justification-in-word-2016/4fa1ad30-004c-454f-9775-a3beaa91c88b
Comment 26 rysson 2017-08-28 12:01:35 UTC
> Word stopped committing to modern standards for whatever reason (probably the whining of long‐term users).

Yeah. But that's not a good reason to stop fixing it in LibreOffice.
At least an option could be added (fixed / non-fixed).
Comment 27 vhaisman@gmail.com 2017-08-28 15:11:33 UTC
Microsoft does what Microsoft does. I can tell you as a native Czech speaker and writer that the fixed size of the NBSP (U+00A0) characters is contrary to Czech typography. Currently, both Word in its latest version and LibreOffice are unusable for any serious documents due to this issue. *Please* make this at least optional per document or per paragraph or such.
Comment 28 brechacik 2017-08-28 17:19:19 UTC
I managed to work around it! Now I can do book-publishing in LibreOffice in the Slovak language. I use the U+2060 (WORD JOINER [WJ]) character for a non-breakable relative-width space. Does it work for you guys as well?
Comment 29 Jan 2017-12-23 14:20:58 UTC
Dear friends,

If I have understood the discussion well, the compatibility with standards and other software is a problem.

But if we limit the solution to two cases:
- printing
- exporting to PDF
(which is entirely enough for me)

then the compatibility problem disappears, as LO can develop a proprietary solution and encode this character in some way for the export only.
Comment 30 Shriramana Sharma 2017-12-31 14:59:10 UTC
The Unicode standard document http://unicode.org/reports/tr14/ clearly states that:

<quote>
When expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width.</quote>

Whether LibreOffice or Word, both should comply to the above standard and expand both U+0020 and U+00A0 equally. LibreOffice should not blindly mimic what (changing) behaviour Word exhibits on this score.

Please fix this! This is a real embarrassment vis a vis good typography practices. 

Whoever wants fixed width spaces please use one of the remaining space characters!
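To make the quoted UAX #14 rule concrete, here is a small sketch that classifies space characters by how justification may treat them. The function name and the returned labels are hypothetical; only the three sets come from the quotation above:

```python
import unicodedata

# Per the quoted passage: U+0020 and U+00A0 may be compressed and expanded.
COMPRESS_AND_EXPAND = {"\u0020", "\u00a0"}
# U+2009 THIN SPACE is "occasionally" subject to expansion, never compression.
OCCASIONALLY_EXPAND = {"\u2009"}


def justification_role(ch: str) -> str:
    """Classify one character according to the quoted UAX #14 sentence."""
    if unicodedata.category(ch) != "Zs":  # Zs = Unicode space separator
        return "not-a-space"
    if ch in COMPRESS_AND_EXPAND:
        return "compress-and-expand"
    if ch in OCCASIONALLY_EXPAND:
        return "expand-only-occasionally"
    return "fixed"  # "All other space characters normally have fixed width."


if __name__ == "__main__":
    for cp in (0x0020, 0x00A0, 0x2009, 0x202F, 0x2003):
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.name(ch)}: {justification_role(ch)}")
```

Under this reading, U+202F NARROW NO-BREAK SPACE is the natural choice for a non-breaking space that keeps its width.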
Comment 31 rysson 2018-01-01 01:07:36 UTC
Yes, yes, yes!
This bug was reported in 2011. Please, I think it's time to fix it. I know I can still use LaTeX or pure HTML to write documents. But using LibreOffice for text documents would be very nice IMO.

Now I can either break typography (in Polish, and many other languages) or see awful text like:

This     is     justified     text     with     a0 space     –     terrible.

Should look:

This    is    justified    text    with    a0    space    –    terrible?
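The difference between the two renderings can be sketched numerically. This is a toy model, not Writer's layout code: every character is assumed to be one unit wide, and the function name `gap_widths` is hypothetical. The extra width needed to fill the line is shared either among U+0020 only (fixed NBSP) or among U+0020 and U+00A0 alike (flexible NBSP):

```python
SPACE = "\u0020"
NBSP = "\u00a0"


def gap_widths(line: str, target_width: int, nbsp_flexible: bool) -> list:
    """Return the width of each space in line after justifying to target_width."""
    gaps = [ch for ch in line if ch in (SPACE, NBSP)]
    extra = target_width - len(line)  # width still to distribute
    stretchable = [
        i for i, ch in enumerate(gaps) if ch == SPACE or nbsp_flexible
    ]
    widths = [1.0] * len(gaps)  # natural width of every space is one unit
    if stretchable and extra > 0:
        share = extra / len(stretchable)
        for i in stretchable:
            widths[i] += share
    return widths


if __name__ == "__main__":
    line = "a\u00a0b c d"  # NBSP after the one-letter word, then ordinary spaces
    print("fixed NBSP:   ", gap_widths(line, 10, nbsp_flexible=False))
    print("flexible NBSP:", gap_widths(line, 10, nbsp_flexible=True))
```

With a fixed NBSP the ordinary spaces absorb all of the slack, which produces the uneven gaps shown in the first example line.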


BTW. Happy New Year!
Comment 32 Mike Kaganski 2019-11-04 07:42:27 UTC
Unicode sets the "non-breaking" property on only some of its space characters. But in reality, the differing typographic rules of countries/bodies/times, as noted, may require the full repertoire of spaces of different widths and properties (fixed/widening/shortening) in both breaking *and* non-breaking variants. ODF could introduce a special internal "non-breaking" character property applicable to any space character, which would override the normal Unicode algorithm; that would allow for adding shortcuts for such combos. Without that, any "fix" like the one asked for here by many would only fix things for a vocal minority, and break things for most users, who naturally don't participate in the discussion here, because - well, it just works for them, and they don't look for this bug ;-)

Note that what I propose would require an ODF extension.
Comment 33 dohnp5a1 2019-11-11 22:38:39 UTC
First, set the default behaviour of LO strictly according to the Unicode definitions – it is about time after 8 years (!). Later, as a supplement, some extension with more features could be added.
Comment 34 Mike Kaganski 2019-11-12 04:37:10 UTC
(In reply to dohnp5a1 from comment #33)
> At first introduce incompatibility by breaking unknown number of millions of existing documents of our users, to allow *some other* users to do what they need; and only then start thinking about doing it properly

No, it doesn't work that way. While Unicode is an important standard, it's only of secondary importance to an office suite. The suite's primary goal is *not* to be a reference conformant implementation of the standard; rather, it should use the standard to the extent needed to serve its users best. And if legacy requires that some statements of the standard be violated to keep existing documents intact, that is how it should be, until a better design is invented and implemented that makes it possible to please both sides.
Comment 35 Shriramana Sharma 2019-12-22 05:51:48 UTC
I would be happy with a per-document option for how to treat NBSP-s, but the default behaviour for new documents should be configured where?
Comment 36 gawkla 2021-01-08 10:17:00 UTC Comment hidden (me-too)
Comment 37 Maciej Kotliński 2021-03-11 20:42:52 UTC
Everybody understands that compatibility should be preserved. Unfortunately, most texts in Polish, Czech and probably some other languages look ugly... very ugly. I know that other word processors format these texts in a similarly ugly way. Could LibreOffice be better than the others?

Nobody        likes to         read        such       a justified      text.

It would be nice to have an option for setting a variable width of the non-breaking space in the paragraph format settings. The user could decide.
Comment 38 Ultimate Apparels 2021-08-04 07:23:45 UTC Comment hidden (spam)
Comment 39 dolezvo1 2023-03-14 11:11:17 UTC
This is way overdue for fixing. Currently the most reasonable solution seems to be: add a flag specifying that the new NBSP behaviour is to be used, and for documents that have it, interpret NBSP as a variable-width NBSP.

If someone still needs fixed length NBSP for some reason, it can be achieved using fixed width space (EnSpace, EmSpace, etc.) surrounded by two zero width unbreakable characters, such as Word Joiner (U+2060) (can be inserted by clicking Insert > Formatting mark > Zero width space, unbreakable)
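As a rough sketch of that workaround, the sequence can be built like this. The helper name `fixed_width_nbsp` is hypothetical, and how Writer lays the sequence out is not verified here:

```python
import unicodedata

WORD_JOINER = "\u2060"
EN_SPACE = "\u2002"
EM_SPACE = "\u2003"


def fixed_width_nbsp(space: str = EN_SPACE) -> str:
    """Wrap a fixed-width space in two word joiners to forbid a break around it."""
    return WORD_JOINER + space + WORD_JOINER


if __name__ == "__main__":
    for space in (EN_SPACE, EM_SPACE):
        names = [unicodedata.name(ch) for ch in fixed_width_nbsp(space)]
        print(" + ".join(names))
```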

Sounds good to everyone?
Comment 40 Commit Notification 2023-04-04 06:23:45 UTC
Vojtěch Doležal committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/28675af84ae8e2342bd78be3696dc09de6ce5cc5

tdf#41652: Variable width NBSP

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 41 Buovjaga 2023-05-08 17:40:37 UTC
Will there be further commits or can this be closed as fixed?
Comment 42 dolezvo1 2023-05-08 17:50:52 UTC
Hey, yeah, to be perfectly honest I don't think it's working all that well atm. Taking another shot at it will be the first thing I do when I have bit more time (this weekend, perhaps?), but I'm extremely busy right now. I hope I could manage to work it out before 7.6 gets released, but I'm not completely sure when that is supposed to be.
Comment 43 Kamil Landa 2023-05-18 09:38:24 UTC
Created attachment 187368 [details]
no gray NBSP when Formatting Marks is Off, but Field Shadings in On.

Do I understand correctly that this commit implements showing the degree character for NBSP, but does not implement the non-fixed width of NBSP? The non-fixed width does not work for me.

The degree character works (View/Formatting Marks), but previously NBSP had a gray background whenever View/Field Shadings was active, independently of View/Formatting Marks. After this commit there is no gray background for NBSP when View/Field Shadings is active and View/Formatting Marks is not, which is unpleasant.

[To be sure, the same in translation from the author's native language, Czech: Do I understand correctly that this change only shows the degree character for NBSP when View/Formatting Marks is turned on, but does not make the non-breaking space variable-width? The variable width does not work for me. The degree character works, but the gray shading disappears when View/Formatting Marks is turned off, even if View/Field Shadings is turned on. In previous versions, enabled Field Shadings always shaded NBSP gray; now NBSP is gray only when View/Formatting Marks is also turned on, which is unpleasant.]
Comment 44 Kamil Landa 2023-05-18 09:39:04 UTC
Tested in: 
Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 22950a9b008e1bb22fa9e54b5d45715e25fee764
CPU threads: 8; OS: Windows 10.0 Build 17763; UI render: Skia/Raster; VCL: win
Locale: cs-CZ (cs_CZ); UI: en-US
Calc: CL threaded
Comment 45 dolezvo1 2023-05-18 09:56:35 UTC
Not quite: Currently the variable-size NBSP can be enabled through the document compatibility options (Options > Compatibility), but I plan to change that very soon and move it to the character style options.

As for the NBSP field shade, I believe it makes sense not to display it when non-printing characters are disabled, to make the page appear as close to the final product as possible. I am not sure whether there was a more thorough debate on this, or what the arguments are for or against showing it when non-printing characters are disabled.
Comment 46 dolezvo1 2023-05-18 10:21:26 UTC
I imagined that if the shades weren't shown when non-printing characters are disabled, it would be possible to (when enabled) automatically replace space after "a", "s", etc. (bug 46770) with a NBSP without distracting the user.
Comment 47 Kamil Landa 2023-05-18 11:50:41 UTC
Created attachment 187370 [details]
Text overflow with variable-width NBSP

I activated: Options/ LibreOffice Writer/ Compatibility/ Render non-breaking spaces (NBSP) 
The degree character is correctly changed to a tilde, but the text overflows the page (see attachment) :-(. 


Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 22950a9b008e1bb22fa9e54b5d45715e25fee764
CPU threads: 8; OS: Windows 10.0 Build 17763; UI render: Skia/Raster; VCL: win
Locale: cs-CZ (cs_CZ); UI: en-US
Calc: CL threaded
Comment 48 Kamil Landa 2023-05-18 11:53:16 UTC
Created attachment 187371 [details]
Joined arabic words with NBSP

NBSP is also used in scripts other than Latin; for example, I also work with Arabic script. There, words are joined into one longer word, for example words with the definite article 'al-. I use NBSP to join words that are pronounced as one word, and I write the transcription as one word. Seeing only the gray background is comfortable, but having to activate Formatting Marks as well is not pleasant.

But of course somebody could say it is more pleasant not to see the gray background.

I think the best solution is to make it adjustable and add an option for a non-gray background to the Options, rather than disabling it "mercilessly".
Comment 49 dolezvo1 2023-05-18 11:57:20 UTC
I know about this issue, it is just a display bug (changing compatibility options doesn't update the elements). I believe that should go away if you exit and reopen the file, or maybe just Tools > Update > Update All would work. I think it displays correctly for me out of the box.

I agree the shade was a mistake, I didn't know there was an option for disabling it. I will restore that in the next commit.
Comment 50 dolezvo1 2023-05-18 12:03:18 UTC
Actually, no, the file you included didn't have the option enabled (The NBSPs should show as Tildes), but it does appear spaced correctly for me when I change the option and do the Update All command.
Comment 51 dolezvo1 2023-05-18 12:05:05 UTC
Created attachment 187372 [details]
Variable width NBSP caught in a rare moment of working properly
Comment 52 dolezvo1 2023-05-18 12:13:18 UTC
Ah, my bad, I didn't read the content of the file, you knew the option wasn't enabled. Regardless, I know about it and it will definitely be fixed before 7.6 fully releases. Hope I managed to clear this up for now.
Comment 53 João Paulo 2023-06-07 14:35:04 UTC
(In reply to dolezvo1 from comment #45)
> Not quite: Currently the variable size NBSP can be enabled through document
> compatibility options (Options > Compatibility), but I am plan to change
> that very soon - I plan to move it to the character style options.

To keep the documents' internals clean, may I suggest that:

* In ODF 1.3 extended and older, if there is no tag to use NBSP as variable or fixed width, the legacy behavior is used;
* but in ODF 1.4 and newer (extended or not), if there is no tag to use NBSP as variable or fixed width, the UNICODE intended behavior is used?

I ask that so the document's tags can be kept clean.  It's easier to compare documents differences with less "unneeded" tags (tags that if they are absent they still produce the intended output).
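The proposed defaulting rule can be sketched as a small decision function. This only illustrates the suggestion above and is not LibreOffice code; the function name, the version tuple and the `explicit` parameter are hypothetical:

```python
from typing import Optional, Tuple


def nbsp_is_variable_width(
    odf_version: Tuple[int, int], explicit: Optional[bool]
) -> bool:
    """Decide NBSP behaviour from the ODF version and an optional explicit tag.

    explicit is None when the document carries no tag; otherwise it is the
    tag's value (True = variable width, False = fixed width).
    """
    if explicit is not None:
        return explicit  # an explicit tag always wins
    # No tag: legacy fixed width up to ODF 1.3, variable width from ODF 1.4 on.
    return odf_version >= (1, 4)


if __name__ == "__main__":
    print(nbsp_is_variable_width((1, 3), None))   # legacy default
    print(nbsp_is_variable_width((1, 4), None))   # proposed new default
    print(nbsp_is_variable_width((1, 4), False))  # explicit override
```

With such a rule, a tag only needs to be written when a document deviates from the default for its ODF version.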
Comment 54 dolezvo1 2023-06-07 14:38:22 UTC
If I may ask, what is the UNICODE intended behavior? I was under the impression there wasn't one.
Comment 55 sdc.blanco 2023-09-15 09:25:25 UTC Comment hidden (obsolete)
Comment 56 dolezvo1 2023-09-15 09:33:04 UTC Comment hidden (obsolete)
Comment 57 sdc.blanco 2023-09-16 13:53:42 UTC Comment hidden (obsolete)
Comment 58 dolezvo1 2023-09-16 14:01:39 UTC Comment hidden (obsolete)
Comment 59 sdc.blanco 2023-09-16 22:34:27 UTC
(In reply to dolezvo1 from comment #54)
> If I may ask, what is the UNICODE intended behavior? I was under the
> impression there wasn't one.
iiuc, from following quotation, intended behavior is that U+00A0 should compress/expand.

"When expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. When expanding or compressing intercharacter space, the presence of U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER is always ignored." [1]

[1] From "Introduction" of Revision 51 (2023-08-15) 
Unicode® Standard Annex #14
UNICODE LINE BREAKING ALGORITHM
https://www.unicode.org/reports/tr14/tr14-51.html
Comment 60 sdc.blanco 2023-09-16 22:38:08 UTC
(In reply to dolezvo1 from comment #45)
> Currently ... variable size NBSP can be enabled through document
> compatibility options (Options > Compatibility)
Perhaps variable-size NBSP should be the default behavior, if the idea is to follow the UNICODE standard?

> but I plan ... to move it to the character style options.
Is that still the plan? (if so, then I will not make any changes in the current label in the Options - Compatibility dialog, and not add anything to the help page for that dialog). (fwiw, I agree with that plan.)
Comment 61 dolezvo1 2023-09-17 05:43:04 UTC
> iiuc, from following quotation, intended behavior is that U+00A0 should compress/expand.

Interesting, nice find.

> Is that still the plan?

In the long term, yes, but my previous attempts at it weren't successful and I don't really have the time atm.
Comment 62 sdc.blanco 2023-09-17 08:59:21 UTC
(In reply to dolezvo1 from comment #61)
> ... my previous attempts at it weren't successful and
> I don't really have the time atm.
1.  Maybe you should revert the option added recently to sw/uiconfig/swriter/ui/optcompatpage.ui, given that it does not work.

2. Maybe there is no need/reason to have an option in the UI at all. The OP correctly notes that U+00A0 should be variable-space, and there does not seem to be any expectation (in Unicode) that it should ever be fixed space.

3. There are fixed-width space characters in Unicode, but the following quotation seems to indicate that they are not (likely) to be used in computer-based justification.

"The fixed-width space characters (U+2000..U+200A) are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters" (p. 267) [1]

[1] "Space Characters" in The Unicode® Standard Version 15.0 – Core Specification
https://www.unicode.org/versions/Unicode15.0.0/ch06.pdf
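For reference, the range U+2000..U+200A named in the quotation can be listed with Python's standard `unicodedata` module. This is only a lookup of character names and categories; the helper name is hypothetical:

```python
import unicodedata


def fixed_width_spaces() -> dict:
    """Map each code point in U+2000..U+200A to its Unicode character name."""
    return {cp: unicodedata.name(chr(cp)) for cp in range(0x2000, 0x200A + 1)}


if __name__ == "__main__":
    for cp, name in fixed_width_spaces().items():
        # Every character in this range is a space separator (category Zs).
        print(f"U+{cp:04X} {name} ({unicodedata.category(chr(cp))})")
```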
Comment 63 bintoro 2023-11-08 21:10:35 UTC
FWIW, Pages on macOS treats both U+00A0 NO-BREAK SPACE and U+202F NARROW NO-BREAK SPACE as proportional.
Comment 64 phv 2023-11-18 20:02:51 UTC
Last commit creates a bug (#157768) in LibreOffice stable and release candidate versions.

There's no workaround, and considering that the contributor who made the change hasn't worked on it for two months, can the patch be reverted?
Comment 65 Eyal Rozenberg 2023-11-24 22:34:23 UTC
So, what's the deal with this issue right now?
Comment 66 João Paulo 2023-11-28 17:40:18 UTC
(In reply to sdc.blanco from comment #62)
> 2. Maybe there is no need/reason to have an option in the UI at all. The OP
> correctly notes that U+00A0 should be variable-space, and there does not
> seem to be any expectation (in Unicode) that it should ever be fixed space.

I agree there is no need to have an option in the UI, as the compression/expansion of spaces while justifying paragraphs doesn't lead to line breaks, so there is no big text reflow (one that might change on which page -- or line -- the text appears).
Comment 67 Szasz-Fabian Jozsef 2024-01-21 16:41:26 UTC Comment hidden (no-value)
Comment 68 dolezvo1 2024-01-21 17:25:16 UTC
It seems to me it's not that simple. As far as I know nobody from the core team confirmed whether the NBSP should indeed be flexible by default yet. Even if that was the case, I would assume they wouldn't accept a change that wouldn't allow for setting the NBSP to be fixed width, in case someone needs it that way.

In all honesty, I spent weeks trying to implement this and it was a really discouraging experience. The documentation seems to be just signatures for the most part, and doesn't properly explain how the pieces fit together. The resulting commit was a complete failure, and I'm hesitant to dive into it again :/
Comment 69 Piotr Osada 2024-09-10 15:39:01 UTC
Created attachment 196373 [details]
NBSP example ODT

Version: 24.2.0.3 (X86_64) / LibreOffice Community
Build ID: da48488a73ddd66ea24cf16bbc4f7b9c08e9bea1
CPU threads: 8; OS: Windows 10.0 Build 22631; UI render: Skia/Vulkan; VCL: win
Locale: pl-PL (pl_PL); UI: en-US
Calc: CL threaded
Comment 70 Piotr Osada 2024-09-10 15:45:59 UTC
Created attachment 196374 [details]
NBSP fix-flexible rendering in LO24.2.0.3

Version: 24.2.0.3 (X86_64) / LibreOffice Community
Build ID: da48488a73ddd66ea24cf16bbc4f7b9c08e9bea1
CPU threads: 8; OS: Windows 10.0 Build 22631; UI render: Skia/Vulkan; VCL: win
Locale: pl-PL (pl_PL); UI: en-US
Calc: CL threaded

NBSP rendering in LO 24.2.0.3, fixed- and flexible-width