If you put a non-break hyphen into a document (Ctrl+Shift+-, U+2011), it's fine as long as you keep it as an ODF document. However, if you write this out as HTML, the character is converted into a non-break *space*: &nbsp; is written instead of &#8209;. Note that this is not a new bug - it happens in OpenOffice 3.2.1 as well (which is where I first noticed it).
OK. This is the cause:
libreoffice-libs-gui-3.4.1.3/svtools/source/svhtml/htmlout.cxx:420
=====
            case 0xA0:      // is a hard blank
//!! the TextConverter has a problem with this character - so change it to
// a hard space - that's the same as our 5.2
            case 0x2011:    // is a hard hyphen
                pStr = OOO_STRING_SVTOOLS_HTML_S_nbsp;
                break;
=====
No idea what the TextConverter is, but if it has a problem then surely that is the place that needs fixing - not breaking HTML exports instead?
I've built 3.4.1.3 with the following patch, and that results in the *correct* html (&#8209;) being output when a document is saved in html format. I've also tested that a cut&paste of the resulting html document (when viewed in LO) into a new odt document (in LO) results in a non-break hyphen.
===== htmlout.cxx.diff =====
--- htmlout.cxx-orig 2011-05-19 11:58:05.000000000 +0100
+++ htmlout.cxx 2011-07-10 23:07:15.612747262 +0100
@@ -418,10 +418,15 @@
             switch( c )
             {
             case 0xA0:      // is a hard blank
+                pStr = OOO_STRING_SVTOOLS_HTML_S_nbsp;
+                break;
+// This was labelled as:
 //!! the TextConverter has a problem with this character - so change it to
 // a hard space - that's the same as our 5.2
+// but that just breaks html output. Setting the numberic html entity
+// seems fine.
             case 0x2011:    // is a hard hyphen
-                pStr = OOO_STRING_SVTOOLS_HTML_S_nbsp;
+                pStr = "#8209";
                 break;
             case 0xAD:      // is a soft hyphen
                 pStr = OOO_STRING_SVTOOLS_HTML_S_shy;
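The effect of the patch can be sketched outside LibreOffice as a small lookup. This is only an illustration, not the actual htmlout.cxx code: the function names entityNameFor and toEntity are hypothetical, and the plain strings "nbsp" and "shy" stand in for the OOO_STRING_SVTOOLS_HTML_S_nbsp and OOO_STRING_SVTOOLS_HTML_S_shy macros, whose values are assumed here. It also assumes the caller wraps the returned name in & and ;, which is what the "#8209" value in the patch implies.

```cpp
#include <string>

// Hypothetical stand-in for the switch in htmlout.cxx: returns the entity
// name (without the surrounding '&' and ';') for a few special characters,
// or an empty string if the character needs no special handling here.
std::string entityNameFor(char32_t c)
{
    switch (c)
    {
        case 0xA0:   // hard blank (no-break space)
            return "nbsp";   // assumed value of OOO_STRING_SVTOOLS_HTML_S_nbsp
        case 0x2011: // hard (non-breaking) hyphen: numeric reference, as in the patch
            return "#8209";
        case 0xAD:   // soft hyphen
            return "shy";    // assumed value of OOO_STRING_SVTOOLS_HTML_S_shy
        default:
            return "";
    }
}

// Wraps an entity name the way the exporter is assumed to: "&" + name + ";".
std::string toEntity(char32_t c)
{
    const std::string name = entityNameFor(c);
    return name.empty() ? std::string() : "&" + name + ";";
}
```

With this sketch, toEntity(0x2011) yields the numeric reference for the non-break hyphen, while 0xA0 keeps its own case instead of sharing one with the hyphen.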
Created attachment 52063: Patch against 3.4.3
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it started right out as NEW without ever being explicitly confirmed. The bug is changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back to NEW, please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at: http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1
More detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
According to http://www.robinlionheart.com/stds/html4/spchars or http://en.wikipedia.org/wiki/Hyphen, shouldn't it be "#8208" instead of "#8209"? Quote from the first source: "In addition to the soft hyphen, there is also a hard hyphen (‐ or &#8208;) which always renders, and a nonbreaking hyphen character (‑ or &#8209;), for hyphens that do not break words across lines."
>> shouldn't it be "#8208" instead of "8209"
No! As your quote notes, &#8209; is the nonbreaking hyphen character, which *is* what is needed here! That is a hyphen which is *always* displayed but *never* breaks. A soft hyphen is one at an optional break which is only displayed if the break occurs, while a hard hyphen is one which is always displayed even if there is no break - but a break is allowed. So both of these allow breaks. The whole point of a non-break hyphen is to prevent breaking, while still displaying a hyphen.
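The distinction can be made concrete by listing the three characters and computing their decimal character references from the Unicode code points. This is a small sketch for illustration only; the HyphenKind struct, the kHyphens table and the decimalReference function are made-up names, not anything from LibreOffice.

```cpp
#include <array>
#include <string>

// Illustrative record for one hyphen-like character (names are hypothetical).
struct HyphenKind
{
    const char* name;
    char32_t    codePoint;
    bool        alwaysDisplayed; // rendered even when no line break occurs
    bool        allowsBreak;     // a line break may occur at this character
};

// The three characters discussed in this thread.
const std::array<HyphenKind, 3> kHyphens = {{
    { "soft hyphen",         0x00AD, false, true  },
    { "hard hyphen",         0x2010, true,  true  },
    { "non-breaking hyphen", 0x2011, true,  false },
}};

// Builds a decimal numeric character reference, e.g. U+2011 -> "&#8209;".
std::string decimalReference(char32_t codePoint)
{
    return "&#" + std::to_string(static_cast<unsigned long>(codePoint)) + ";";
}
```

Since 0x2010 is 8208 and 0x2011 is 8209 in decimal, decimalReference gives &#8208; for the hard hyphen and &#8209; for the non-breaking hyphen, and only the latter has allowsBreak set to false.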
I can confirm that this bug still exists in: LibreOffice 3.5.0rc1 Build ID: b6c8ba5-8c0b455-0b5e650-d7f0dd3-b100c87
Sorry for the insane delay, this fell through the cracks :-(. Pushed now; the bug had been there since the very first commit in 2000. FWIW, mailing the libreoffice@lists.freedesktop.org list is the best route to get a patch looked at.
>> caolanm->gordon: can you confirm your patch is under our preferred LGPLv3+/MPL+ license combination?
Yes - that's fine. I'm happy to transfer all ownership to you (or anyone) to do with as you wish, so that will do.
Great, thanks, added you to... https://wiki.documentfoundation.org/Development/Developers if you want to review those details in case I got them wrong. It's insanely wrong that the patch lingered so long, apologies.