Bug 133607 - FILEOPEN: Semi-colons in front of words cause incorrect line break (ICU 60.1 change)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.0 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Character ICU
Reported: 2020-06-02 15:40 UTC by Xisco Faulí
Modified: 2022-05-15 00:27 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Comparison MSO 2010 and LibreOffice 7.0 master (61.17 KB, image/png)
2020-06-02 15:40 UTC, Xisco Faulí
DOCX file (12.17 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-06-02 15:41 UTC, Xisco Faulí
DOC file (22.50 KB, application/msword)
2020-06-02 15:41 UTC, Xisco Faulí
Untitled 1234b.odt: copy/paste of the text into new ODT to (assumedly) avoid compat flags. (9.81 KB, application/vnd.oasis.opendocument.text)
2020-06-02 18:52 UTC, Justin L
semicolonedNonBreakingWhitespace_133607.odt: cleanroom demonstration (13.98 KB, application/vnd.oasis.opendocument.text)
2020-07-23 08:59 UTC, Justin L

Description Xisco Faulí 2020-06-02 15:40:46 UTC
Created attachment 161529 [details]
Comparison MSO 2010 and LibreOffice 7.0 master

Steps to reproduce:
1. Open the attached document (either the DOC or the DOCX file).

-> The first line breaks in the middle. It should reach the end of the paragraph. See the comparison image.

Reproduced in

Version: 7.0.0.0.alpha1+
Build ID: 82894d85147840f1f587e9530b12f0058f2ef2c3
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded


[Bug found by office-interoperability-tools]
Comment 1 Xisco Faulí 2020-06-02 15:41:06 UTC
Created attachment 161530 [details]
DOCX file
Comment 2 Xisco Faulí 2020-06-02 15:41:29 UTC
Created attachment 161531 [details]
DOC file
Comment 3 Xisco Faulí 2020-06-02 15:43:49 UTC
I've bisected it with bibisect-linux64-6.0 and it points to

author	Eike Rathke <erack@redhat.com>	2017-11-17 11:03:45 +0100
committer	Eike Rathke <erack@redhat.com>	2017-11-20 19:28:10 +0100
commit 9206a08ada00e8762c4a634f242bd566028964bb (patch)
tree eaa317ce6717d44f75c077a6db147b0ebd4994b7
parent a8687041c46b3fe93a76faa0a4a65e7069ef5e9d (diff)
Upgrade to ICU 60.1

So might Writer be interpreting a Unicode character as a line break?

@Justin, I thought you might be interested in this issue...
Comment 4 Justin L 2020-06-02 16:24:00 UTC
It is not being read in as a line break. (There is no linebreak character indicated with reveal formatting.) Add more spaces, and it will jump back up to the top line.
Comment 5 Justin L 2020-06-02 18:52:14 UTC
Created attachment 161544 [details]
Untitled 1234b.odt: copy/paste of the text into new ODT to (assumedly) avoid compat flags.

I don't think this is related to MS formats.
Comment 6 Dieter 2020-06-05 06:45:17 UTC
I confirm it with

Version: 7.0.0.0.beta1 (x64)
Build ID: 94f789cbb33335b4a511c319542c7bdc31ff3b3c
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

and Word 2016
Comment 7 Justin L 2020-07-23 08:59:36 UTC
Created attachment 163441 [details]
semicolonedNonBreakingWhitespace_133607.odt: cleanroom demonstration

This seems somehow to be related specifically to the semi-colons (discovered through trial and error). Apparently they have a special meaning when they follow whitespace.

Reproducible steps:
1.) Type any sentence in Writer that is just one word longer than one line, so that it wraps to the next line.
2.) Starting from the last word, add a semi-colon in front of it. Notice that the previous word is now pulled down in front of it.
3.) repeat.

If you DELETE a semi-colon, the text will not re-flow backwards, but if you save/re-open, then the text will re-flow backwards.

This is probably intentional behaviour.  I'd guess that if it is not intentional, then it is an ICU bug and NOTOURBUG. @Eike might be able to provide more knowledgeable insight.
Comment 8 Justin L 2020-11-18 11:40:53 UTC
Tested after yesterday's

author	Eike Rathke  on	2020-11-17 16:33:33 +0100
commit 8335c8c20765d4f167d9b48e6a2757864a3bc7fd 
Update to ICU 68.1

and still the same thing. A space followed by a semi-colon is treated as a keep-with-next-word flag.
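
To make the observed behaviour concrete, here is a minimal toy model in Python. It is only a sketch: the `wrap` function and its `glue_before_semicolon` parameter are hypothetical names invented for illustration, and the greedy wrapping is an assumption that stands in for whatever Writer and ICU actually do. It simply forbids a break at a space that is followed by a semi-colon and shows that the first line then ends early.

```python
def wrap(text, width, glue_before_semicolon=True):
    """Greedy line wrap on spaces (toy model, not Writer's or ICU's code).

    When glue_before_semicolon is True, a space that is followed by ';'
    is not a break opportunity, so the word before it stays attached.
    """
    # Build unbreakable chunks: a word starting with ';' is glued to
    # the chunk in front of it, so chains like "a ;b ;c" stay together.
    chunks = []
    for word in text.split(" "):
        if glue_before_semicolon and chunks and word.startswith(";"):
            chunks[-1] += " " + word
        else:
            chunks.append(word)

    # Fill each line greedily with as many chunks as fit in `width`.
    lines, current = [], ""
    for chunk in chunks:
        candidate = chunk if not current else current + " " + chunk
        if current and len(candidate) > width:
            lines.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


sample = "one two three ;four ;five"
print(wrap(sample, 20, glue_before_semicolon=False))
print(wrap(sample, 20, glue_before_semicolon=True))
```

With the glue rule off, the first line fills up normally. With it on, "three ;four ;five" becomes a single unbreakable run and drops to the second line as a whole, which matches the short first line seen in the attached documents.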
Comment 9 Justin L 2021-11-22 08:05:50 UTC
Reproduced in 7.3+ with the new ICU 70.1.