Bug 105444 - Extra blank lines inserted at end of Comments in DOCX on save when text added in document before comment (steps in Comment 8)
Summary: Extra blank lines inserted at end of Comments in DOCX on save when text added...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
4.2.0.4 release
Hardware: All Windows (All)
: low minor
Assignee: László Németh
URL:
Whiteboard: target:6.2.0
Keywords: filter:docx, preBibisect, regression
Depends on:
Blocks: DOCX-Comments
  Show dependency treegraph
 
Reported: 2017-01-20 00:17 UTC by pcunger
Modified: 2018-09-22 16:25 UTC (History)
10 users (show)

See Also:
Crash report or crash signature:


Attachments
Example file (4.87 KB, application/wps-office.docx)
2017-01-21 19:33 UTC, Telesto
Details
Example file made on Ubuntu with 5.4 (5.93 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-02-12 07:49 UTC, Gabor Kelemen
Details
Example file made on Windows with 5.4 (5.98 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-02-12 07:50 UTC, Gabor Kelemen
Details
Diff of the contents.xml files - no difference in saved new lines (330.23 KB, image/png)
2018-02-12 07:51 UTC, Gabor Kelemen
Details
The two example files opened on Linux (412.65 KB, image/png)
2018-02-12 07:52 UTC, Gabor Kelemen
Details
The two example files opened on Windows - New lines added to the comments (173.55 KB, image/png)
2018-02-12 07:53 UTC, Gabor Kelemen
Details
Word reads the files correctly too (80.75 KB, image/png)
2018-02-12 07:57 UTC, Gabor Kelemen
Details

Note You need to log in before you can comment on or make changes to this bug.
Description pcunger 2017-01-20 00:17:18 UTC
Writer adds an extra (and eventually, multiple extra) lines at the end of Comments. I know this is a problem when saving in Microsoft Word format (Save As -> Microsoft Word 2007-2013 XML (.docx)).
To reproduce:
1. Insert a Comment (Ctrl + Alt + C)
2. Type text
3. Save as MS Word XML
4. Close file
5. Open file and observe a new line after the text typed in the Comment box

Sometimes there are three or four extra lines at the end (from multiple Save -> Close -> Open cycles?). This becomes a problem when there are multiple Comments on one page and they get scrunched to accommodate them in the available space. If extra lines were not added, some scrunching would not be necessary.
Comment 1 Kevin Suo 2017-01-20 02:09:52 UTC
Do not reproduce under linux
Ubuntu 16.04 LTS X64

Version: 5.2.4.2
Build ID: 3d5603e1122f0f102b62521720ab13a38a4e0eb0
CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; 
Locale: zh-CN (zh_CN.UTF-8); Calc: group
Comment 2 Timur 2017-01-20 19:07:07 UTC Comment hidden (obsolete)
Comment 3 pcunger 2017-01-20 21:07:39 UTC
(In reply to Timur from comment #2)
> Which version did you test with and on which OS?
> There was Bug 85523 but it was fixed.

See above. Version 5.2.4.2 (release) for Windows (I'm using 7).
Comment 4 Telesto 2017-01-20 22:15:23 UTC
Repro with:
Version: 5.4.0.0.alpha0+
Build ID: 99eed82939999d9a9689788a4134dd05d5c20c5a
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-01-14_23:37:40
Locale: nl-NL (nl_NL); Calc: CL

A new line gets added with every save to docx
Comment 5 Telesto 2017-01-20 22:28:48 UTC
Also found in:
Versie: 4.2.0.4 
Build ID: 05dceb5d363845f2cf968344d7adab8dcfb2ba71

but not in
Versie: 4.1.0.4 
Build ID: 89ea49ddacd9aa532507cbf852f2bb22b1ace28

Steps to reproduce
1. Insert a Comment (Ctrl + Alt + C)
2. Type text
3. Save as MS Word XML
4. File -> Reload 
5. A a letter to the comment
6. Save again
7. File -> Reload
Comment 6 pcunger 2017-01-21 03:32:43 UTC Comment hidden (no-value)
Comment 7 Julien Nabet 2017-01-21 09:13:27 UTC
I don't succeed in reproducing this with master sources updated yesterday.
Telesto: I don't understand your step 5:
"A a letter to the comment" , "Add a letter"? If yes, it doesn't work for me ; no extra line when save and reload.
Comment 8 Telesto 2017-01-21 19:33:24 UTC
Created attachment 130600 [details]
Example file

1. Open attached file
2. Make a change to the main document (for example add 'C' after B)
3. Save the file (CTRL+S)
4. Reload the file (File -> reload). Comment will grow with one row.

It could be Win only.
Comment 9 pcunger 2017-01-21 19:39:50 UTC
Behaviour confirmed. I now count 7 empty lines after A in the Comment. There were 6 in the original.
Comment 10 Julien Nabet 2017-01-22 13:53:36 UTC
(In reply to Telesto from comment #8)
> Created attachment 130600 [details]
> Example file
> 
> 1. Open attached file
> 2. Make a change to the main document (for example add 'C' after B)
> 3. Save the file (CTRL+S)
> 4. Reload the file (File -> reload). Comment will grow with one row.
> 
> It could be Win only.

Thank you for the demo file and step by step process; I could reproduce this with master sources updated 2 days ago.
Comment 11 raal 2017-05-13 21:01:39 UTC Comment hidden (obsolete)
Comment 12 Telesto 2017-05-14 09:02:50 UTC
(In reply to raal from comment #11)
> As per today, this regression can't be bibisected as it was introduced
> before 4.4 branch and there's no bibisect repository for the affected
> branch, thus change 'bibisectRequest' to 'preBibisect'

I'm re-adding bibisectRequest, because it's not a Windows only bug. Bibisecting should be possible.
Comment 13 Gabor Kelemen 2018-02-12 07:49:04 UTC
I investigated this a bit. Made two similar files under Linux in 5.4.4 and Windows 5.4.2 (shall check on 6.0 yet)

I created two simple example docx files, one on Linux and one on Windows, with a comment in them, but without ending newline.

What happened: 
- The generated comments.xml files inside do not differ content wise. There is no new line stored in either of them.
- When opening the files on Linux, neither gets an extra new line ending.
- When opening the files on Windows, both gets an extra new line ending. Then this can be correctly written out on save - so this is a Windows-specific file open bug.
Comment 14 Gabor Kelemen 2018-02-12 07:49:56 UTC
Created attachment 139811 [details]
Example file made on Ubuntu with 5.4
Comment 15 Gabor Kelemen 2018-02-12 07:50:49 UTC
Created attachment 139812 [details]
Example file made on Windows with 5.4
Comment 16 Gabor Kelemen 2018-02-12 07:51:47 UTC
Created attachment 139813 [details]
Diff of the contents.xml files - no difference in saved new lines
Comment 17 Gabor Kelemen 2018-02-12 07:52:16 UTC
Created attachment 139814 [details]
The two example files opened on Linux
Comment 18 Gabor Kelemen 2018-02-12 07:53:03 UTC
Created attachment 139815 [details]
The two example files opened on Windows - New lines added to the comments
Comment 19 Gabor Kelemen 2018-02-12 07:57:54 UTC
Created attachment 139816 [details]
Word reads the files correctly too
Comment 20 Kevin Suo 2018-02-12 09:30:49 UTC
(In reply to Telesto from comment #12)
Are you sure this is reproducible on linux? I tried hard but do not reproduce in anyway.

I also reviewed the comments, no one says it was reproducible on linux.

I am setting platform back to Win, keyword preBibisect.
Comment 21 Kevin Suo 2018-02-12 10:16:52 UTC
Steps to Observe the Problem:

1. Download the docx file in attachment 130600 [details], make a copy and unzip it.

2. Open comments.xml with xml editor. 
--> You will see that there are six (6) <w:p> tags, while the 1st one has the value "A", all others are blank.

3. Open the docx file with MSO, count the total lines (paragraphs) in the comments.
--> There are six (6), including "A". This is correct, same as the xml file shows,

4. Open the docx file with Writer, count the total lines (paragraphs) in the comments.
--> There are seven (7), including "A". This is wrong! The xml file shows 6 <w:p> only.

So I guess there is sth wrong with a for-loop in the fileopen code,e.g.:
for (i=0;i<n;i++) {
   ...
}
...
Comment 22 BogdanB 2018-09-20 16:24:53 UTC
I could not reproduce on linux
Version: 6.2.0.0.alpha0+
Build ID: e005ab5d40d358adb75a64e140d46f4bf605647d
CPU threads: 4; OS: Linux 4.15; UI render: GL; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2018-09-15_02:08:38
Locale: ro-RO (ro_RO.UTF-8); Calc: threaded
Comment 23 pcunger 2018-09-20 16:39:32 UTC
(In reply to BogdanB from comment #22)
> I could not reproduce on linux
> Version: 6.2.0.0.alpha0+
> Build ID: e005ab5d40d358adb75a64e140d46f4bf605647d
> CPU threads: 4; OS: Linux 4.15; UI render: GL; VCL: gtk2; 
> TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time:
> 2018-09-15_02:08:38
> Locale: ro-RO (ro_RO.UTF-8); Calc: threaded

I just retested and it's still an issue with:
Version: 6.0.5.2 (x64)
Build ID: 54c8cbb85f300ac59db32fe8a675ff7683cd5a16
CPU threads: 4; OS: Windows 6.1; UI render: default; 
Locale: en-CA (en_CA); Calc: group
Comment 24 László Németh 2018-09-21 11:01:24 UTC
Proposed fix in https://gerrit.libreoffice.org/#/c/60865/
(maybe it needs to handle the removed "\r\n" case, too, for other paragraphs, but first we try to use only "\n" on Win32, too, see in the patch)

@all Thanks for your helps!
Comment 25 László Németh 2018-09-21 12:50:19 UTC
(In reply to László Németh from comment #24)
> Proposed fix in https://gerrit.libreoffice.org/#/c/60865/
> (maybe it needs to handle the removed "\r\n" case, too, for other
> paragraphs, but first we try to use only "\n" on Win32, too, see in the
> patch)

(Indeed, we need both checkings.)
Comment 26 Commit Notification 2018-09-22 13:47:46 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=bdade7e3fc33c1832c6f6a472443fca8238738d2

tdf#105444 DOCX import: don't put extra paragraphs in comments

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 27 pcunger 2018-09-22 15:52:57 UTC
(In reply to Commit Notification from comment #26)
> <snip>
> 
> Affected users are encouraged to test the fix and report feedback.

I'd be willing to test, but I wouldn't know which version to download... I just went to the daily builds site and found a dizzying array of options! I'm currently running 6.0.5.2 (x64) in Windows 7, so would I go in the "libreoffice-6-0" folder? Or "-6-1"? Or even "master"? And then which file would I choose: "Win-x86_64@62-TDF"? The number "@62" is higher than the other option "@42", but @42 is newer. And there's no indication what "TDF" means... Help?
Comment 28 László Németh 2018-09-22 16:25:14 UTC
(In reply to pcunger from comment #27)
> (In reply to Commit Notification from comment #26)
> > <snip>
> > 
> > Affected users are encouraged to test the fix and report feedback.
> 
> I'd be willing to test, but I wouldn't know which version to download... I
> just went to the daily builds site and found a dizzying array of options!
> I'm currently running 6.0.5.2 (x64) in Windows 7, so would I go in the
> "libreoffice-6-0" folder? Or "-6-1"? Or even "master"? And then which file
> would I choose: "Win-x86_64@62-TDF"? The number "@62" is higher than the
> other option "@42", but @42 is newer. And there's no indication what "TDF"
> means... Help?

The commit was tested successfully on two Windows, too, using two unit tests (one of the tests was an old one, documenting and handling the strange platform specific problem with the comment "FIXME". Now it was fixed that test, too:  https://gerrit.libreoffice.org/#/c/60865/5/sw/qa/extras/uiwriter/uiwriter.cxx), so I think, you don't need to test is, sorry for the automatic request. 

But if you want to confirm the fix, tomorrow you will be able to check it with the upcoming build of https://dev-builds.libreoffice.org/daily/master/Win-x86_64@42/current/ (the fix has been committed only in master (6.2) branch, yet – the libreoffice-6-1 back port is under testing, yet). Thanks, László
Comment 29 Julien Nabet 2018-09-22 16:25:42 UTC
(In reply to pcunger from comment #27)
> ...
> I'd be willing to test, but I wouldn't know which version to download... I
> just went to the daily builds site and found a dizzying array of options!
> I'm currently running 6.0.5.2 (x64) in Windows 7, so would I go in the
> "libreoffice-6-0" folder? Or "-6-1"? Or even "master"? And then which file
> would I choose: "Win-x86_64@62-TDF"? The number "@62" is higher than the
> other option "@42", but @42 is newer. And there's no indication what "TDF"
> means... Help?

If you talk about https://dev-builds.libreoffice.org/daily/, here are some explanation.
"master" means the dev current branch. Today and still for some time, the patches added in this branch will be for next major release 6.2.0

6.1 branch corresponds to versions 6.1.X
6.0 branch corresponds to versions 6.0.X

For the moment the patch has just been put in master so it will be only on a daily build of master branch (as the message says, you must wait 24-48hours that a daily build includes the patch). 

The patch for 6.1 branch is waiting for review. If it's accepted, you'll be able to give a try to a 6.1 daily build included it by waiting by 24-48 hours. Also it'll be in release 6.1.3 I think since 6.1.2 is too near to be released.

See https://wiki.documentfoundation.org/ReleasePlan

About @62, @42, it just corresponds to some machine if I remember well.