Bug 74577 - Pasting an HTML table into LibreOffice Calc can jumble the table up.
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 4.0.0.0 beta1 (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, patch, regression
Duplicates: 116845 119264
Depends on:
Blocks: HTML-Paste
Reported: 2014-02-05 17:36 UTC by Jonathon
Modified: 2020-08-11 21:23 UTC (History)
CC List: 16 users

See Also:
Crash report or crash signature:


Attachments
A screenshot of the table pasted into Calc (87.18 KB, image/png)
2014-02-05 17:36 UTC, Jonathon
the html to paste into calc (4.04 KB, text/html)
2014-02-05 17:37 UTC, Jonathon
How the same HTML looks when pasted into Writer (49.71 KB, image/png)
2014-02-05 17:39 UTC, Jonathon
How the same HTML looks when pasted into Excel (44.75 KB, image/png)
2014-02-05 17:40 UTC, Jonathon
Screenshot of the w3c validator, confirming it is valid html (117.75 KB, image/png)
2014-02-05 17:41 UTC, Jonathon
libre office calc staircase 3.6.7.2 vs 4.0.0.1 vs 4.4.0.1 (447.26 KB, image/png)
2015-01-05 14:10 UTC, Johnny Baloney
a patch that possibly fix this bug, but don't trust(see comment) (669 bytes, patch)
2018-03-08 20:30 UTC, himajin100000

Description Jonathon 2014-02-05 17:36:39 UTC
Created attachment 93466 [details]
A screenshot of the table pasted into Calc

When I paste the attached HTML into LibreOffice Calc, it produces a weird staircase rather than the correct table.

I have attached a screenshot. I will also try to attach the HTML, a screenshot of how the same table looks when pasted into LibreOffice Writer, and the W3C validation result for the HTML.
Comment 1 Jonathon 2014-02-05 17:37:58 UTC
Created attachment 93468 [details]
the html to paste into calc
Comment 2 Jonathon 2014-02-05 17:39:57 UTC
Created attachment 93471 [details]
How the same HTML looks when pasted into Writer
Comment 3 Jonathon 2014-02-05 17:40:32 UTC
Created attachment 93472 [details]
How the same HTML looks when pasted into Excel
Comment 4 Jonathon 2014-02-05 17:41:24 UTC
Created attachment 93473 [details]
Screenshot of the w3c validator, confirming it is valid html
Comment 5 Kevin Suo 2014-03-15 01:10:30 UTC
Reproducible in Fedora 20 x86, LibreOffice 4.1.5.3 and 4.2.2.1. Set to NEW.

In fact, I think I have noticed this annoying behaviour since the 4.0.x releases, but I am not sure, so I am only changing the version to 4.1.5.3.

The only "workaround" for me is to paste as unformatted text (so the formatting of the HTML table is not preserved).

Another test case example is to copy and paste the income statement to calc from the following url:
http://finance.yahoo.com/q/is?s=YHOO+Income+Statement&annual
Comment 6 Alex 2014-03-27 15:18:10 UTC
I have the same issue. It seems to be caused by rows with cells spanning multiple columns, as if the colspan counter isn't reset for the following row(s). The "staircase" starts at the column where a previous row had a colspan attribute. In the examples in this bug report that is column 1. I have the issue in a table where the first colspan attribute is in column 4 (of row 1). In the pasted table, columns 1-3 (in row 2 and below) are OK, but the staircase begins at column 4.

This is definitely a regression from 3.x, and I'm almost certain 4.0 didn't have this bug.
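
For comparison, here is a minimal sketch of the placement one would expect from a colspan-aware importer: the column cursor restarts at 0 for every row and only advances by the colspan of cells in that same row. This is not LibreOffice's import code; `place_row`, `place_table` and the sample table (a colspan in column 4 of row 1, as in the case described above, but with 0-based column indices) are made up for illustration.

```python
from typing import List, Tuple

Cell = Tuple[str, int]  # (cell text, colspan); hypothetical representation


def place_row(cells: List[Cell]) -> List[Tuple[int, int, str]]:
    """Return (first_column, last_column, text) for each cell of one row."""
    placed = []
    col = 0  # the cursor starts at column 0 for every row
    for text, colspan in cells:
        placed.append((col, col + colspan - 1, text))
        col += colspan  # a span only shifts later cells in the SAME row
    return placed


def place_table(rows: List[List[Cell]]) -> List[List[Tuple[int, int, str]]]:
    """Place each row independently; no span state leaks between rows."""
    return [place_row(row) for row in rows]


# Hypothetical table: the fourth cell of the first row spans two columns.
table = [
    [("h1", 1), ("h2", 1), ("h3", 1), ("h4+h5", 2)],
    [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1)],
    [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1)],
]
layout = place_table(table)
for row in layout:
    print(row)
```

With this placement, rows 2 and 3 come out identical, with one cell per column, so any drift from one row to the next in the pasted result points to span state surviving across rows.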
Comment 7 Johnny Baloney 2015-01-05 14:09:27 UTC
I have tried with the following versions (attached):

- 3.6.7.2 pass
- 4.0.0.1 fail
- 4.4.0.1 fail
Comment 8 Johnny Baloney 2015-01-05 14:10:40 UTC
Created attachment 111769 [details]
libre office calc staircase 3.6.7.2 vs 4.0.0.1 vs 4.4.0.1
Comment 9 Rostislav 'R.Yu.' Okulov 2015-01-05 20:27:09 UTC
I got something like that.

# bad: [423a84c4f7068853974887d98442bc2a2d0cc91b] source-hash-c15927f20d4727c3b8de68497b6949e72f9e6e9e
# good: [65fd30f5cb4cdd37995a33420ed8273c0a29bf00] source-hash-d6cde02dbce8c28c6af836e2dc1120f8a6ef9932
git bisect start 'latest' 'oldest'
# bad: [e02439a3d6297a1f5334fa558ddec5ef4212c574] source-hash-6b8393474974d2af7a2cb3c47b3d5c081b550bdb
git bisect bad e02439a3d6297a1f5334fa558ddec5ef4212c574
# good: [8f4aeaad2f65d656328a451154142bb82efa4327] source-hash-1885266f274575327cdeee9852945a3e91f32f15
git bisect good 8f4aeaad2f65d656328a451154142bb82efa4327
# bad: [9995fae0d8a24ce31bcb5e9cd0459b69cfbf7a02] source-hash-8600bc24bbc9029e92bea6102bff2921bc10b33e
git bisect bad 9995fae0d8a24ce31bcb5e9cd0459b69cfbf7a02
# bad: [51b63dca7427db64929ae1885d7cf1cc7eb0ba28] source-hash-806d18ae7b8c241fe90e49d3d370306769c50a10
git bisect bad 51b63dca7427db64929ae1885d7cf1cc7eb0ba28
# bad: [446a69834acf747d9d18841ec583512ae8fa42e7] source-hash-06a8ca9339f02fccf6961c0de77c49673823b35f
git bisect bad 446a69834acf747d9d18841ec583512ae8fa42e7
# good: [d2720e99b9e6cb7b099256cc7a6d2b3f907b8d7c] source-hash-7dd6c0a8372810f48e6bee35a11ac4ad0432640b
git bisect good d2720e99b9e6cb7b099256cc7a6d2b3f907b8d7c
# bad: [3c228d4685e2981ece0e69cb774dabbef443f77c] source-hash-e63bba0013e5ce34cd04559632206bb7c891eebe
git bisect bad 3c228d4685e2981ece0e69cb774dabbef443f77c
# good: [44ae604621f386a254b6c8fb7599b2c176245149] source-hash-337ef5808dd8e55c06d00b222e69c5ba287acab5
git bisect good 44ae604621f386a254b6c8fb7599b2c176245149
# bad: [977cf448a89278afffc3dd6ece1dea3d0d695345] source-hash-cbc44df67cfd13849f3de85edcdd39b5fec8b06c
git bisect bad 977cf448a89278afffc3dd6ece1dea3d0d695345
# bad: [bc819bc0c4d8592212f84069eb7f65e539517166] source-hash-d9412fb4755377b8358a46a249cfe29a22ea9451
git bisect bad bc819bc0c4d8592212f84069eb7f65e539517166
# first bad commit: [bc819bc0c4d8592212f84069eb7f65e539517166] source-hash-d9412fb4755377b8358a46a249cfe29a22ea9451

bc819bc0c4d8592212f84069eb7f65e539517166 is the first bad commit
commit bc819bc0c4d8592212f84069eb7f65e539517166
Author: Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
Date:   Mon Dec 10 11:35:50 2012 +0000

    source-hash-d9412fb4755377b8358a46a249cfe29a22ea9451
    
    commit d9412fb4755377b8358a46a249cfe29a22ea9451
    Author:     David Tardon <dtardon@redhat.com>
    AuthorDate: Tue Sep 11 07:31:43 2012 +0200
    Commit:     David Tardon <dtardon@redhat.com>
    CommitDate: Tue Sep 11 07:36:32 2012 +0200
    
        fdo#53520 rename portuguese dicts
    
        Change-Id: I70cb4856f1db4722e886407d1c2fdf6a73b9a7f3

:100644 100644 0953f1686760ca6fac561e87d200e640875764f2 44293ef104571a1eaf6bfecd43d4a01702bdf52c M      ccache.log
:100644 100644 d86efaac0c94c20bff57e3036b4417574d73ce4a 86a37681bb8cfd44d04534e14b93dfd45cf0fe63 M      commitmsg
:100644 100644 5728ca3baa483958445168e9b9fccb6f075c5897 ba1824fe7a9f7615a41309f3f318d62e2c4a7589 M      dev-install.log
:100644 100644 8acfb864b3fb560156573153b434b749e43dcf60 5ee43aee8b36e6868002119bee285d15f5000fe0 M      make.log
:040000 040000 580424baebbc6a88398ef8d8478f1b4c5d6c6219 a363beeec352e9ce16ad9cc261f62b98edc79d12 M      opt
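
A log like the one above can be summarised mechanically. The following is a rough sketch that only reads the `# good:`, `# bad:` and `# first bad commit:` comment lines of such a log; `summarize_bisect_log` is a hypothetical helper, not part of git, and the sample is the tail of the log quoted in this comment.

```python
import re
from typing import List, Optional, Tuple

# Matches the comment lines of a bisect log and captures verdict and hash.
MARK = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{7,40})\]")


def summarize_bisect_log(text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return the (verdict, hash) pairs in order, plus the first bad commit if logged."""
    verdicts: List[Tuple[str, str]] = []
    first_bad: Optional[str] = None
    for line in text.splitlines():
        match = MARK.match(line.strip())
        if match is None:
            continue  # skip "git bisect ..." command lines and anything else
        kind, sha = match.groups()
        if kind == "first bad commit":
            first_bad = sha
        else:
            verdicts.append((kind, sha))
    return verdicts, first_bad


sample = """\
# bad: [977cf448a89278afffc3dd6ece1dea3d0d695345] source-hash-cbc44df67cfd13849f3de85edcdd39b5fec8b06c
git bisect bad 977cf448a89278afffc3dd6ece1dea3d0d695345
# bad: [bc819bc0c4d8592212f84069eb7f65e539517166] source-hash-d9412fb4755377b8358a46a249cfe29a22ea9451
git bisect bad bc819bc0c4d8592212f84069eb7f65e539517166
# first bad commit: [bc819bc0c4d8592212f84069eb7f65e539517166] source-hash-d9412fb4755377b8358a46a249cfe29a22ea9451
"""
verdicts, first_bad = summarize_bisect_log(sample)
print(verdicts)
print(first_bad)
```

Note that the hashes in the log identify commits of the bibisect repository; the `source-hash-...` part names the corresponding source commit.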
Comment 10 Kevin Suo 2015-01-05 23:53:37 UTC
It's bibisected, but not bisected yet. Removing keyword bisected.
Comment 11 Rostislav 'R.Yu.' Okulov 2015-01-06 12:15:31 UTC
Ok. Second attempt.

Keywords -> bisected

git bisect start
# bad: [423a84c4f7068853974887d98442bc2a2d0cc91b] source-hash-c15927f20d4727c3b8de68497b6949e72f9e6e9e
git bisect bad 423a84c4f7068853974887d98442bc2a2d0cc91b
# good: [65fd30f5cb4cdd37995a33420ed8273c0a29bf00] source-hash-d6cde02dbce8c28c6af836e2dc1120f8a6ef9932
git bisect good 65fd30f5cb4cdd37995a33420ed8273c0a29bf00
# bad: [e02439a3d6297a1f5334fa558ddec5ef4212c574] source-hash-6b8393474974d2af7a2cb3c47b3d5c081b550bdb
git bisect bad e02439a3d6297a1f5334fa558ddec5ef4212c574
# bad: [8f4aeaad2f65d656328a451154142bb82efa4327] source-hash-1885266f274575327cdeee9852945a3e91f32f15
git bisect bad 8f4aeaad2f65d656328a451154142bb82efa4327
# good: [369369915d3582924b3d01c9b01167268ed38f3b] source-hash-45295f3cdceb4c289553791071b5d7f4962d2ec4
git bisect good 369369915d3582924b3d01c9b01167268ed38f3b
# good: [6fce03a944bf50e90cd31e2d559fe8705ccc993e] source-hash-47e4a33a6405eb1b5186027f55bd9cb99b0c1fe7
git bisect good 6fce03a944bf50e90cd31e2d559fe8705ccc993e
# good: [da317333e5675622f55c9dda17396c659af65320] source-hash-15af925c254f27046427de70a59011e2ac3d6bdb
git bisect good da317333e5675622f55c9dda17396c659af65320
# bad: [18518588d8414f446ece5591944766f5082ebef5] source-hash-82c25249e624cb54ca6d3293d1c3d0d8ebc208e0
git bisect bad 18518588d8414f446ece5591944766f5082ebef5
# good: [89740762f0af849e492932bd71e59149cdcd5a00] source-hash-06f20d73da21342046a480a6b22af69901351328
git bisect good 89740762f0af849e492932bd71e59149cdcd5a00
# good: [a429a2e082aeb9bff36833603d8deb55385c7905] source-hash-b8fa8841c098f15ef2280aa4c82c55c4f96325c9
git bisect good a429a2e082aeb9bff36833603d8deb55385c7905
# bad: [80860139a96019d7487e02c7b488a8990e1e524f] source-hash-27d3fc221d042decbd84b72719107547562d2e12
git bisect bad 80860139a96019d7487e02c7b488a8990e1e524f
# good: [489397741e799a5ad767e4b12be827c8c96ba60b] source-hash-50b4cbe94e200288d57a135bc9386012164bc726
git bisect good 489397741e799a5ad767e4b12be827c8c96ba60b
# first bad commit: [80860139a96019d7487e02c7b488a8990e1e524f] source-hash-27d3fc221d042decbd84b72719107547562d2e12

80860139a96019d7487e02c7b488a8990e1e524f is the first bad commit
commit 80860139a96019d7487e02c7b488a8990e1e524f
Author: Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
Date:   Mon Dec 10 03:22:19 2012 +0000

    source-hash-27d3fc221d042decbd84b72719107547562d2e12
    
    commit 27d3fc221d042decbd84b72719107547562d2e12
    Author:     Michael Stahl <mstahl@redhat.com>
    AuthorDate: Thu Jul 26 15:53:28 2012 +0200
    Commit:     Michael Stahl <mstahl@redhat.com>
    CommitDate: Thu Jul 26 15:54:50 2012 +0200
    
        warning C4018: '>': signed/unsigned mismatch
    
        Change-Id: I25607ce79111b2c2933ab5e2c165df0594ed4363

:100644 100644 f8a373c120766b5f808babd0871cc880f7d45db8 6a0740ddaa0c3849d69e296a146c7056258c1c96 M  ccache.log
:100644 100644 736da9d34245efb02043bf734c93c44bb4482443 905fa72bf9c3a9dc1157aebf4c2fd3dd170d83af M  commitmsg
:100644 100644 f5d710b0c572830e70b8ed045e299548dbc65754 195e99c14914d41fe8a80648c772ee3be985e70f M  dev-install.log
:100644 100644 372fcc41f0e5177d2cf6f2b46b77fee06a12d7f9 ab06579281e263f9dc61432e780095a800d8d791 M  make.log
:040000 040000 768d61d644bc05b4d30258fbd65cb23e4a07f434 f75867028a7a53bce3b05da24460a5a3ac6835fc M  opt
Comment 12 Kevin Suo 2015-01-06 13:34:22 UTC
(In reply to R.Yu. from comment #11)

If I am not missing something,

Bibisected is different from bisected.
Bibisected = the range of commits that caused the bug is known. We still don't know exactly which commit broke this.
Bisected = the exact commit has been identified, based on the bibisect information.

Bibisecting with the LibreOffice bibisect repo always produces bibisect information only.

Removed bisect keyword again.
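
The distinction can be illustrated with a small two-stage search. This is only a sketch under simplifying assumptions: each prebuilt binary is assumed to cover a batch of source commits, and the build names, commit names and the culprit `c08` are invented for the example.

```python
from typing import Callable, Sequence


def first_bad(items: Sequence[str], is_bad: Callable[[str], bool]) -> int:
    """Binary search for the first bad item.

    Assumes items[0] is good, items[-1] is bad, and every item after
    the first bad one is bad as well.
    """
    lo, hi = 0, len(items) - 1  # invariant: items[lo] is good, items[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical data: each prebuilt binary covers a batch of source commits.
builds = {
    "build-1": ["c01", "c02", "c03"],
    "build-2": ["c04", "c05", "c06"],
    "build-3": ["c07", "c08", "c09"],
    "build-4": ["c10", "c11", "c12"],
}
culprit = "c08"  # hypothetical commit that introduced the regression
all_commits = [c for batch in builds.values() for c in batch]


def commit_is_bad(commit: str) -> bool:
    return all_commits.index(commit) >= all_commits.index(culprit)


def build_is_bad(name: str) -> bool:
    # A binary shows the bug if its newest commit already contains it.
    return commit_is_bad(builds[name][-1])


# Stage 1 (bibisect): search the binaries, which only yields a commit range.
names = list(builds)
bad_build = names[first_bad(names, build_is_bad)]
candidate_range = builds[bad_build]

# Stage 2 (bisect): search the source commits inside that range.
previous_good = builds[names[names.index(bad_build) - 1]][-1]
search = [previous_good] + candidate_range
exact = search[first_bad(search, commit_is_bad)]

print(bad_build, candidate_range, exact)
```

Stage 1 ends with a range of candidate commits (the bibisected state); only stage 2, which needs builds of individual source commits, names a single commit (the bisected state).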
Comment 13 Matthew Francis 2015-01-14 16:06:37 UTC
As near as I can tell, this was caused by commit f7f99968f2014c9e7f1f216c6ef0d2d31630087d. Later, commit 8bed4206e1b5548e3525021d6d13025a6eb01081 changed the behaviour again but didn't solve the original issue.

Adding Cc: to noelgrandin@gmail.com and markus.mohrhard@googlemail.com. Any chance you could sort this one out between you? Thanks


commit f7f99968f2014c9e7f1f216c6ef0d2d31630087d
Author: Noel Grandin <noel@peralex.com>
Date:   Thu Jul 19 15:18:22 2012 +0200

    Convert SV_DECL_VARARR_SORT(ScHTMLColOffset) to o3tl::sorted_vector
    
    Change-Id: I583eeccc2cdb0c3fd0dc60f9e222e026c6b0ead2

commit 8bed4206e1b5548e3525021d6d13025a6eb01081
Author: Markus Mohrhard <markus.mohrhard@googlemail.com>
Date:   Sun Sep 9 18:31:57 2012 +0200

    fix crash introduced by stl conversion, fdo#54299
    
    Change-Id: Ieb39563b1d26c3037d4200f9ec68f4ce2b2c7b42
Comment 14 Robinson Tryon (qubit) 2015-12-13 11:09:34 UTC Comment hidden (obsolete)
Comment 15 Xisco Faulí 2016-09-26 17:05:59 UTC
Adding Cc: to Markus Mohrhard
Comment 16 Justin L 2016-11-21 14:59:54 UTC
This bug is a mess. Tip for a developer: do your own research from the start on this one. I agree that the ladder effect appears to have started as indicated in comment 9, which comment 13 suggests is 8bed4206e1b5548e3525021d6d13025a6eb01081.

That is hard to test because, before that commit, pasting produced random results that differed each time. My bisecting of that randomness suggests pasting initially broke sometime on 2012-07-26 (a little later than comment 13 suggests).

Although all these commits seem rather minor, I haven't been able to revert anything and resolve the problem, so pinpointing the problem code hasn't been successful yet. None of the commits look like the kind of code that would cause this regression.
Comment 17 QA Administrators 2017-11-22 15:41:01 UTC Comment hidden (obsolete)
Comment 18 Danilo Camara 2017-11-22 16:51:51 UTC
The bug is still present in Fedora Linux 27

Version: 5.4.3.2
Build ID: 5.4.3.2-1.fc27
CPU threads: 8; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); Calc: group
Comment 19 himajin100000 2018-03-08 20:30:18 UTC
Created attachment 140485 [details]
a patch that possibly fix this bug, but don't trust(see comment)

I attached a patch that adds one line of code. When I applied it to my local repository, this problem is somehow magically gone. 

BUT DON'T TRUST, AS THIS IS MY FIRST TIME USING GIT COMMAND, AND I MAY HAVE DONE SOMETHING WRONG.
Comment 20 Xisco Faulí 2018-03-09 16:11:53 UTC
(In reply to himajin100000 from comment #19)
> Created attachment 140485 [details]
> a patch that possibly fix this bug, but don't trust(see comment)
> 
> I attached a patch that adds one line of code. When I applied it to my local
> repository, this problem is somehow magically gone. 
> 
> BUT DON'T TRUST, AS THIS IS MY FIRST TIME USING GIT COMMAND, AND I MAY HAVE
> DONE SOMETHING WRONG.

Hello,
Thank you very much for the patch.
Could you please submit it to gerrit in order to get it reviewed by other developers? For information see https://wiki.documentfoundation.org/Development/gerrit/SubmitPatch
Comment 21 himajin100000 2018-03-09 18:23:11 UTC
(In reply to Xisco Faulí from comment #20)

> Could you please submit it to gerrit?

Done.
https://gerrit.libreoffice.org/#/c/51016/

...Probably...maybe...possibly...(gradually losing confidence)
Comment 22 Buovjaga 2018-04-18 11:30:24 UTC
*** Bug 116845 has been marked as a duplicate of this bug. ***
Comment 23 Volga 2018-04-21 10:16:16 UTC
CC: Eike Rathke, Tomas Lendo
Comment 24 Xisco Faulí 2018-08-21 16:46:50 UTC
*** Bug 119264 has been marked as a duplicate of this bug. ***
Comment 25 kavalec74 2018-08-21 17:16:54 UTC
Reported 4 years ago? :O
Comment 26 titovaskz 2018-11-12 20:25:19 UTC
Hi, I have the same problem with LibreOffice 6.1.3 Calc. Is there a possible solution for this?
I cannot use jamovi with Calc.
Thanks
Comment 27 QA Administrators 2019-11-13 03:32:34 UTC Comment hidden (obsolete)
Comment 28 Jonathon 2019-11-13 03:48:17 UTC
This bug is still an issue in 6.3.3.2.
Comment 29 Kevin Suo 2020-02-01 16:46:27 UTC
For some technical reason that I am not familiar with, himajin's patch on gerrit was abandoned.

That patch is quite old; it may work, but it may not be suitable for applying to today's master.
Comment 30 Pablo Navarro 2020-04-23 00:33:05 UTC
I just signed up to report this exact bug: colspan breaks copying html tables.

It can easily be reproduced:
- Create an HTML file (file.html) with these 6 lines:
<table border="1">
<tr><th colspan="2">Columns A+B<th>Column C
<tr><td>Value A<td>Value B<td>Value C
<tr><td>Value A<td>Value B<td>Value C
<tr><td>Value A<td>Value B<td>Value C
</table>

- Open that file in a browser, it will look like this:
   Columns A + B  | Column C
Value A | Value B | Value C
Value A | Value B | Value C
Value A | Value B | Value C

- Copy the table
- Paste it to LibreOffice Calc

It will look like this:
        A         |    B      |     C     |     D     |    E      |    F
   Columns A + B  | Column C  |           |           |           |
            Value A           | Value B   | Value C   |           |
            Value A                       | Value B   | Value C   |
            Value A                                   | Value B   | Value C


Instead of resetting the colspan, it seems to widen it for every subsequent row.

Hope this helps to solve this old bug.
I just tried it in the current nightly and it's still here.
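
For anyone who prefers to script the first step, here is a small sketch that writes the same six lines to a file and checks that every row of the source table covers three columns, which is the layout a correct paste should preserve. The `RowWidths` class and the temporary output directory are assumptions made for this example; the markup is copied from the steps above.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path

# The six-line reproduction table from this comment.
REPRO_HTML = """\
<table border="1">
<tr><th colspan="2">Columns A+B<th>Column C
<tr><td>Value A<td>Value B<td>Value C
<tr><td>Value A<td>Value B<td>Value C
<tr><td>Value A<td>Value B<td>Value C
</table>
"""


class RowWidths(HTMLParser):
    """Collect, for each <tr>, the number of columns its cells cover."""

    def __init__(self) -> None:
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.widths.append(0)  # start counting a new row from zero
        elif tag in ("td", "th") and self.widths:
            span = dict(attrs).get("colspan") or "1"
            self.widths[-1] += int(span)


# Write the file to open in a browser (a temporary directory is used here).
out_path = Path(tempfile.mkdtemp()) / "file.html"
out_path.write_text(REPRO_HTML, encoding="utf-8")

parser = RowWidths()
parser.feed(REPRO_HTML)
parser.close()
print(out_path)
print(parser.widths)  # [3, 3, 3, 3]
```

Since all four rows are three columns wide in the source, a pasted result that uses columns D to F indicates a problem in the import step rather than in the HTML.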
Comment 31 stragu 2020-08-11 21:23:07 UTC
Same problem in the final release of 7.0.0:

Version: 7.0.0.3
Build ID: 8061b3e9204bef6b321a21033174034a5e2ea88e
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

I discovered this when trying to copy and paste a table from Jamovi into Calc. But following the steps in comment 30 is a great way to test.

Note that the table looks good in Writer!

One partial workaround is to paste first in Writer, and then copy and paste again into Calc. It is only partial because there is another issue (bug 101313) with copying and pasting merged cells between Writer and Calc, so some manual work is still needed (but it is less effort than the issue described here).