Bug 103960 - copy paste table from internet
Summary: copy paste table from internet
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
(earliest affected)
Hardware: All Linux (All)
Importance: medium normal
Assignee: Jan Holesovsky
Whiteboard: target:5.3.0
Keywords: bibisected, bisected, regression
Depends on:
Reported: 2016-11-16 18:12 UTC by raal
Modified: 2016-11-20 07:28 UTC
CC List: 3 users

See Also:
Crash report or crash signature:

printscreen (92.96 KB, text/document)
2016-11-16 18:14 UTC, raal
selection on the web page (20.40 KB, image/png)
2016-11-17 08:31 UTC, raal

Description raal 2016-11-16 18:12:41 UTC
Copy a table from a web page and paste it into Calc.

Steps to Reproduce:
1. Copy a table from a web page and paste it into Calc.

Actual Results:  
All of the table's data ends up in the first cell.

Expected Results:
The table is imported as a table, with each value in its own cell.

Reproducible: Always

User Profile Reset: No

Additional Info:

User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0
Comment 1 raal 2016-11-16 18:14:13 UTC
This seems to have begun at the below commit.
Adding Cc: to Jan Holesovsky; could you possibly take a look at this one?

author	Jan Holesovsky <kendy@collabora.com>	2016-11-03 16:14:01 (GMT)
committer	Jan Holesovsky <kendy@collabora.com>	2016-11-03 20:07:19 (GMT)
commit	b297f7bbfed83f87398231740e910afe6ebfbb97 (patch)
tree	a00d6cf4675a5dd01bba21f140197ba130a28114
parent	5d9d0f3c979732ade57b9c4c4960dd030ffdc9f9 (diff)
tdf#88821: Set the encoding correctly for HTML files with a BOM.
BOM (Byte Order Mark) in the HTML file changed the underlying eSrcEnc
encoding, but did not actually update the rtl_TextToUnicodeConverter hConv.

Subsequent changes of eSrcEnc in SetSrcEncoding() (triggered by
'content="application/xhtml+xml; charset=UTF-8"' in the HTML file) were then
ignored (eSrcEnc was already set to UTF-8), and the parser was happily using the
old (Windows-1250) hConv.
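
The failure mode described in this commit message can be sketched in a few lines. This is only an illustration of the pattern, not the LibreOffice code: the class and method names below are hypothetical, `src_enc` stands in for eSrcEnc, and `converter` stands in for the rtl_TextToUnicodeConverter hConv.

```python
import codecs


class HtmlDecoderSketch:
    """Hypothetical parser state: an encoding field plus a cached converter."""

    def __init__(self, encoding="cp1250"):
        self.src_enc = encoding                        # stands in for eSrcEnc
        self.converter = codecs.getdecoder(encoding)   # stands in for hConv

    def on_bom_stale(self, encoding):
        # Bug pattern: the BOM handler updates the field only,
        # leaving the cached converter on the old encoding.
        self.src_enc = encoding

    def on_bom_consistent(self, encoding):
        # Consistent variant: field and converter change together.
        self.src_enc = encoding
        self.converter = codecs.getdecoder(encoding)

    def set_src_encoding(self, encoding):
        # A later charset declaration is ignored when the field already matches,
        # so it cannot repair a stale converter.
        if encoding == self.src_enc:
            return
        self.src_enc = encoding
        self.converter = codecs.getdecoder(encoding)

    def decode(self, data):
        text, _consumed = self.converter(data, "replace")
        return text


data = "č".encode("utf-8")

stale = HtmlDecoderSketch()
stale.on_bom_stale("utf-8")
stale.set_src_encoding("utf-8")   # no effect: src_enc already says utf-8

consistent = HtmlDecoderSketch()
consistent.on_bom_consistent("utf-8")
consistent.set_src_encoding("utf-8")

print(repr(stale.decode(data)))        # still decoded as Windows-1250
print(repr(consistent.decode(data)))   # decoded as UTF-8
```

In the stale case the encoding field claims UTF-8 while the bytes are still run through the Windows-1250 converter, which is the mismatch the commit message describes.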

ddfcdc25005702490f6c34ded21492187279864b is the first bad commit
commit ddfcdc25005702490f6c34ded21492187279864b
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Thu Nov 3 23:27:18 2016 +0100

    source sha:b297f7bbfed83f87398231740e910afe6ebfbb97

 git bisect log
# bad: [b356e15c1f14e3d0f0bb73662c878d14fc8aa992] source sha:ada8a2123ea655142be74a11c23e042a0109d5f8
# good: [33e60eae04c889baf52713a73dc9944015408914] source sha:5b168b3fa568e48e795234dc5fa454bf24c9805e
git bisect start 'origin/master' 'oldest'
# good: [78a4f08cf26d3f800710c509a99b1f4ad8a4e783] source sha:db231633af4667e24281e0be69ab63ad3081fdc3
git bisect good 78a4f08cf26d3f800710c509a99b1f4ad8a4e783
# good: [d9f68fca812338acc473efb5053add57fbdf6415] source sha:8fab6ab36589d0dcd75d45feab43a0b06b7f2a3e
git bisect good d9f68fca812338acc473efb5053add57fbdf6415
# good: [ed68fdb510a1b043a83cd50a28ee77bdd9ea943d] source sha:ae3fb69ebca4e253959cdf9bf620296e7797a501
git bisect good ed68fdb510a1b043a83cd50a28ee77bdd9ea943d
# good: [fdf1ddec40d4d1cd66f0dba3eefbcca82e0ee599] source sha:a8aab44d75e4704327b4330b532883b59380b7d3
git bisect good fdf1ddec40d4d1cd66f0dba3eefbcca82e0ee599
# bad: [f361af9556d41663aa374654b0bbd292866e6f41] source sha:2ed4034aa51292a8bb8e770213f0021a3f1c9408
git bisect bad f361af9556d41663aa374654b0bbd292866e6f41
# bad: [7fbd308e2fcccf6320de9003c0556b1a1e8020fb] source sha:8eff1decd91cbfb10094c25d4cf1d2b434a4da72
git bisect bad 7fbd308e2fcccf6320de9003c0556b1a1e8020fb
# bad: [449b545d9c342e4c90c9b67db6a42f4480c3939d] source sha:5b389c32eb3928c59387c2d6d48667632d7e9206
git bisect bad 449b545d9c342e4c90c9b67db6a42f4480c3939d
# bad: [ddae1fffca1f547a214aa52c82ef6fb0127c6b31] source sha:c1ea561f6a01044357052789bbc6c8ec52061d41
git bisect bad ddae1fffca1f547a214aa52c82ef6fb0127c6b31
# bad: [b515c7f5bd78d38892e8b3f82f484ed6d4a1483e] source sha:8d777f85eaff6af8896942590316b7cd9f2c3e75
git bisect bad b515c7f5bd78d38892e8b3f82f484ed6d4a1483e
# good: [05581b27841857bd6bea483864e8ea941f87aab3] source sha:a30f969432a451ade87b93e3077836a849b8f11b
git bisect good 05581b27841857bd6bea483864e8ea941f87aab3
# good: [7fdf4c6a86169802472b0df95c5a2455b4fca7dc] source sha:2a818a0aafac218ca09bb079d7f2cf0879385e4a
git bisect good 7fdf4c6a86169802472b0df95c5a2455b4fca7dc
# bad: [ddfcdc25005702490f6c34ded21492187279864b] source sha:b297f7bbfed83f87398231740e910afe6ebfbb97
git bisect bad ddfcdc25005702490f6c34ded21492187279864b
# good: [17d8234546ccc7b6cd69f3288c5ee086d48b7b1c] source sha:5d9d0f3c979732ade57b9c4c4960dd030ffdc9f9
git bisect good 17d8234546ccc7b6cd69f3288c5ee086d48b7b1c
# first bad commit: [ddfcdc25005702490f6c34ded21492187279864b] source sha:b297f7bbfed83f87398231740e910afe6ebfbb97
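
Each comment line in the log above pairs a commit of the bisect repository with the source commit it was built from ("source sha"). A minimal sketch of pulling that mapping out of such a log follows; the function name and regular expression are assumptions made for illustration, and the sample is just the last five lines of the log above.

```python
import re

# Matches the "# good: [...] source sha:..." style comment lines of the log.
LINE_RE = re.compile(
    r"^# (good|bad|first bad commit): \[([0-9a-f]{7,40})\] source sha:([0-9a-f]{7,40})$"
)


def parse_bisect_log(log_text):
    """Return (verdicts, first_bad) extracted from a bibisect log."""
    verdicts = []      # list of (verdict, bisect_repo_sha, source_sha)
    first_bad = None   # (bisect_repo_sha, source_sha) or None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue   # skip the "git bisect ..." command lines
        kind, repo_sha, source_sha = match.groups()
        if kind == "first bad commit":
            first_bad = (repo_sha, source_sha)
        else:
            verdicts.append((kind, repo_sha, source_sha))
    return verdicts, first_bad


SAMPLE = """\
# bad: [ddfcdc25005702490f6c34ded21492187279864b] source sha:b297f7bbfed83f87398231740e910afe6ebfbb97
git bisect bad ddfcdc25005702490f6c34ded21492187279864b
# good: [17d8234546ccc7b6cd69f3288c5ee086d48b7b1c] source sha:5d9d0f3c979732ade57b9c4c4960dd030ffdc9f9
git bisect good 17d8234546ccc7b6cd69f3288c5ee086d48b7b1c
# first bad commit: [ddfcdc25005702490f6c34ded21492187279864b] source sha:b297f7bbfed83f87398231740e910afe6ebfbb97
"""

verdicts, first_bad = parse_bisect_log(SAMPLE)
print(first_bad[1])   # source commit behind the first bad build
```

The last good build maps to source commit 5d9d0f3c..., which is the parent of b297f7bb... listed in the commit details above, so the bisect isolates exactly one source commit.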
Comment 2 raal 2016-11-16 18:14:50 UTC
Created attachment 128795
printscreen
Comment 3 MM 2016-11-16 22:17:42 UTC
Do you have an example from the internet? I guess the outcome is the same with different browsers?
Comment 4 raal 2016-11-17 08:30:24 UTC
(In reply to MM from comment #3)
> Do you have an example from internet ? Guess your outcome is the same with
> different browsers ?!

Hello, for example this page
Ubuntu linux, Firefox
Comment 5 raal 2016-11-17 08:31:05 UTC
Created attachment 128798
selection on the web page
Comment 6 Xisco Faulí 2016-11-17 09:17:43 UTC
Setting this bug to NEW, as the commit that introduced the regression has been identified.
Comment 7 MM 2016-11-17 19:45:09 UTC
Confirmed with Version:
Build ID: 098f7a4ac2b6f309a45d29f1b68bea18418b9ee7
CPU Threads: 2; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; Layout Engine: new; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-11-15_11:28:35
Locale: en-US (en_US.UTF-8); Calc: single

This affects not only Calc but also Writer.
Comment 8 Michael Meeks 2016-11-18 13:24:30 UTC
Nice catch Raal - Kendy will look at this one I think =)
Comment 9 Commit Notification 2016-11-18 14:10:22 UTC
Jan Holesovsky committed a patch related to this issue.
It has been pushed to "master":


tdf#103960: The import of UCS2 data uses a different code path.
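
The commit subject gives no further detail, so the following is only a rough sketch of why two-byte (UCS2/UTF-16) data with a BOM needs its own handling. It assumes the pasted HTML arrives as UTF-16 with a BOM; the helper name and the fallback encoding are hypothetical, and none of this is taken from the actual patch.

```python
import codecs

# BOM prefixes checked in order (UTF-32 is deliberately left out of this sketch).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data, fallback="cp1250"):
    """Decode bytes using the BOM if present, else a fallback encoding."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(fallback, errors="replace")


html = "<table><tr><td>a</td><td>b</td></tr></table>"
payload = codecs.BOM_UTF16_LE + html.encode("utf-16-le")

# Honouring the BOM recovers the markup.
by_bom = decode_with_bom(payload)

# Treating the same bytes as a single-byte encoding interleaves NUL
# characters, so the "<table>" tag is no longer recognisable.
single_byte = payload.decode("cp1250", errors="replace")

print("<table>" in by_bom)
print("<table>" in single_byte)
```

If the table markup is not recognised, an importer has nothing to split into cells, which would be consistent with the symptom reported here, though the report itself does not confirm that this is the exact mechanism.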

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:

Affected users are encouraged to test the fix and report feedback.
Comment 10 Jan Holesovsky 2016-11-18 14:16:41 UTC
raal: Thanks so much for the clear bugreport & bisect - much appreciated!

Fixed now :-)
Comment 11 raal 2016-11-20 07:28:02 UTC
Thanks for the fix.
Verified in Version:
Build ID: 8d613870b2cd2e3e4396b4fa97dbd8080fda8f52
CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk2; Layout Engine: new; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-11-18_23:09:33