Bug 78801 - Copying HTML from web browser - only plain text is pasted
Summary: Copying HTML from web browser - only plain text is pasted
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.3.3 release
Hardware: All All
Importance: highest normal
Assignee: Andrzej Hunt
URL:
Whiteboard: target:4.4.0 target:4.2.5 target:4.3....
Keywords: bibisected, regression
Duplicates: 77492 77669 78802 78818
Depends on:
Blocks: mab4.2
Reported: 2014-05-16 21:33 UTC by Joel Madero
Modified: 2015-12-17 08:06 UTC

Description Joel Madero 2014-05-16 21:33:47 UTC
System Specs:
Ubuntu 14.04 x64
LibreOffice 4.2.4.2 (confirmed on 4.3 master)

Steps to Reproduce:
1. Go to a website with text and image(s) -- e.g., http://www.tufts.edu/alumni/magazine/fall2013/features/up-in-arms.html

2. Highlight text and image - copy

3. Paste into writer

Observed: Text is pasted, image is skipped

Expected: Everything pasted

Regression
Comment 1 Jorendc 2014-05-16 22:30:30 UTC
Repro, win 8.1.
Comment 2 QA Administrators 2014-05-17 02:11:34 UTC
*** Bug 78802 has been marked as a duplicate of this bug. ***
Comment 3 QA Administrators 2014-05-17 02:13:48 UTC
commit 7acd9bd59e3ecec327135da4496f7211e8bfeb13
Author: Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
Date:   Sun May 11 12:04:42 2014 +0000

    source-hash-e414cdbf3321d579537b372d815d50c31195ecc3
    
    commit e414cdbf3321d579537b372d815d50c31195ecc3
    Author:     Caolán McNamara <caolanm@redhat.com>
    AuthorDate: Wed Feb 12 09:53:37 2014 +0000
    Commit:     Caolán McNamara <caolanm@redhat.com>
    CommitDate: Wed Feb 12 13:03:52 2014 +0000
    
        coverity#1103660 Division or modulo by zero
    
        Change-Id: I468b218635e10e04bb25150b6275e187ba8a8316

:100644 100644 5c6267f50eed99dc11ee078afca138ec5fb0a140 ed35f89410da5497ace1fff13aa7a3d27e82cf69 M	ccache.log
:100644 100644 36083c9ad3720785f3761ed2bf6ec511ae68aa42 6f1ccb7662d3cab2f48c8ccb760d51de6c426f0c M	commitmsg
:100644 100644 b4acd20e4800485d1c45f928c60997d61397ae88 585ffbe4ddeb17c0d04ea1a2a501952ce02c70ad M	make.log
:040000 040000 a017e6f908f0e99824314177084a3a994a5d0693 51e1c0b302d3d0fe07a111910298f4678ef6e751 M	opt


# bad: [ea24c76037fa8056fb1ed916f4d5e765ebc71f8f] source-hash-46cfcd5a05aa1d13fecd73f5a25b64b8d8dd6781
# good: [6ab7f53af36f13bbefdd4e4fcbd3d1ea432a77d9] source-hash-22029c7e17b4cb48acb058d47ec9c3b6b8b6b294
git bisect start 'latest' 'oldest'
# bad: [45f8b50e2e33d2125a348134dd931ba01c70fe63] source-hash-0141153498bc34237d5b7dd72226ac2a03bbd11d
git bisect bad 45f8b50e2e33d2125a348134dd931ba01c70fe63
# good: [d7078f40ea1c0e9f2ff1f6c29a19f6385130386d] source-hash-3034b144d0062e9c4394b901aded43fec117ed11
git bisect good d7078f40ea1c0e9f2ff1f6c29a19f6385130386d
# good: [66462cb8902cb365c2d00a044fca097688aeeac1] source-hash-202d721e3cb35eb4402882dbe4b81ceccd9f4e0a
git bisect good 66462cb8902cb365c2d00a044fca097688aeeac1
# good: [ed0ff5487bfb1a9f23d9bdc88cf6692f3897e5e8] source-hash-281bdaaa0b2860ef2f2b08b1acb3f930f81cf020
git bisect good ed0ff5487bfb1a9f23d9bdc88cf6692f3897e5e8
# bad: [71a190546f9421c4f865b56409e9b649202befb4] source-hash-6124183d7b6742bbf41d61c6bae342180dd2ed8e
git bisect bad 71a190546f9421c4f865b56409e9b649202befb4
# bad: [7acd9bd59e3ecec327135da4496f7211e8bfeb13] source-hash-e414cdbf3321d579537b372d815d50c31195ecc3
git bisect bad 7acd9bd59e3ecec327135da4496f7211e8bfeb13
# good: [6302870ae46a83d04b1e45d235e57cf1397e018a] source-hash-7fb6ae28ae7bebd67c2b9bf2cf517f1f7bb2777e
git bisect good 6302870ae46a83d04b1e45d235e57cf1397e018a
# first bad commit: [7acd9bd59e3ecec327135da4496f7211e8bfeb13] source-hash-e414cdbf3321d579537b372d815d50c31195ecc3
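
The log above is a binary search over prebuilt binaries: each step tests the midpoint between the newest known-good and oldest known-bad build. As a rough illustration only (not the bibisect tooling itself), the same search can be sketched in Python. The commit names and the `is_bad` check below are hypothetical, and the sketch assumes the history is ordered oldest to newest with a single good-to-bad transition.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history c0..c9 in which the regression first appears in c6.
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 6)
```

In a real session the `is_bad` step is the manual test (copy from the browser, paste into Writer, check for the image), which is why each line of the log records a human verdict of good or bad.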
Comment 4 Björn Michaelsen 2014-05-17 23:45:59 UTC
Commits in the range:
http://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=7fb6ae28ae7bebd67c2b9bf2cf517f1f7bb2777e..e414cdbf3321d579537b372d815d50c31195ecc3

The "Prefer embedding image data" commit sounds suspicious, and that commit was indeed cherry-picked to libreoffice-4-2: https://gerrit.libreoffice.org/#/c/8384/ -- thus CC'ing Andrzej Hunt.
Comment 5 Andrzej Hunt 2014-05-18 09:26:26 UTC
Yeah, looks like I need to look at the ordering there more carefully.

Specifically, I'm guessing SOT_FORMAT_STRING is matching before SOT_FORMATSTR_ID_HTML (my previous commit was intended to have bitmap preferred to html, but I clearly never thought things through carefully enough when doing so).
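
To make the ordering guess concrete, here is a minimal sketch of first-match format selection. It is not the LibreOffice code: `pick_format`, the `BITMAP` label, both preference lists, and the assumed clipboard contents are illustrative assumptions. Only the two `SOT_*` names come from the comment above, and the fixed order is taken from the later commit title ("image, then html, then text").

```python
from typing import Iterable, Optional, Sequence

# "BITMAP" is a stand-in label; the two SOT_* names are quoted from the comment.
BITMAP = "BITMAP"
HTML = "SOT_FORMATSTR_ID_HTML"
STRING = "SOT_FORMAT_STRING"


def pick_format(offered: Iterable[str], preference: Sequence[str]) -> Optional[str]:
    """Return the first format in `preference` that the clipboard offers."""
    offered_set = set(offered)
    for fmt in preference:
        if fmt in offered_set:
            return fmt
    return None


# Guess at the regressed behaviour: plain text is tried before HTML.
SUSPECTED_ORDER = [BITMAP, STRING, HTML]
# Order named in the fix's commit title: image, then html, then text.
FIXED_ORDER = [BITMAP, HTML, STRING]

# Assumed clipboard contents for three kinds of browser selection.
mixed_selection = [HTML, STRING]     # text plus image copied together
image_only = [BITMAP, HTML, STRING]  # a single image copied
plain_text = [STRING]                # text from a plain-text source
```

With these assumptions, a mixed selection resolves to plain text under the suspected order and to HTML under the fixed one, while an image-only selection still resolves to the bitmap.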
Comment 6 Florian Reisinger 2014-05-18 10:21:59 UTC
*** Bug 77492 has been marked as a duplicate of this bug. ***
Comment 7 Florian Reisinger 2014-05-18 10:25:07 UTC
I think I may have come up with a reason for this: could it have been done on purpose to prevent XSS (http://en.wikipedia.org/wiki/Cross-site_scripting)?
Comment 8 Riccardo Vianello 2014-05-25 22:11:09 UTC
I can confirm this bug: when I copy text together with an image from a website, the text is pasted but the image is not.

However, with 4.3.0 Beta 1, if I change the paste option and choose "Html format without comments" or "Html format", all of the text and the image are shown correctly.

I think the program's default paste (Ctrl+V on Windows) is set to "text not formatted".
Comment 9 Michael Stahl (CIB) 2014-05-27 20:22:26 UTC
*** Bug 78818 has been marked as a duplicate of this bug. ***
Comment 10 Andrzej Hunt 2014-05-28 06:49:59 UTC
I've put a tentative patch on https://gerrit.libreoffice.org/#/c/9518/ -- unfortunately I'm slightly short on time to do detailed testing right now, but it seems to fix the common cases on writer (i.e. pasting just an image pastes the image proper, pasting an html selection pastes everything, and text also continues to work as usual...).
Comment 11 Commit Notification 2014-05-28 17:45:45 UTC
Andrzej Hunt committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=538c13f3d1756f2d105115f64ab1bc0b7426eebc

fdo#78801 fdo#52547 Paste preference is image, then html, then text.



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 12 Andrzej Hunt 2014-05-28 17:51:42 UTC
Should hopefully be reliable now.

(Interestingly I've noticed that pasting images from a browser doesn't work
at all for calc, even with all my commits in this area reverted (but works fine
for the case of mixed text+images), but that's a separate bug in any case...)
Comment 13 Commit Notification 2014-05-29 09:59:14 UTC
Andrzej Hunt committed a patch related to this issue.
It has been pushed to "libreoffice-4-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f799e27baa914c220a7aa86702d1061258016a96&h=libreoffice-4-2

fdo#78801 fdo#52547 Paste preference is image, then html, then text.


It will be available in LibreOffice 4.2.6.

Comment 14 Commit Notification 2014-05-29 10:00:47 UTC
Andrzej Hunt committed a patch related to this issue.
It has been pushed to "libreoffice-4-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=56958bd0d9a150ee5230c02c03b5756500bc64b4&h=libreoffice-4-3

fdo#78801 fdo#52547 Paste preference is image, then html, then text.


It will be available in LibreOffice 4.3.

Comment 15 Commit Notification 2014-05-30 13:21:22 UTC
Andrzej Hunt committed a patch related to this issue.
It has been pushed to "libreoffice-4-2-5":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f58fa2cf44b7fbac85c260919ae902206f69ea7b&h=libreoffice-4-2-5

fdo#78801 fdo#52547 Paste preference is image, then html, then text.


It will be available already in LibreOffice 4.2.5.

Comment 16 domike 2014-05-31 18:05:12 UTC
4.2 dev version solves bug #78818 (a duplicate of this one) for me on Linux Mint 16. Thanks!
Comment 17 jlerner10 2014-06-01 19:16:12 UTC
When I try to do a Copy and Paste from IE or FF that has bullets or numbered lists they do not get pasted.

Pasting with the Paste Special option "HTML without Comments" does work, but it places bullets where the list should be numbered.

As a test, I used the URL http://windows.microsoft.com/en-us/windows/rename-user-account#1TC=windows-7

I am using FF 30.0 beta 9 on Windows 7 Pro SP1
LibreOffice 4.2.5.1
Comment 18 ice.simx 2014-06-03 10:45:42 UTC
On Ubuntu 14.04 with LibreOffice 4.2.3, 4.2.4, and 4.3.0 beta1.
Web browser used: Firefox 29.0

Hold the left mouse button and select text with an image,
then select Copy or press Ctrl+C,
then paste in Writer: the result is unformatted text. I also tried Edit > Paste Special
and selected HTML from the options (there were only two options: Unformatted Text / HTML).

RESULT REQUIRED: formatted paste as in the HTML document (same font, color, size and, most importantly, the image)

If I select Edit > Links > Break Link,

it shows a small box with an error (the internet is accessible via an ms-proxy server).


And when copying from a PDF, there is no option for formatted text, only unformatted text :(
Comment 19 Michael Stahl (CIB) 2014-06-03 10:56:38 UTC
please don't re-open a bug based on the fact that releases that don't have the fix yet (as indicated by comment #15 etc. and the whiteboard target entries) are still buggy, it is just wasting everybody's time.
Comment 20 ice.simx 2014-06-05 06:38:01 UTC
(In reply to comment #19)
> please don't re-open a bug based on the fact that releases that don't have
> the fix yet (as indicated by comment #15 etc. and the whiteboard target
> entries) are still buggy, it is just wasting everybody's time.

I'm sorry, Michael.

But I also mentioned "behind proxy", maybe not clearly; my mistake.
I have tested it again with 4.3.0.beta1.

Scenario 1: the machine has a direct connection to the internet.
* Copy text and an image from a website.
* Edit > Paste Special, then select HTML.
* A small box with the URL is displayed, and after a few seconds it shows the image (as required).
* Edit > Links, then select Break Link. The image is still visible on the Writer page, as required.

Scenario 2: the machine accesses the internet via a proxy, and the proxy settings are NOT set system-wide but in Firefox only (I entered my domain username/password in Firefox).
* Copy text and an image from a website.
* Edit > Paste Special, then select HTML.
* This time there is NO image, only a small box with the URL.

It seems 'Paste Special' is not working with a proxy, and it is related to "Copy text & image from website but image not copied".
Comment 21 Michael Stahl (CIB) 2014-06-05 09:00:13 UTC
there is absolutely nothing about this bug that has _anything_ to do with a proxy.  please stop derailing it and open a _new_ bug about the problem you're seeing already.
Comment 22 manj_k 2014-06-07 22:23:09 UTC
*** Bug 77669 has been marked as a duplicate of this bug. ***
Comment 23 bugquestcontri 2014-06-08 02:14:20 UTC
This might not be needed anymore, but it was triggered by an AskLibO question: http://ask.libreoffice.org/en/question/35166/why-does-copy-and-paste-remove-bold-and-italics/

I tested LibO Writer 4.2.4.2 on XP using FF 29 and can confirm the bug.
Comment 24 Robinson Tryon (qubit) 2015-12-17 08:06:18 UTC
Migrating Whiteboard tags to Keywords: (bibisected)
[NinjaEdit]