Bug 137588 - Wrong HTML copy/paste from webpage
Summary: Wrong HTML copy/paste from webpage
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 7.0.2.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-19 09:22 UTC by Nemecsek
Modified: 2020-11-04 11:41 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments

Description Nemecsek 2020-10-19 09:22:25 UTC
Description:
Text copied from some HTML pages is garbled when pasted into LibreOffice Writer/Calc 7.

Steps to Reproduce:
1. Enter www.deepl.com
2. Choose English to German translation
3. Enter the sentence "The trees are big". You read "die Bäume sind Groß".
4. Copy (Ctrl+C) the German translation and paste it into LibreOffice Writer or Calc.
4a. Alternate: Paste Special -> 
    Writer: it reads "Unknown Source" instead of HTML and it doesn't offer "As HTML" but only "Unformatted text"
    Calc: it also provides "Use text import dialog", but I cannot find any Character set that provides a correct encoding.


Actual Results:
The pasted text contains garbled chars: "die Bäume sind groÃ" 


Expected Results:
The correct text should be "die Bäume sind groß"
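
A possible explanation for the stray "Ã", offered as a sketch rather than a confirmed diagnosis: "ß" is stored as the two bytes C3 9F in UTF-8, and if those bytes are decoded as Latin-1, the first becomes "Ã" and the second an invisible control character. The Python snippet below only illustrates that mechanism. The choice of Latin-1 as the wrong decoder is an assumption, and it does not explain why "ä" survived in the reported output.

```python
# Illustration only: assumes the UTF-8 bytes of "ß" were misread as Latin-1.
expected = "ß"
utf8_bytes = expected.encode("utf-8")      # two bytes: 0xC3 0x9F
garbled = utf8_bytes.decode("latin-1")     # "Ã" followed by the control char U+009F

print(utf8_bytes.hex(" "))
print(repr(garbled))

# Reversing the wrong decode restores the original character.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == expected)

# The same misreading would also garble "ä", which the report does not show.
print(repr("ä".encode("utf-8").decode("latin-1")))
```

If the garbled text really came from this kind of misdecoding, re-encoding it as Latin-1 and decoding it as UTF-8 (the `repaired` line) would recover the original.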



Reproducible: Always


User Profile Reset: No



Additional Info:
EASY SOLUTION: When the text source is unknown it would be better to leave ALL the possible options to choose from, instead of only providing "unformatted text".

Text copied from other German pages (e.g. spiegel.de) is correctly recognized as HTML and pasted accordingly into Writer/Calc.

It looks like a problem with deepl.com, but we should handle such anomalous pages and provide a viable solution. Copying from HTML is quite common, and deepl.com may not be the only page where this happens.

BTW, pasting this same text into any other text editor in Ubuntu works correctly. Only LibreOffice doesn't handle it correctly.
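
To check whether the browser actually offers an HTML flavour on the clipboard for a given page, the advertised clipboard targets can be listed. This is a minimal sketch, assuming an X11 session with the xclip tool installed. The function names are hypothetical helpers and are not part of LibreOffice.

```python
import subprocess


def parse_targets(output: str) -> list[str]:
    """Split xclip's TARGETS output (one target per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def offers_html(targets: list[str]) -> bool:
    """Return True if the clipboard owner advertises an HTML flavour."""
    return "text/html" in targets


def list_clipboard_targets() -> list[str]:
    """Ask xclip which formats the current clipboard owner offers (X11 only)."""
    result = subprocess.run(
        ["xclip", "-selection", "clipboard", "-t", "TARGETS", "-o"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_targets(result.stdout)
```

Calling `offers_html(list_clipboard_targets())` once after copying from deepl.com and once after copying from spiegel.de would show whether the two pages put different formats on the clipboard.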
Comment 1 Timur 2020-11-03 09:23:26 UTC
After some testing, I can't reproduce this.

If you copy just "die Bäume sind Groß", the only paste option is "Unformatted text", as expected - NotABug.
You claim that the pasted text contains "die Bäume sind groÃ", but I can't confirm that.
Both simple paste and Paste Special with the dialog work fine; the text is UTF-16.

If real HTML is selected, then it's pasted as HTML, like:
Alternatives:
Die Bäume sind gross.
Die Bäume sind sehr groß.
Die Bäume sind riesig.

I think the existing bug 108243 (';' in the font name) applies here, so those lines are not shown in Writer. Calc asks for the language in the import dialog.

Please reconsider and try again.
If you still think there's a bug, you may set the status to Unconfirmed, with a more precise description of a single issue and a screenshot of the import.
Comment 2 Nemecsek 2020-11-04 08:13:41 UTC
(In reply to Timur from comment #1)
> After some testing, I can't reproduce this.
> 
> If you copy just "die Bäume sind Groß", the only paste option is
> "Unformatted text", as expected - NotABug.
> You claim that the pasted text contains "die Bäume sind groÃ", but I can't
> confirm that.
> Both simple paste and Paste Special with the dialog work fine; the text is
> UTF-16.
> 
> If real HTML is selected, then it's pasted as HTML, like:
> Alternatives:
> Die Bäume sind gross.
> Die Bäume sind sehr groß.
> Die Bäume sind riesig.
> 
> I think the existing bug 108243 (';' in the font name) applies here, so those
> lines are not shown in Writer. Calc asks for the language in the import dialog.
> 
> Please reconsider and try again.
> If you still think there's a bug, you may set the status to Unconfirmed, with
> a more precise description of a single issue and a screenshot of the import.

@Timur
The problem doesn't occur on other HTML pages (I cited spiegel.de), but it does occur AT LEAST on deepl.com.
I don't know why this page is so special that the pasted text is not recognized as HTML and can only be pasted as unformatted text.

Did you try in deepl.com and it was pasted correctly? If yes, did you paste it as "unformatted text"? 

I don't understand why the font name could be involved in this "bug".
Comment 3 Timur 2020-11-04 08:41:12 UTC
(In reply to Nemecsek from comment #2)

> Did you try in deepl.com and it was pasted correctly? If yes, did you paste
> it as "unformatted text"? 
Yes. That's why I closed it.
Try another browser, try resetting the user profile, and try different LO versions (easy to get from https://libreoffice.soluzioniopen.com/), including the daily master build.

> I don't understand why the font name could be involved in this "bug".
If you translate "The trees are big." with the trailing dot, it offers Alternatives.
If you copy everything, it is pasted in Calc (Alternatives and the 3 lines) but not in Writer (just Alternatives).
The font is "Open Sans;sans-serif", with a ';'.

> (In reply to Timur from comment #1)
> > Please reconsider and try again.
> > If you still think there's a bug, you may set the status to Unconfirmed, with
> > a more precise description of a single issue and a screenshot of the import.
You didn't do this, so I'm closing again. Please specify precisely: Writer or Calc, Unformatted or HTML, and attach a screenshot.

Please do not set Unconfirmed unless you can give something specific and reproducible.
Comment 4 Nemecsek 2020-11-04 11:41:48 UTC
Alas, I no longer have the time to follow this issue. Sorry.

The whole procedure to reproduce it is described in my report.
I tried multiple browsers (Firefox, Brave, Chrome) and all have the same problem, both in the office and at home. If you cannot reproduce it quickly, it must be related to my configuration.

Please leave it closed. 
Thank you for your time.