Bug 97364 - HTML Clipboard issues in Base, Calc
Summary: HTML Clipboard issues in Base, Calc
Status: RESOLVED DUPLICATE of bug 37859
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base
Version: 5.2.0.0.alpha0+ (earliest affected)
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 97365
Reported: 2016-01-25 18:07 UTC by Urmas
Modified: 2017-06-17 17:56 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Clipboard contents (1.49 KB, text/plain)
2016-01-28 18:55 UTC, Urmas
Calc paste screenshot (49.85 KB, image/png)
2016-04-10 08:50 UTC, kyama
Database file (3.71 KB, application/vnd.sun.xml.base)
2016-04-10 08:53 UTC, kyama

Description Urmas 2016-01-25 18:07:40 UTC
When copying records from a Base table,

In HTML format, the header has a "charset=windows-1252" meta declaration, but the data content is in UTF-8. Please fix that error.

When pasting this into Calc,

The data is interpreted as a Japanese encoding instead of Windows-1252. As a consequence, the data cannot easily be fixed with codepage converter tools. Please insert the UTF-8 data from Base in the Windows-1252 encoding.
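
Editorial illustration of the mismatch described above: a minimal Python sketch, assuming the clipboard payload is UTF-8 bytes that a consumer decodes according to the declared "charset=windows-1252". It uses the sample string given in comment 1. This is not LibreOffice code, and the variable names are made up for the example.

```python
# Sample string taken from comment 1 of this report.
sample = "Образец текста"

# What Base is reported to place on the clipboard: UTF-8 bytes.
raw = sample.encode("utf-8")

# What a consumer gets if it trusts the "charset=windows-1252" declaration.
# errors="replace" is needed because some byte values in this payload
# (0x81, for example) have no mapping in Python's cp1252 codec.
misread = raw.decode("cp1252", errors="replace")

print(len(raw), "bytes on the clipboard")
print(misread)
```

Because some bytes have no Windows-1252 mapping, the misread text cannot always be converted back without loss, which is one plausible reason codepage converter tools struggle with it.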
Comment 1 Urmas 2016-01-25 18:12:44 UTC
As I am unable to attach a test file, you can use the string "Образец текста" as sample data in a new database to reproduce this.
Comment 2 Buovjaga 2016-01-28 16:55:39 UTC
Could you give some more detailed steps?
I created a table with a LONGVARCHAR field and pasted Образец текста into it.
How can I now copy in HTML format?
Comment 3 Urmas 2016-01-28 17:01:22 UTC
Just select the rows and copy them. In Calc, open the Paste button menu and choose 'HTML....'
Comment 4 Buovjaga 2016-01-28 17:48:13 UTC
(In reply to Urmas from comment #3)
> Just select the rows and copy them. In Calc, open the Paste button menu and
> choose 'HTML....'

Ok, I managed to do it, but how do I confirm the encoding?
I saved to .ods and opened content.xml, but there is no mention of windows-1252.

Win 7 Pro 64-bit Version: 5.2.0.0.alpha0+
Build ID: 259c1ed201f4277d74dfd600fed8c837cbf56abc
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@39, Branch:master, Time: 2016-01-27_00:45:12
Locale: fi-FI (fi_FI)
Comment 5 Urmas 2016-01-28 18:55:24 UTC
Created attachment 122262
Clipboard contents

Here's what is getting copied.
Comment 6 Buovjaga 2016-01-28 19:13:30 UTC
Ok, thanks, I used http://www.nirsoft.net/utils/inside_clipboard.html and confirmed.

Win 7 Pro 64-bit Version: 5.2.0.0.alpha0+
Build ID: 259c1ed201f4277d74dfd600fed8c837cbf56abc
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@39, Branch:master, Time: 2016-01-27_00:45:12
Locale: fi-FI (fi_FI)
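
Editorial illustration of one way to confirm the encoding from a saved clipboard dump: a minimal Python sketch that compares the charset named in the HTML with what the bytes look like. The function names and the sample fragment are hypothetical, and the fragment is not the content of the attachment. The check is a heuristic: non-ASCII bytes that form valid UTF-8 strongly suggest UTF-8, but do not prove it.

```python
import re
from typing import Optional


def declared_charset(html_bytes: bytes) -> Optional[str]:
    """Return the charset named in the HTML, lowercased, or None."""
    match = re.search(rb"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)",
                      html_bytes, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def charset_mismatch(html_bytes: bytes) -> bool:
    """Heuristic: declared charset is not UTF-8, yet the payload contains
    non-ASCII bytes that form valid UTF-8 sequences."""
    declared = declared_charset(html_bytes)
    has_non_ascii = any(b >= 0x80 for b in html_bytes)
    return (declared not in (None, "utf-8", "utf8")
            and has_non_ascii
            and is_valid_utf8(html_bytes))


# Hypothetical fragment shaped like the reported problem.
fragment = (b'<meta http-equiv="content-type" '
            b'content="text/html; charset=windows-1252">'
            + "<td>Образец текста</td>".encode("utf-8"))
print(declared_charset(fragment), charset_mismatch(fragment))
```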
Comment 7 kyama 2016-04-10 08:16:51 UTC
I have a similar problem.

I created a table in Base, entered data in Japanese, and tried to export the records to Calc by following the instructions in the "Exporting data from Base" section of the page below.
https://help.libreoffice.org/Common/Importing_and_Exporting_Data_in_Base

If I paste the data into a new Calc sheet, I get accented Latin letters or symbols (Windows-1252 characters) instead of Japanese.
I tried pasting as both RTF (default) and HTML formats, but both have the same encoding problem.

Win 10 32-bit, LibreOffice Version: 5.1.2.2
Build ID: d3bf12ecb743fc0d20e0be0c58ca359301eb705f
CPU Threads: 2; OS Version: Windows 6.2; UI Render: default; 
Locale: ja-JP (ja_JP)
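
Editorial illustration of a quick diagnostic for this symptom: a minimal Python sketch, assuming the garbled cells are UTF-8 bytes that were decoded as Windows-1252. If re-encoding the garbled text as Windows-1252 and decoding it as UTF-8 yields readable Japanese, that assumption holds for the cell in question. The helper name and the sample string are hypothetical and are not taken from the attached database.

```python
from typing import Optional


def try_repair(garbled: str) -> Optional[str]:
    """Undo a UTF-8-read-as-Windows-1252 mix-up, or return None if the
    text does not fit that pattern."""
    try:
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


original = "日本語"  # hypothetical sample, not from the attachment
garbled = original.encode("utf-8").decode("cp1252")

print(try_repair(garbled) == original)
```

The repair only works when every byte of the UTF-8 data has a Windows-1252 mapping; otherwise information is lost during the wrong decode and the function returns None.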
Comment 8 kyama 2016-04-10 08:50:34 UTC
Created attachment 124228
Calc paste screenshot
Comment 9 kyama 2016-04-10 08:53:04 UTC
Created attachment 124229
Database file
Comment 10 QA Administrators 2017-05-22 13:27:05 UTC Comment hidden (obsolete)
Comment 11 kyama 2017-05-23 00:47:03 UTC
The bug is still present.
If I copy and paste data from a Base table to Calc, I get wrongly encoded characters.
I think this bug is inherited from OOo. I will test on the oldest version later.

Win 10 64-bit
Version: 5.3.3.2 (x64)
Build ID: 3d9a8b4b4e538a85e0782bd6c2d430bafe583448
CPU Threads: 8; OS Version: Windows 6.19; UI Render: default; Layout Engine: new; 
Locale: ja-JP (ja_JP); Calc: group
Comment 12 Julien Nabet 2017-06-09 19:26:17 UTC
I just wonder if this could be a dup of tdf#37859.

Could anyone give it a try with a daily build? (see http://dev-builds.libreoffice.org/daily/master/)
Comment 13 kyama 2017-06-10 09:27:58 UTC
tdf#37859 looks like the same bug.
I will test with the dev build.
Comment 14 Buovjaga 2017-06-17 17:56:29 UTC
(In reply to Julien Nabet from comment #12)
> I just wonder if this could be a dup of tdf#37859.
> 
> Could anyone give it a try with a daily build? (see
> http://dev-builds.libreoffice.org/daily/master/)

Yep, it seems to be, as the characters in HTML are now encoded as entities instead of being messed up. Let's close as dupe.

Version: 6.0.0.0.alpha0+ (x64)
Build ID: 2404a17e157273430d40ceaa1ab1275e7b50ba6e
CPU threads: 4; OS: Windows 6.19; UI render: default; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2017-06-16_23:41:27
Locale: fi-FI (fi_FI); Calc: group

*** This bug has been marked as a duplicate of bug 37859 ***
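
Editorial illustration of why entity encoding sidesteps the charset mismatch: a minimal Python sketch, assuming non-ASCII characters are written as numeric character references. It is not the actual LibreOffice fix, only a demonstration of the idea. The payload becomes pure ASCII, so it decodes to the same text whichever charset the consumer assumes.

```python
import html

# Sample string taken from comment 1 of this report.
sample = "Образец текста"

# Replace every non-ASCII character with a numeric character reference.
as_entities = sample.encode("ascii", errors="xmlcharrefreplace")

# The bytes are plain ASCII, so these decodings all agree.
decoded_variants = {as_entities.decode(cs)
                    for cs in ("cp1252", "utf-8", "shift_jis")}

# An HTML parser turns the references back into the original characters.
restored = html.unescape(as_entities.decode("ascii"))

print(as_entities)
print(restored == sample)
```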