Bug 37859 - Odb data copied to Calc showed wrong encoding in Windows
Summary: Odb data copied to Calc showed wrong encoding in Windows
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base
Version: 3.4.4 release (earliest affected)
Hardware: x86 (IA32) Windows (All)
Importance: medium normal
Assignee: Julien Nabet
URL:
Whiteboard: target:5.5.0 target:5.4.0.1 target:5.3.5
Keywords:
Duplicates: 79631 97346 97364 97365
Depends on:
Blocks: Paste
Reported: 2011-06-02 08:53 UTC by Cheng-Chia Tseng
Modified: 2017-06-17 17:58 UTC (History)
CC List: 9 users

See Also:
Crash report or crash signature:


Attachments
Windows platform. (89.68 KB, image/png)
2011-06-02 08:53 UTC, Cheng-Chia Tseng
Linux platform (353.03 KB, image/png)
2011-06-02 08:53 UTC, Cheng-Chia Tseng

Description Cheng-Chia Tseng 2011-06-02 08:53:07 UTC
Created attachment 47456
Windows platform.

A user complained that data copied from an Odb file (in Traditional Chinese) to Calc contains wrongly encoded characters.

You can get the file here: https://docs.google.com/leaf?id=0B9JKiYcC-SFQMDIwNjA5ODItMzFkOC00M2M3LTgxYTUtMDc4NmEwYzY2YWMy&hl=en_US&authkey=CMqpnt0H

I tested LibO 3.4 RC2 and the problem does exist. However, it does not occur with LibO 3.3.2 on Linux.

See the attachments for the output on Windows (with the bug) and on Linux (without the bug).
Comment 1 Cheng-Chia Tseng 2011-06-02 08:53:48 UTC
Created attachment 47457
Linux platform
Comment 2 Cheng-Chia Tseng 2011-06-02 08:58:36 UTC
In short, copying from Base produces characters that are not encoded correctly (http://en.wikipedia.org/wiki/Mojibake) in Calc on Windows, but not in Calc on Linux.
Comment 3 Andras Timar 2011-06-06 23:39:45 UTC
I could not reproduce the error on Windows with LibreOffice 3.4.0. Can you please give the steps you took? It works well via Data Sources (F4) in Calc, and copying and pasting strings from Base also works.
Comment 4 Cheng-Chia Tseng 2011-06-07 07:21:53 UTC
I use the method in this blog post: http://openoffice.blogs.com/openoffice/2007/04/farrrrrr_simple.html

Go to "Table" first, then right-click the data I would like to "Copy" to Calc. This creates data in the wrong encoding.

Also, if you drag the data icon directly and drop it onto Calc, the problem is avoided.
Comment 5 Rainer Bielefeld Retired 2011-06-10 02:57:55 UTC
RC2 is bit by bit identical with release version, so separate items in the version picker are useless. Changes have been discussed with Michael Meeks.
Comment 6 Cheng-Chia Tseng 2011-07-29 04:10:22 UTC
It is not fixed yet in 3.4.2 RC3.

Andras, could you verify this bug again? Thanks.
Comment 7 Björn Michaelsen 2011-09-23 11:41:18 UTC
*** Bug 40766 has been marked as a duplicate of this bug. ***
Comment 8 Julien Nabet 2011-11-26 05:29:13 UTC
Is the bug still there on 3.4.4?
Which font and encoding should be used to test?
Comment 9 Cheng-Chia Tseng 2011-11-28 02:34:23 UTC
It is still there.

I don't actually know what the encoding is, but Chinese (Traditional) Windows usually uses "big5" as the default.

The problem only exists when you use the "Copy" item of the context menu and then paste into Calc.

However, if you open Calc first and drag the table in the odb file directly into Calc, everything is fine. This is the expected output.
Comment 10 sasha.libreoffice 2011-12-21 05:56:31 UTC
Reproduced.
Windows XP 32 bit, LibO 3.4, Russian language
The problem is more interesting than expected. When I copied a table from the database and pasted it into Calc, it was inserted without problems. Then I changed one field in the database to Russian (Cyrillic) text, copied the table and pasted it into Calc. Everything except the Russian text was inserted correctly, but the Russian text looks wrong.
To reproduce this appearance of the Russian text manually, I saved a document with Russian text as plain text in the Windows encoding, then opened it in a web browser and changed the encoding to ISO-8859-1. The characters then looked the same as in the problem described above.
Therefore it is a locale-specific problem.
This bug resembles Bug 39890
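The manual reproduction above can also be sketched in a few lines of Python, without a browser. This is only an illustration: it assumes the "Windows encoding" is cp1251, and the sample word is made up rather than taken from any attached database.

```python
# Hypothetical sample word, not data from the reporter's file.
original = "Привет"

# Step 1: save the text in the Windows Cyrillic code page (assumed to be cp1251).
raw = original.encode("cp1251")

# Step 2: read the same bytes back as ISO-8859-1, as the browser was told to do.
garbled = raw.decode("iso-8859-1")
print(garbled)

# The damage is reversible as long as no byte was lost on the way.
restored = garbled.encode("iso-8859-1").decode("cp1251")
print(restored == original)
```

The bytes never change here; only the label applied to them does, which matches the locale-specific behaviour described in this comment.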
Comment 11 Björn Michaelsen 2011-12-23 12:06:41 UTC Comment hidden (obsolete)
Comment 12 Björn Michaelsen 2011-12-23 17:01:30 UTC Comment hidden (obsolete)
Comment 13 Cheng-Chia Tseng 2012-01-18 05:31:59 UTC
The bug is still there.

Using the method in this blog post:
http://openoffice.blogs.com/openoffice/2007/04/farrrrrr_simple.html

Go to "Table" first, then right-click the data I would like to "Copy" to Calc.
This creates data in the wrong encoding.

Also, if you drag the data icon directly and drop it onto Calc, the problem is
avoided.
Comment 14 Urmas 2012-11-23 08:57:30 UTC
The data formats for both RTF and HTML contain certain text in the system default encoding, but it is marked as either charset 0 or windows-1252.
Comment 15 sasha.libreoffice 2012-11-23 12:04:19 UTC
Possibly related to Bug 36144
Comment 16 Urmas 2012-11-23 19:57:32 UTC
The culprit probably is

/core/dbaccess/source/ui/misc/TokenWriter.cxx:421

Also, the logic here seems strange to me:

/core/svtools/source/svrtf/rtfout.cxx:118
Comment 17 Julien Nabet 2014-07-23 19:56:03 UTC
Any update with last LO stable version, 4.2.5?
Comment 18 Cheng-Chia Tseng 2014-08-30 08:30:02 UTC
The bug still exists; reproducible on 4.3.0.
Comment 19 Julien Nabet 2014-08-30 08:36:51 UTC
Cheng-Chia: Thank you for your feedback; I have put it back to NEW.
Comment 20 Dimitris Xenakis 2014-10-08 11:18:06 UTC
I can confirm the issue, exactly as described elsewhere in this thread, except this time for Greek fonts. 
Libreoffice Version: 4.3.0.4
Build ID: 62ad5818884a2fc2e5780dd45466868d41009ec0 on Windows 7 Pro
Comment 21 Dimitris Xenakis 2014-10-08 11:50:36 UTC
Please see Bug 79631, where it seems that Dominik has tackled the issue in version 4.4.0.0.alpha0+. And apologies for having forgotten my own submission...
Comment 22 Julien Nabet 2014-10-09 21:28:34 UTC
On a Debian x86-64 PC with the 4.3.2 Debian package, I can't reproduce the very similar fdo#79631, which I have put in See Also.

Cheng-Chia: Since I don't have a Google account, could you give it a new try with version 4.3.2?
Comment 23 Cheng-Chia Tseng 2014-10-10 03:06:26 UTC
As reported, this bug exists only on the Windows platform. Linux is not affected.

You can get the file at https://www.dropbox.com/s/tbb7bgffees5igj/%E5%B7%A5%E7%A8%8B%E7%AE%A1%E7%90%86%E8%B3%87%E6%96%99%E5%BA%AB2.odb?dl=0

Tested with version 4.3.2 on Windows 7 64bit, this long-lived bug still exists.
Comment 24 Dimitris Xenakis 2014-10-10 08:45:48 UTC
If one needs another file for testing, the attachment in bug 79631 does the trick. It is a small odb file containing a single table with fonts in various encodings exhibiting the issue. Here is the link: https://bugs.freedesktop.org/attachment.cgi?id=100392
Comment 25 Urmas 2014-10-10 16:25:49 UTC
The text in the 'system' encoding on Windows is copied to RTF as ANSI. It is marked by a font with \charset0. That is what causes the issue.
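To make the charset point concrete, here is a rough Python sketch of how a reader interprets the same \'xx bytes under different font charsets. It is not LibreOffice code: the helper names are hypothetical, and the table is a small hand-picked subset of RTF \fcharsetN values mapped to Windows code pages.

```python
import re

# Small subset of RTF \fcharsetN values and the Windows code pages they select.
FCHARSET_TO_CODEC = {0: "cp1252", 161: "cp1253", 204: "cp1251", 136: "cp950"}

def to_rtf_hex(text, codec):
    # Encode the text in a legacy code page and write every byte as a \'xx escape.
    return "".join("\\'%02x" % b for b in text.encode(codec))

def from_rtf_hex(escaped, fcharset):
    # Decode \'xx escapes the way a reader would for the given font charset.
    raw = bytes(int(h, 16) for h in re.findall(r"\\'([0-9a-fA-F]{2})", escaped))
    return raw.decode(FCHARSET_TO_CODEC[fcharset], errors="replace")

body = to_rtf_hex("την", "cp1253")   # Greek text written in the system code page
print(from_rtf_hex(body, 161))       # font marked as Greek: text survives
print(from_rtf_hex(body, 0))         # font marked as charset 0: mojibake
```

With the font marked as charset 0, the Greek bytes are read as Western European letters, which is the kind of output reported in this bug.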
Comment 26 Urmas 2014-10-10 16:27:19 UTC
*** Bug 79631 has been marked as a duplicate of this bug. ***
Comment 27 Urmas 2014-10-10 16:53:26 UTC
If you want to reproduce it, you will need a Windows OS set to any legacy codepage other than 1252.

P.S. Bugs do not fix themselves by magic after two years.
Comment 28 Julien Nabet 2014-10-13 18:13:33 UTC
Urmas: a bug may indeed be "magically" fixed sometimes when:
- a similar bug has been fixed
- some part of the code has been redesigned
- a problem indicated by code analyzers (like Coverity Scan, cppcheck and others), which was the root cause of the bug, has been fixed
etc.
Of course, I wouldn't be able to give you any probabilities, but it does happen sometimes! :-)
Comment 29 Alex Thurgood 2015-01-03 17:41:13 UTC Comment hidden (no-value)
Comment 30 QA Administrators 2016-01-17 20:04:49 UTC Comment hidden (obsolete)
Comment 31 Urmas 2016-01-24 18:06:50 UTC
*** Bug 97346 has been marked as a duplicate of this bug. ***
Comment 32 Dimitris Xenakis 2016-01-25 09:45:09 UTC
Hello, the bug remains with LibreOffice Version: 5.0.4.2 on Windows 7. When I copy a whole row or table using right-click (or the menu), the pasted text (at least for Greek and Hebrew) is wrong. In contrast, when I drag and drop a table from Base to Calc, or copy a single cell, it is OK. I again used the file mentioned in Comment 24.
Maybe this piece of info helps: if one chooses "Paste Special", the options offered in the failing case are just RTF and HTML, while in the working case the only option is Unformatted text.
Comment 33 QA Administrators 2017-03-06 14:31:29 UTC Comment hidden (obsolete)
Comment 34 Dimitris Xenakis 2017-03-06 15:29:35 UTC
The bug remains with LibreOffice Version: 5.2.2.2 on Windows 7.

Using the file mentioned in comment 24, the wrongly encoded Greek text is again pasted as

îåóêåðÜæù ôçí øõ÷ïöèüñá âäåëõãìßá

instead of 

ξεσκεπάζω την ψυχοφθόρα βδελυγμία

So, no change during 2016...
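The two strings above are enough to work out which code pages are involved. The following Python sketch tries a few candidate code pages; the helper name and the candidate list are assumptions made for illustration only.

```python
pasted = "îåóêåðÜæù ôçí øõ÷ïöèüñá âäåëõãìßá"
expected = "ξεσκεπάζω την ψυχοφθόρα βδελυγμία"

def find_source_code_page(garbled, wanted, candidates):
    # Return the first code page in which `wanted`, re-read as cp1252, gives `garbled`.
    for codec in candidates:
        try:
            if wanted.encode(codec).decode("cp1252") == garbled:
                return codec
        except UnicodeError:
            # The candidate cannot represent the text, or cp1252 cannot read its bytes.
            continue
    return None

print(find_source_code_page(pasted, expected, ["cp1251", "cp1253", "cp950"]))  # prints cp1253

# Reversing the mislabelling recovers the intended text.
repaired = pasted.encode("cp1252").decode("cp1253")
print(repaired == expected)
```

So the pasted text is consistent with Greek bytes in code page 1253 being interpreted as windows-1252.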
Comment 35 himajin100000 2017-05-29 06:29:10 UTC
I'm not an expert, but I wonder what would happen if we explicitly specified an appropriate encoding as the fifth parameter to SfxFrameHTMLWriter::Out_DocInfo in OHTMLImportExport::WriteHeader rather than relying on its default parameter?

https://github.com/LibreOffice/core/blob/39adbb9593c764429e9ed2176dde755809b3af0f/dbaccess/source/ui/misc/TokenWriter.cxx#L677
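The idea can be sketched outside LibreOffice with a toy HTML writer and reader in Python. Both functions are hypothetical and only show why the declared charset has to match the bytes actually written; they say nothing about how Out_DocInfo itself behaves.

```python
import re

def write_html(body, encoding, declared=None):
    # `declared` is what the header claims; by default it matches the real encoding.
    declared = declared or encoding
    head = ('<html><head><meta http-equiv="content-type" '
            'content="text/html; charset=%s"></head><body>' % declared)
    return head.encode("ascii") + body.encode(encoding) + b"</body></html>"

def read_html(data):
    # Trust the declared charset, as the pasting application would.
    charset = re.search(rb"charset=([A-Za-z0-9_-]+)", data).group(1).decode("ascii")
    text = data.decode(charset, errors="replace")
    return re.search(r"<body>(.*)</body>", text, re.S).group(1)

text = "工程管理資料庫"
mislabelled = read_html(write_html(text, "big5", declared="windows-1252"))
explicit = read_html(write_html(text, "big5"))
print(mislabelled == text, explicit == text)
```

Passing the encoding explicitly keeps the header and the payload in agreement, whichever encoding is chosen.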
Comment 36 Julien Nabet 2017-05-30 22:01:37 UTC
Thank you Urmas and himajin100000, let's give a try with https://gerrit.libreoffice.org/#/c/38253/

Urmas: I know it's a cold case, but if you have some time, could you be more explicit about the svl part of https://bugs.documentfoundation.org/show_bug.cgi?id=37859#c16 ?
I suppose it concerns Out_Char function and most particularly this part:
    130                     //If we can't convert to the dest encoding, or if
    131                     //it's an uncommon multibyte sequence which most
    132                     //readers won't be able to handle correctly, then
    133                     //export as unicode
    134                     OUString sBuf(&c, 1);
    135                     OString sConverted;
    136                     sal_uInt32 nFlags =
    137                         RTL_UNICODETOTEXT_FLAGS_UNDEFINED_ERROR |
    138                         RTL_UNICODETOTEXT_FLAGS_INVALID_ERROR;
    139                     bool bWriteAsUnicode = !(sBuf.convertToString(&sConverted,
    140                                          eDestEnc, nFlags))
    141                                             || (RTL_TEXTENCODING_UTF8==eDestEnc); // #i43933# do not export UTF-8 chars in RTF;
    142                     if (bWriteAsUnicode)
    143                     {
    144                         (void)sBuf.convertToString(&sConverted,
    145                             eDestEnc, OUSTRING_TO_OSTRING_CVTFLAGS);
    146                     }
    147                     const sal_Int32 nLen = sConverted.getLength();

See http://opengrok.libreoffice.org/xref/core/svtools/source/svrtf/rtfout.cxx#130
If you confirm, I think it could be interesting to have a bugtracker about this specific part with a failing case.
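For discussion, the decision quoted above can be paraphrased in Python roughly as follows. This is a simplified sketch, not the rtfout.cxx code: the function names are made up, it handles BMP characters only, and it always uses "?" as the fallback after \uN instead of re-converting the character.

```python
def rtf_escape_char(ch, dest_encoding):
    code = ord(ch)
    if code < 0x80:
        return ch  # plain ASCII (a real writer also escapes backslash and braces)
    is_utf8 = dest_encoding.lower().replace("-", "").replace("_", "") == "utf8"
    if not is_utf8:
        try:
            # Strict conversion: succeeds only if the destination code page has the character.
            return "".join("\\'%02x" % b for b in ch.encode(dest_encoding))
        except UnicodeEncodeError:
            pass
    if code > 0xFFFF:
        raise ValueError("sketch handles BMP characters only")
    # RTF \uN takes a signed 16-bit decimal, followed by one fallback character.
    signed = code - 0x10000 if code > 0x7FFF else code
    return "\\u%d?" % signed

def rtf_escape(text, dest_encoding):
    return "".join(rtf_escape_char(ch, dest_encoding) for ch in text)

print(rtf_escape("aω", "cp1253"))  # representable in the code page: hex escape
print(rtf_escape("aω", "cp1252"))  # not representable: Unicode escape
```

A failing case for a dedicated report would be any character for which the strict conversion succeeds but the font charset written next to it does not match the destination encoding.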
Comment 37 Commit Notification 2017-06-07 14:15:39 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=39487b14956d883899311b6294f6f09ca2371366

tdf#37859: Odb data copied to Calc showed wrong encoding in Windows

It will be available in 5.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 38 Commit Notification 2017-06-07 19:55:53 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a485908af200fadd561af0a5011276613849e356&h=libreoffice-5-4

tdf#37859: Odb data copied to Calc showed wrong encoding in Windows

It will be available in 5.4.0.1.

Comment 39 Commit Notification 2017-06-08 20:35:51 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-5-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3951d44110df55589ff80f5eab752817c2475c0d&h=libreoffice-5-3

tdf#37859: Odb data copied to Calc showed wrong encoding in Windows

It will be available in 5.3.5.

Comment 40 Julien Nabet 2017-06-08 20:43:16 UTC
Let's put this one to FIXED.

Don't hesitate to reopen this tracker if it still fails with a build which includes the patch.
Comment 41 kyama 2017-06-10 09:46:02 UTC
Tested with the dev build.

Previously neither Paste as RTF nor Paste as HTML encoded the text correctly; now Paste as HTML encodes it correctly.
Paste as RTF still doesn't encode it correctly.

Version: 5.5.0.0.alpha0+ (x64)
Build ID: 076ed447f694239d5c67adee528ea6e471d909ff
CPU threads: 8; OS: Windows 6.19; UI render: GL; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2017-06-10_01:17:34
Locale: ja-JP (ja_JP); Calc: CL
Comment 42 himajin100000 2017-06-13 13:07:22 UTC
TODO for me:

Check whether "Options"-"Load/Save"-"HTML Compatibility"-"Export"-"Character set" affects this behavior.
Comment 43 Buovjaga 2017-06-17 17:56:29 UTC
*** Bug 97364 has been marked as a duplicate of this bug. ***
Comment 44 Buovjaga 2017-06-17 17:58:11 UTC
*** Bug 97365 has been marked as a duplicate of this bug. ***