Bug 33100 - Cyrillic symbols are not pasting correctly into CALC from 1C:Predpriyatie
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): unspecified
Hardware: All, OS: All
Importance: medium normal
Assignee: Kohei Yoshida
URL:
Whiteboard: target:3.5
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-14 03:01 UTC by Alexandr
Modified: 2021-09-03 07:40 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Writer file with correct data (10.04 KB, application/vnd.oasis.opendocument.text)
2011-01-14 03:01 UTC, Alexandr
Wrong calc file (10.54 KB, application/vnd.oasis.opendocument.spreadsheet)
2011-01-14 03:01 UTC, Alexandr
Clipboard file (12.33 KB, application/octet-stream)
2011-07-06 00:00 UTC, Alexandr
Dump of "biff5" part of the CLP file (4.50 KB, application/octet-stream)
2011-08-02 12:16 UTC, Valek Filippov
Dump of html part of the CLP file (3.24 KB, text/html)
2011-08-02 12:21 UTC, Valek Filippov

Description Alexandr 2011-01-14 03:01:16 UTC
Created attachment 42010
Writer file with correct data

I am trying to paste data from 1C (http://www.1centerprise.com/, a major Russian ERP system used in almost every company in Russia and the former USSR) into Calc, but instead of Cyrillic symbols I see only a mess like:

Êîíòðàãåíò	Ñóììà ïðîäàæè â EUR	Ñóììà ïðîäàæè áåç ñêèäîê â EUR	Êîëè÷åñòâî (â áàçîâûõ åäèíèöàõ)	Ñóììà ñêèäêè â EUR	% ñêèäêè
Íîìåíêëàòóðà, Áàçîâàÿ åäèíèöà èçìåðåíèÿ		

But when I paste into Writer (or MS Excel) everything is OK:
Контрагент	Сумма продажи в EUR	Сумма продажи без скидок в EUR	Количество (в базовых единицах)	Сумма скидки в EUR	% скидки
Номенклатура, Базовая единица измерения	

This is a very frustrating bug that prevents using LibreOffice with 1C. No one was able to fix it in the original OOo.

As far as I know, this happens because 1C puts the data on the clipboard in Excel 95 format without any encoding information, so OOo assumes it is Latin1.

Thank you very much.
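
The garbling in the description is consistent with CP1251 bytes being decoded with a Western European codepage. Below is a minimal Python sketch of that assumption. The codec names are Python's own, and CP1252 stands in for "Latin1" because the two agree on the byte range that holds the Russian letters in CP1251.

```python
# Assumption: the source application wrote CP1251 bytes, and the reader
# decoded them as CP1252 because no encoding information was attached.
original = "Контрагент"
raw_bytes = original.encode("cp1251")   # 8-bit text as it sits on the clipboard
garbled = raw_bytes.decode("cp1252")    # what a Latin-assuming reader displays

# Reversing the wrong decoding step recovers the text, because no byte
# was lost along the way.
repaired = garbled.encode("cp1252").decode("cp1251")

print(garbled)   # Êîíòðàãåíò
print(repaired)  # Контрагент
```

The round trip works only because every byte survived the wrong decoding; it says nothing about how Calc itself handles the clipboard data.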
Comment 1 Alexandr 2011-01-14 03:01:46 UTC
Created attachment 42011
Wrong calc file
Comment 2 Kohei Yoshida 2011-01-14 07:33:19 UTC
I'll take it.  My plate for 3.4 is already getting full, but I'll see if I can manage this for that target.
Comment 3 Alexandr 2011-01-14 12:28:19 UTC
Thank you very much, Kohei! You will be our hero! Have a nice weekend!
Comment 4 Kohei Yoshida 2011-04-15 10:59:46 UTC
Ok.  I visited their website, but their content is all in English.  Can you give me a specific web page where I can copy and paste Russian content from?
Comment 5 Kohei Yoshida 2011-04-19 12:45:54 UTC
Ok.  I guess I misunderstood the problem.

The problem appears to be that the 1C program copies the data to the clipboard in the Excel 95 format without providing the encoding information (as you rightly explained).

Alexandr, do you happen to have a Russian version of Excel 95 on your system?  If so, could you test if copying from Excel 95 to LibO can reproduce this problem?

Thanks!
Comment 6 Kohei Yoshida 2011-04-26 08:37:19 UTC
Not possible for 3.4. Best done by someone who can reproduce this.
Comment 7 Alex Kurd 2011-06-13 06:43:02 UTC
(In reply to comment #6)
> Not possible for 3.4. Best done by someone who can reproduce this.

The text in the "Wrong calc file" attachment (Excel 95 file format) is saved as CP1252 instead of CP1251.
If you resave it in the Excel 97/2000/XP *.xls format, it becomes readable.
But if you save it in the old format, the text turns into ????????????, ??????? ??????? ????????? and I really don't know which codepage would make it readable again.
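
Why the "?" form cannot be repaired can be sketched in Python. The assumption here is that the re-save step encodes Unicode text into a codepage with no Cyrillic letters (CP1252 in this sketch) and substitutes "?" for every character it cannot represent.

```python
text = "Номенклатура, Базовая единица измерения"

# Encoding into a codepage without Cyrillic replaces each letter with "?".
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # ????????????, ??????? ??????? ?????????

# Every letter collapsed to the same character, so no choice of codepage
# can map the result back to the original text.
letters_after = {new for new, old in zip(lossy, text) if old.isalpha()}
print(letters_after)  # {'?'}
```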

You can take Russian text for tests from http://www.1c.ru/
Comment 8 Urmas 2011-07-02 01:23:56 UTC
Could you make a snapshot of the clipboard contents using the Clipboard Viewer (clipbrd.exe) and attach it here?
Comment 9 Alexandr 2011-07-06 00:00:53 UTC
Created attachment 48799
Clipboard file
Comment 10 Valek Filippov 2011-08-02 12:16:09 UTC
Created attachment 49845
Dump of "biff5" part of the CLP file

It would probably be easier to deal with the XLS in question instead of the CLP file.
This one is not opened properly in LibO Calc.
Comment 11 Valek Filippov 2011-08-02 12:21:58 UTC
Created attachment 49846
Dump of html part of the CLP file

This HTML looks OK in a browser, so if Calc preferred HTML over BIFF5, 1C clipboard data would be pasted correctly.
Comment 12 Kohei Yoshida 2011-08-02 14:01:42 UTC
I'll take another look at this for 3.5.
Comment 13 Valek Filippov 2011-08-03 08:27:58 UTC
There is a similar bug filed for gnumeric some years ago:
https://bugzilla.gnome.org/show_bug.cgi?id=304007

The file generated by "1C" that is attached to that bug has "0xCC" ('Cyrillic') in the 'charset' field of the 'Font' record, and as such is opened correctly by LibO.
The clipboard file attached to fdo#33100 has "0x00" ('ANSI Latin') in the 'charset' field, and as a result should not be expected to open correctly, even in Excel, on a system with a non-Cyrillic locale.
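
How the 'charset' byte changes the result can be sketched as follows. This is not LibreOffice's import code: `CHARSET_TO_CODEC` and `decode_cell` are hypothetical names, the table lists only a few of the Windows font charset identifiers, and falling back to CP1252 for unknown values is an assumption made for the example.

```python
# Windows font charset identifiers and the codepage each one implies.
# The table is illustrative and deliberately incomplete.
CHARSET_TO_CODEC = {
    0x00: "cp1252",  # ANSI_CHARSET (Western)
    0xCC: "cp1251",  # RUSSIAN_CHARSET (Cyrillic)
    0xEE: "cp1250",  # EASTEUROPE_CHARSET
}


def decode_cell(raw: bytes, font_charset: int) -> str:
    """Decode 8-bit cell text with the codepage implied by the font charset."""
    codec = CHARSET_TO_CODEC.get(font_charset, "cp1252")  # assumed fallback
    return raw.decode(codec)


raw = "Сумма".encode("cp1251")
print(decode_cell(raw, 0xCC))  # Сумма
print(decode_cell(raw, 0x00))  # Ñóììà
```

The same bytes give readable or garbled text depending only on the charset value, which matches the difference between the two files described above.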

IMHO it's better to keep support for biff5 format, as some weird but widely used applications still utilise it as an interchange format.
On the LibO side it would be great to have a UI asking the user which charset to use, along with configuration options for encoding such as:
'force [my] encoding', 'ask me', 'honor CODEPAGE, ask if missing', etc.
Comment 14 Kohei Yoshida 2011-08-03 13:34:47 UTC
Thanks for the additional info.  Looks like this is not our bug per se.

Given this, here is what I'd like to do.

The clipboard image tells me that this software provides four clipboard formats to paste from:

* Unformatted text
* Unicode text
* BIFF5
* HTML

and in Excel, even though the default paste fails to decode properly (as Valek explained), you can still choose Paste Special and paste it as HTML.  Then Excel will paste the data using the proper encoding (or maybe 1C provides Unicode text in the HTML format, whichever is the case).

In Calc, OTOH, when you select paste special, it for some reason provides only one choice, and that is "unknown format".  I'd like to look into that and see if we can provide the four clipboard format types to choose from just like Excel does.  That way if the default paste messes up the encoding, you could still paste it as HTML manually.
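
The idea of choosing among the offered formats can be sketched as a simple preference list. This is only an illustration: `PREFERENCE` and `pick_format` are hypothetical, and the ordering (formats that carry their own encoding before BIFF5) is an assumption, not Calc's actual behaviour.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical preference order: formats that carry their own encoding
# information come before BIFF5, which may lack it.
PREFERENCE = ("HTML", "Unicode text", "BIFF5", "Unformatted text")


def pick_format(
    available: Iterable[str], preference: Sequence[str] = PREFERENCE
) -> Optional[str]:
    """Return the most preferred format that the clipboard actually offers."""
    offered = set(available)
    for name in preference:
        if name in offered:
            return name
    return None


offered_by_1c = ["Unformatted text", "Unicode text", "BIFF5", "HTML"]
print(pick_format(offered_by_1c))  # HTML
```

A manual Paste Special dialog amounts to letting the user override this order by hand.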
Comment 15 Valek Filippov 2011-08-12 14:36:57 UTC
Similar bug filed in Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/gnumeric/+bug/262777/

"lessons.xls" attached to that bug sets codepage to 1252 and charsets in FONT records are 0.
Fortunately fontnames are "Arial Cyr", so that was used to fix gnumeric.
It shouldn't be difficult to implement similar substitution in LibO.
(Look for 'gnm_font_override_codepage' here http://git.gnome.org/browse/gnumeric/tree/src/style.c)
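
A font-name based override of this kind could look roughly like the sketch below. It is not a port of gnumeric's function: the suffix table and both helper names are assumptions made for illustration, and the real list of font names may differ.

```python
from typing import Optional

# Illustrative suffix table; only "Arial Cyr" is confirmed by the report above.
FONT_SUFFIX_TO_CODEPAGE = {
    " cyr": 1251,
    " ce": 1250,
    " greek": 1253,
    " tur": 1254,
    " baltic": 1257,
}


def override_codepage(font_name: str) -> Optional[int]:
    """Guess a codepage from a localized font name such as "Arial Cyr"."""
    name = font_name.strip().lower()
    for suffix, codepage in FONT_SUFFIX_TO_CODEPAGE.items():
        if name.endswith(suffix):
            return codepage
    return None


def effective_codepage(declared: int, font_name: str) -> int:
    """Prefer the font-name hint over the codepage declared in the file."""
    return override_codepage(font_name) or declared


print(effective_codepage(1252, "Arial Cyr"))  # 1251
print(effective_codepage(1252, "Arial"))      # 1252
```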
Comment 16 Valek Filippov 2011-08-22 07:49:46 UTC
Different bug (actually XL bug this time), which probably could be solved by playing with clipboard preferences/handling.

https://bugzilla.gnome.org/show_bug.cgi?id=651260#c20

Would be nice if someone tested with modern LibO and filed a separate bug if needed.
Comment 17 Kohei Yoshida 2011-12-01 12:53:45 UTC
(In reply to comment #14)

> The clipboard image tells me that this software provides four clipboard formats
> to paste from
> 
> * Unformatted text
> * Unicode text
> * BIFF5
> * HTML
...
> In Calc, OTOH, when you select paste special, it for some reason provides only
> one choice, and that is "unknown format".  

Ok. This is puzzling.  I'm in 

dtrans/source/win32/dtobj/DOTransferable.cxx
CDOTransferable::initFlavorList()

where the available clipboard formats are fetched from the system clipboard.  And here, the *system* clipboard says there are only two formats available, and both are text (ASCII and Unicode).  Neither HTML nor BIFF5.

And that's via Windows system calls!

Given this, there is little chance that we could really fix this.
Comment 18 Kohei Yoshida 2011-12-01 13:41:12 UTC
Well, Windows provides another clipboard function, different from the one we are using, cf.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms649038%28v=vs.85%29.aspx

This one is what we are currently using to get all available clipboard formats

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683979%28v=vs.85%29.aspx

Apparently the latter doesn't return all available formats.... No idea what the difference is.

The former one also has different semantics.  I need to figure that out first in order to make use of it.
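
For anyone who wants to check which formats an application really puts on the clipboard, here is a small Python sketch using `ctypes`. `OpenClipboard`, `EnumClipboardFormats`, `GetClipboardFormatNameW` and `CloseClipboard` are real user32 functions, but whether this is the exact call behind either link above is an assumption. `format_label` and `list_clipboard_formats` are hypothetical helpers, and the enumeration only does anything on Windows.

```python
import ctypes
import sys

# A few predefined Win32 clipboard format IDs (values from winuser.h).
STANDARD_FORMATS = {1: "CF_TEXT", 7: "CF_OEMTEXT", 13: "CF_UNICODETEXT", 16: "CF_LOCALE"}


def format_label(fmt_id: int, registered_name: str = "") -> str:
    """Return a readable label for a clipboard format ID."""
    return STANDARD_FORMATS.get(fmt_id) or registered_name or f"format {fmt_id}"


def list_clipboard_formats() -> list:
    """Enumerate formats on the Windows clipboard; returns [] elsewhere."""
    if sys.platform != "win32":
        return []
    user32 = ctypes.windll.user32
    if not user32.OpenClipboard(None):
        raise OSError("could not open the clipboard")
    labels = []
    try:
        fmt_id = user32.EnumClipboardFormats(0)
        while fmt_id:
            # Registered formats (for example an HTML one) have a name;
            # predefined formats leave the buffer empty.
            buf = ctypes.create_unicode_buffer(256)
            user32.GetClipboardFormatNameW(fmt_id, buf, len(buf))
            labels.append(format_label(fmt_id, buf.value))
            fmt_id = user32.EnumClipboardFormats(fmt_id)
    finally:
        user32.CloseClipboard()
    return labels


if __name__ == "__main__":
    for label in list_clipboard_formats():
        print(label)
```

Running it once while the source application is still open and once after it has been closed would show whether the set of offered formats changes.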
Comment 19 Kohei Yoshida 2011-12-02 08:38:46 UTC
This is pretty much outside of my realm, and is no longer a bug in the spreadsheet code.  It's an issue with our clipboard handling involving the native Windows API calls.  I'll put it on hold for now.
Comment 20 Kohei Yoshida 2011-12-05 18:55:00 UTC
Alexandr,

I need some help from you.  Could you tell us what format choice you'll get when you

1) have 1C running when you copy data from it to the clipboard,
2) while 1C is still running, do Edit -> Paste Special in LibreOffice to see what format choices are available there.

Since I can't install 1C on my machine, I can't do that myself over here.  Thanks a lot for your help.
Comment 21 sasha.libreoffice 2012-02-23 03:02:15 UTC
@ Alexandr, Valek
Is it still reproducible with version 3.5.0?
Comment 22 Alexandr 2012-02-23 03:22:21 UTC
Hi!
There is a new version, 1C:Enterprise 8.2, and now copy and paste into Calc works fine; it seems they fixed their export.

The transition from 8.1 to 8.2 is automatic, so I can say the problem is solved.
Comment 23 Alexandr 2012-02-23 03:23:56 UTC
Please change the status to the appropriate one (I don't know which is correct in this situation).
Comment 24 sasha.libreoffice 2012-02-23 03:31:56 UTC
Per the last comment, changing the status to WORKSFORME.
If the problem appears again, please change the status to REOPENED.
Comment 25 Mike Kaganski 2021-09-03 07:40:39 UTC
And even attachment 49845 (BIFF5) is now parsed correctly *according to default document language* after fixing tdf#132796.