Created attachment 42010 [details]
Writer file with correct data
I am trying to paste data from 1C (http://www.1centerprise.com/, a major Russian ERP system used in almost every company in Russia and the former USSR) into Calc, but instead of Cyrillic characters I see only a mess like:
Êîíòðàãåíò Ñóììà ïðîäàæè â EUR Ñóììà ïðîäàæè áåç ñêèäîê â EUR Êîëè÷åñòâî (â áàçîâûõ åäèíèöàõ) Ñóììà ñêèäêè â EUR % ñêèäêè
Íîìåíêëàòóðà, Áàçîâàÿ åäèíèöà èçìåðåíèÿ
But when I paste into Writer (or MS Excel) everything is OK:
Контрагент Сумма продажи в EUR Сумма продажи без скидок в EUR Количество (в базовых единицах) Сумма скидки в EUR % скидки
Номенклатура, Базовая единица измерения
This is a very frustrating bug that prevents using LibreOffice with 1C... No one was able to fix it in the original OOo.
As far as I know, this happens because 1C transfers data to the clipboard in Excel 95 format without encoding information, so OOo assumes the text is Latin1.
Thank you very much.
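The mismatch described above can be reproduced outside of Calc. The following is a minimal sketch, assuming the bytes on the clipboard are CP1251 and were decoded as CP1252 (the helper name `repair_mojibake` is hypothetical and not part of any LibreOffice code):

```python
def repair_mojibake(garbled: str) -> str:
    """Undo a wrong CP1252 decoding of bytes that were really CP1251.

    Assumption: every character in `garbled` exists in CP1252, which is
    true for the sample pasted in this report.
    """
    raw = garbled.encode("cp1252")   # recover the original bytes
    return raw.decode("cp1251")      # decode them with the intended codepage


print(repair_mojibake("Êîíòðàãåíò Ñóììà ïðîäàæè â EUR"))
```

Applied to the first words of the garbled paste, this gives back the text that Writer shows, which supports the guess that only the codepage is wrong and the bytes themselves are intact.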
Created attachment 42011 [details]
Wrong calc file
I'll take it. My plate is getting bigger for 3.4 already, I'll see if I can manage this for that target.
Thank you very much, Kohei! You will be our hero! Have a nice weekend!
Ok. I visited their website, but their content is all in English. Can you give me a specific web page where I can copy and paste Russian content from?
Ok. I guess I misunderstood the problem.
The problem appears to be the 1C program copying the data to the clipboard in the Excel 95 format, and not providing the encoding information (as you rightly explained).
Alexandr, do you happen to have a Russian version of Excel 95 on your system? If so, could you test if copying from Excel 95 to LibO can reproduce this problem?
Not possible for 3.4. Best done by someone who can reproduce this.
(In reply to comment #6)
> Not possible for 3.4. Best done by someone who can reproduce this.
The text in the "Wrong calc file" attachment (Excel 95 file format) is saved as CP1252 instead of CP1251.
If you resave it in Excel 97/2000/XP format (*.xls), it becomes readable.
But if you save it in the old format, your text turns into ????????????, ??????? ??????? ????????? and I really don't know what codepage to use to make it readable again.
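The question marks are what one would expect if Cyrillic text is written out through a codepage that has no Cyrillic letters. A small sketch, assuming the save path behaves like a lossy CP1252 encode (the function name `save_as_cp1252` is made up for illustration):

```python
def save_as_cp1252(text: str) -> bytes:
    """Encode text to CP1252, replacing unrepresentable characters with '?'."""
    return text.encode("cp1252", errors="replace")


lossy = save_as_cp1252("Номенклатура, Базовая единица измерения")
print(lossy.decode("cp1252"))
```

Every Cyrillic letter collapses to the same `?` byte, so no choice of codepage can bring the text back afterwards. This differs from the pasted mojibake, where the original bytes survive and only the decoding is wrong.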
You can take Russian text for tests from http://www.1c.ru/
Could you take a snapshot of the clipboard contents using Clipboard Viewer (clipbrd.exe) and attach it here?
Created attachment 48799 [details]
Created attachment 49845 [details]
Dump of "biff5" part of the CLP file
It would probably be easier to deal with the XLS in question instead of the CLP file.
This one is not opened properly in LibO Calc.
Created attachment 49846 [details]
Dump of html part of the CLP file
This HTML looks OK in a browser, so if Calc preferred HTML over BIFF5, 1C clipboard contents would be pasted correctly.
I'll take another look at this for 3.5.
A similar bug was filed for gnumeric some years ago:
The file generated by "1C" attached to this one has "0xCC" ('Cyrillic') in the 'Font' record 'charset' field and is therefore opened correctly by LibO.
The clipboard file attached to fdo#33100 has "0x00" ('ANSI Latin') in the 'charset' field and, as a result, should not be expected to open correctly even in Excel on a system with a non-Cyrillic locale.
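To illustrate why the 'charset' value matters, here is a simplified sketch of decoding a byte string according to that field. The charset numbers are the standard Windows font charset constants; the table is deliberately incomplete, and the function `decode_with_charset` is hypothetical and does not mirror how LibO's BIFF5 import is actually structured:

```python
# Windows font charset value -> Python codec name (partial table).
CHARSET_TO_CODEC = {
    0x00: "cp1252",  # ANSI_CHARSET ('ANSI Latin')
    0xA1: "cp1253",  # GREEK_CHARSET
    0xA2: "cp1254",  # TURKISH_CHARSET
    0xCC: "cp1251",  # RUSSIAN_CHARSET ('Cyrillic')
    0xEE: "cp1250",  # EASTEUROPE_CHARSET
}


def decode_with_charset(raw: bytes, charset: int) -> str:
    """Decode raw 8-bit text using the codec implied by a font charset value.

    Unknown charset values fall back to CP1252 (an assumption made here).
    """
    codec = CHARSET_TO_CODEC.get(charset, "cp1252")
    return raw.decode(codec, errors="replace")


raw = "Контрагент".encode("cp1251")
print(decode_with_charset(raw, 0xCC))  # charset as in the 1C-generated file
print(decode_with_charset(raw, 0x00))  # charset as in the clipboard dump
```

The same bytes decode to readable Cyrillic with 0xCC and to the Latin-looking garbage from the original report with 0x00.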
IMHO it's better to keep support for biff5 format, as some weird but widely used applications still utilise it as an interchange format.
On the LibO side it would be great to have a UI asking the user what charset to use, and configuration options for encoding like:
'force [my] encoding', 'ask me', 'honor CODEPAGE, ask if missed' etc.
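The three options could be modelled roughly as follows. This is only a sketch of the decision logic; all names (`EncodingPolicy`, `resolve_codec`, the `ask_user` callback) are invented for illustration and do not correspond to existing LibO settings:

```python
from enum import Enum
from typing import Callable, Optional


class EncodingPolicy(Enum):
    FORCE = "force [my] encoding"
    ASK = "ask me"
    HONOR_OR_ASK = "honor CODEPAGE, ask if missed"


def resolve_codec(
    policy: EncodingPolicy,
    declared: Optional[str],       # codec taken from the file, None if absent
    forced: str,                   # user's configured encoding
    ask_user: Callable[[], str],   # callback that shows a dialog
) -> str:
    """Pick the codec to use for legacy 8-bit text under the given policy."""
    if policy is EncodingPolicy.FORCE:
        return forced
    if policy is EncodingPolicy.ASK:
        return ask_user()
    # HONOR_OR_ASK: trust the file when it declares a codepage.
    return declared if declared is not None else ask_user()
```

Note that 'honor CODEPAGE' alone would not help with the clipboard data in this report, since the charset is declared there but wrong; 'force' or 'ask me' would.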
Thanks for the additional info. Looks like this is not our bug per se.
Given this, here is what I'd like to do.
The clipboard image tells me that this software provides four clipboard formats to paste from
* Unformatted text
* Unicode text
* BIFF5
* HTML
and in Excel, even though the default paste fails to decode properly (as Valek explained), you can still choose paste special and paste it as HTML. Then Excel will paste the data using proper encoding (or maybe 1C provides unicode text in HTML format, whichever the case).
In Calc, OTOH, when you select paste special, it for some reason provides only one choice, and that is "unknown format". I'd like to look into that and see if we can provide the four clipboard format types to choose from just like Excel does. That way if the default paste messes up the encoding, you could still paste it as HTML manually.
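The idea of choosing among offered formats can be sketched as a simple preference search. The format names are the four mentioned in this thread; the orderings and the function `pick_format` are assumptions for illustration and say nothing about how Calc's clipboard code is really organised:

```python
from typing import Iterable, Optional, Sequence

# Assumed default: BIFF5 wins when present, which matches the observed behaviour.
DEFAULT_ORDER = ["BIFF5", "HTML", "Unicode text", "Unformatted text"]
# Alternative order, equivalent to the user picking HTML in Paste Special.
HTML_FIRST = ["HTML", "Unicode text", "BIFF5", "Unformatted text"]


def pick_format(available: Iterable[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred format that the clipboard actually offers."""
    offered = set(available)
    for name in preferred:
        if name in offered:
            return name
    return None


offered_by_1c = ["Unformatted text", "Unicode text", "BIFF5", "HTML"]
print(pick_format(offered_by_1c, DEFAULT_ORDER))
print(pick_format(offered_by_1c, HTML_FIRST))
```

With the default order the broken BIFF5 data is chosen; with HTML ranked first, or selected by hand, the correctly encoded data would be used instead.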
Similar bug filed in Ubuntu:
"lessons.xls" attached to that bug sets codepage to 1252 and charsets in FONT records are 0.
Fortunately fontnames are "Arial Cyr", so that was used to fix gnumeric.
It shouldn't be difficult to implement similar substitution in LibO.
(Look for 'gnm_font_override_codepage' here http://git.gnome.org/browse/gnumeric/tree/src/style.c)
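A font-name-based substitution along those lines might look like the sketch below. It is not a port of gnumeric's function: the suffix table and the name `effective_codec` are assumptions chosen for illustration, based on the localized font names that old Windows versions shipped (such as "Arial Cyr"):

```python
# Font name suffix -> codec implied by that localized font (illustrative subset).
FONT_SUFFIX_TO_CODEC = {
    " Cyr": "cp1251",
    " CE": "cp1250",
    " Greek": "cp1253",
    " Tur": "cp1254",
    " Baltic": "cp1257",
}


def effective_codec(font_name: str, declared_codec: str) -> str:
    """Override the declared codec when the font name implies another script."""
    for suffix, codec in FONT_SUFFIX_TO_CODEC.items():
        if font_name.endswith(suffix):
            return codec
    return declared_codec


# A file that claims codepage 1252 but uses "Arial Cyr" is treated as CP1251.
codec = effective_codec("Arial Cyr", "cp1252")
print("Íîìåíêëàòóðà".encode("cp1252").decode(codec))
```

The heuristic only helps when the producing application used such a font; a plain "Arial" with charset 0 would still be decoded as declared.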
Different bug (actually XL bug this time), which probably could be solved by playing with clipboard preferences/handling.
Would be nice if someone tested with modern LibO and filed a separate bug if needed.
(In reply to comment #14)
> The clipboard image tells me that this software provides four clipboard formats
> to paste from
> * Unformatted text
> * Unicode text
> * BIFF5
> * HTML
> In Calc, OTOH, when you select paste special, it for some reason provides only
> one choice, and that is "unknown format".
Ok. This is puzzling. I'm in
where the available clipboard formats are fetched from the system clipboard. And here, the *system* clipboard says there are only 2 formats available, and both are text (ascii and unicode). No HTML or BIFF5.
And that's via Windows system calls!
Given this, there is little chance that we could really fix this.
Well, Windows provides another clipboard function, different from the one we are using, cf.
This one is what we are currently using to get all available clipboard formats
Apparently the latter doesn't return all available formats.... No idea what the difference is.
The former one also has different semantics. I need to figure that out first in order to make use of it.
This is pretty much outside of my realm, and is no longer a bug in the spreadsheet code. It's an issue with our clipboard handling involving the native Windows API calls. I'll put it on hold for now.
I need some help from you. Could you tell us what format choice you'll get when you
1) have 1C running when you copy data from it to the clipboard,
2) while 1C is still running, do Edit -> Paste Special in LibreOffice to see what format choices are available there.
Since I can't install 1C on my machine, I can't do that myself over here. Thanks a lot for your help.
@ Alexandr, Valek
Is it still reproducible with version 3.5.0?
You see, there is a new version, 1C:Enterprise 8.2, and now copy-paste to Calc works fine; it seems they fixed their export.
The transition from 8.1 to 8.2 is automatic, so I can say the problem is solved.
Please change the status to the appropriate one (I don't know which status is correct in this situation).
Due to the last comment, changing status to WorksForMe.
If the problem appears again, please change the status to Reopened.
And even the attachment 49845 [details] (BIFF5) is now parsed correctly *according to default document language* after fixing tdf#132796.