This is a very old bug: OpenOffice loads Microsoft XLS files using the ANSI encoding, regardless of the encoding the file was saved in. We need to be able to choose the encoding when OpenOffice loads an XLS file.
LibreOffice is a fork of OpenOffice, so these are two different programs. Which one do you use?
Do you mean ASCII? Please attach an example file.
The old-style (95? 2000?) XLS format does not include an encoding marker in the file header. The user needs the software to choose the encoding by checking the content, or to be able to choose the encoding manually when opening the file. Many office suites (including LibreOffice) have this bug. https://drive.google.com/file/d/0B24u7bpWDlL0NlRpa19XNjg3MFk/view?usp=sharing
Created attachment 118927
Test kit

Moved here from Google Drive
OK, I confirm that "the old (and common) style excel file.xls" does not display the Chinese characters. I don't know what this is about, but I'll set it to NEW anyway. I'll try to find someone who understands.

Win 7 Pro 64-bit, Version: 5.0.1.2 (32-bit)
Build ID: 81898c9f5c0d43f3473ba111d7b351050be20261
Locale: fi-FI (fi_FI)

Version: 5.1.0.0.alpha1+ (x64)
Build ID: 9ce08dcc2e32c5554ddf71b79173f8854e0568ad
TinderBox: Win-x86_64@62-TDF, Branch:MASTER, Time: 2015-09-17_21:43:51
Locale: en-US (fi_FI)
With this change, it seems to work:

diff --git a/sc/source/filter/excel/excel.cxx b/sc/source/filter/excel/excel.cxx
index b21ca5f..249fb62 100644
--- a/sc/source/filter/excel/excel.cxx
+++ b/sc/source/filter/excel/excel.cxx
@@ -120,7 +120,7 @@ FltError ScFormatFilterPluginImpl::ScImportExcel( SfxMedium& rMedium, ScDocument
     {
         pBookStrm->SetBufferSize( 0x8000 );     // still needed?
 
-        XclImpRootData aImpData( eBiff, rMedium, xRootStrg, *pDocument, RTL_TEXTENCODING_MS_1252 );
+        XclImpRootData aImpData( eBiff, rMedium, xRootStrg, *pDocument, RTL_TEXTENCODING_MS_936);
         std::unique_ptr< ImportExcel > xFilter;
         switch( eBiff )
         {

Of course, the change is wrong, but it seemed to me a good starting point for the search.

I added some traces in ImportExcel::Read() in sc/source/filter/excel/read.cxx and got only these:
- case Z_Biff5TPre, nOpcode: 516 (0x0204 = EXC_ID3_LABEL)
- case Z_Biff5TPre, nOpcode: 10 (0x000A)
- case Z_Biff5T, nOpcode: 516
- case Z_Biff5T, nOpcode: 10

So indeed, if we never enter "case 0x42: Codepage(); break;", I don't know how we can retrieve the right codepage/encoding.

I took a look at the XLS specs at https://msdn.microsoft.com/en-us/library/office/cc313154%28v=office.12%29.aspx
No hints :-(

Of course, there may be some code we could call to show a dialog box proposing an encoding, but if Excel5 can display the file correctly, the encoding info is certainly stored somewhere.

Eike: thought you might be interested/have some idea
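The trace above can be reproduced outside the filter with a small record walker. This is only a sketch, not LibreOffice code: findCodepage is a hypothetical helper, and it assumes the file is a plain BIFF record stream (2-byte record id, 2-byte payload length, then the payload, all little-endian) rather than a compound document. It returns the value of the first CODEPAGE record (id 0x0042, the one handled by "case 0x42: Codepage()"), or -1 if the stream contains none.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LibreOffice): walk a raw BIFF record
// stream and return the code page stored in the first CODEPAGE record
// (id 0x0042), or -1 if no such record is found.
int findCodepage(const std::vector<std::uint8_t>& stream)
{
    const unsigned kCodepageId = 0x0042;
    std::size_t pos = 0;

    // Every record starts with a 4-byte header: id and payload length.
    while (pos + 4 <= stream.size())
    {
        const unsigned id  = stream[pos]     | (stream[pos + 1] << 8);
        const unsigned len = stream[pos + 2] | (stream[pos + 3] << 8);
        pos += 4;

        if (pos + len > stream.size())
            break; // truncated record, stop scanning

        if (id == kCodepageId && len >= 2)
            return static_cast<int>(stream[pos] | (stream[pos + 1] << 8));

        pos += len; // skip the payload of any other record
    }
    return -1;
}
```

For a stream that holds only records 0x0204 and 0x000A, as in the trace, the function returns -1, which is exactly the case where the importer has nothing to go on.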
Old Excel versions did not store the text encoding; they assumed the encoding to be the same code page the operating system runs in, so there is actually no way for us to obtain an encoding from the file in these cases, other than to guess from the content or let the user choose. But we don't know in advance that we need to do so: we only find out once we reach the actual content without having encountered a code page record, and calling a dialog from within the filter code is about a no-go. Given the legacy and rare occurrence of such documents, I'm not convinced that implementing a heavy workaround would be worth the effort.

Btw, for us people working on the shell command line it would be helpful to not have spaces or parentheses in file names if several files start with the same prefix where file name completion stops at a space or parenthesis ...

@Julien:
When analyzing .xls files you might be interested in the mso-dumper from http://cgit.freedesktop.org/libreoffice/contrib/mso-dumper
but that seems to hang with "the\ old\ \(and\ common\)\ style\ excel\ file.xls", probably because it is not a compound document file format. Patches on gerrit welcome ;) if that plain BIFF5 thingy fits at all..
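To illustrate what "guess from content" could look like, and why it is fragile, here is a minimal sketch. looksLikeCp936 is a hypothetical helper, not an existing LibreOffice function. It only checks that every byte of 0x80 or above starts a pair with a lead byte in 0x81..0xFE and a trail byte in 0x40..0xFE (excluding 0x7F), which is the double-byte layout of code page 936 (GBK). Treating any other high byte as invalid is a simplifying assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical heuristic (an assumption for illustration only): returns true
// if the bytes contain at least one double-byte pair and every non-ASCII
// byte belongs to a pair that is structurally valid in code page 936.
bool looksLikeCp936(const std::string& bytes)
{
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80)
            continue; // plain ASCII byte

        // A lead byte must be in range and must be followed by a trail byte.
        if (lead < 0x81 || lead > 0xFE || i + 1 >= bytes.size())
            return false;

        const unsigned char trail = static_cast<unsigned char>(bytes[i + 1]);
        if (trail < 0x40 || trail > 0xFE || trail == 0x7F)
            return false;

        ++pairs;
        ++i; // the trail byte has been consumed
    }
    return pairs > 0;
}
```

The check is structural only. Text in a single-byte code page such as Windows-1252 can pass it too, for example an accented letter followed by an ASCII letter, so a real implementation would need much stronger evidence than this before overriding the default encoding.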
(In reply to Eike Rathke from comment #7)
...
> @Julien:
> When analyzing .xls files you might be interested in the mso-dumper from
> http://cgit.freedesktop.org/libreoffice/contrib/mso-dumper
> but that seems to hang with "the\ old\ \(and\ common\)\ style\ excel\
> file.xls", probably because it is not a compound document file format.
> Patches on gerrit welcome ;) if that plain BIFF5 thingy fits at all..

I wanted to give it a look, so I downloaded it from GitHub (it didn't work from cgit.freedesktop.org) and got this after running "make":

cd test/doc && ./test.py
......................
----------------------------------------------------------------------
Ran 22 tests in 4.828s

OK
cd test/emf && ./test.py
.
----------------------------------------------------------------------
Ran 1 test in 0.047s

OK
pep8 --ignore=E501 msodumper/msometa.py
pep8 --ignore=E501 doc-dump.py msodumper/doc{dirstream,record,sprm,stream}.py test/doc/test.py
doc-dump.py:12:1: E402 module level import not at top of file
msodumper/doc{dirstream,record,sprm,stream}.py:1:1: E902 IOError: [Errno 2] No such file or directory: 'msodumper/doc{dirstream,record,sprm,stream}.py'
test/doc/test.py:12:1: E402 module level import not at top of file
test/doc/test.py:13:1: E402 module level import not at top of file
test/doc/test.py:14:1: E402 module level import not at top of file
Makefile:2: recipe for target 'check' failed

It seems quite a different world from LO :-)
*** This bug has been marked as a duplicate of bug 35208 ***
*** This bug has been marked as a duplicate of bug 132796 ***