Bug 94382 - Load XLS files by ANSII encode.
Summary: Load XLS files by ANSII encode.
Status: RESOLVED DUPLICATE of bug 132796
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Reported: 2015-09-20 16:14 UTC by leaies
Modified: 2020-12-30 03:39 UTC


Attachments
Test kit (1.75 MB, application/zip)
2015-09-22 10:21 UTC, Buovjaga

Description leaies 2015-09-20 16:14:08 UTC
This is a very old bug: OpenOffice loads Microsoft XLS files using ANSII encoding, regardless of the encoding the file was saved in. We need to be able to choose the encoding when OpenOffice loads an XLS file.
Comment 1 Julien Nabet 2015-09-20 17:03:41 UTC
LibreOffice is a fork of OpenOffice, so they are two different programs. Which one do you use?
Comment 2 Buovjaga 2015-09-20 17:47:36 UTC
Do you mean ASCII?
Please attach an example file.
Comment 3 leaies 2015-09-22 06:00:45 UTC
The old-style (95? 2000?) XLS format does not include an encoding marker in the file header. The user needs the software to choose the encoding by checking the content, or to be able to choose the encoding manually when opening the file.

Many office suites (including LibreOffice) have this bug.

https://drive.google.com/file/d/0B24u7bpWDlL0NlRpa19XNjg3MFk/view?usp=sharing
Comment 4 Buovjaga 2015-09-22 10:21:31 UTC
Created attachment 118927
Test kit

Moved here from Google Drive
Comment 5 Buovjaga 2015-09-22 10:27:32 UTC
Ok, I confirm that "the old (and common) style excel file.xls" does not display the Chinese characters.
I don't know what this is about, but I'll set it to NEW anyway.

I'll try to find someone who understands.

Win 7 Pro 64-bit, Version: 5.0.1.2 (32-bit)
Build ID: 81898c9f5c0d43f3473ba111d7b351050be20261
Locale: fi-FI (fi_FI)

Version: 5.1.0.0.alpha1+ (x64)
Build ID: 9ce08dcc2e32c5554ddf71b79173f8854e0568ad
TinderBox: Win-x86_64@62-TDF, Branch:MASTER, Time: 2015-09-17_21:43:51
Locale: en-US (fi_FI)
Comment 6 Julien Nabet 2015-10-02 22:36:19 UTC
With this change, it seems to work:
diff --git a/sc/source/filter/excel/excel.cxx b/sc/source/filter/excel/excel.cxx
index b21ca5f..249fb62 100644
--- a/sc/source/filter/excel/excel.cxx
+++ b/sc/source/filter/excel/excel.cxx
@@ -120,7 +120,7 @@ FltError ScFormatFilterPluginImpl::ScImportExcel( SfxMedium& rMedium, ScDocument
     {
         pBookStrm->SetBufferSize( 0x8000 );     // still needed?
 
-        XclImpRootData aImpData( eBiff, rMedium, xRootStrg, *pDocument, RTL_TEXTENCODING_MS_1252 );
+        XclImpRootData aImpData( eBiff, rMedium, xRootStrg, *pDocument, RTL_TEXTENCODING_MS_936);
         std::unique_ptr< ImportExcel > xFilter;
         switch( eBiff )
         {

Of course, the change is wrong, but it seemed to me a good starting point for the search.
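
For illustration only (this is not part of the patch above), a short Python sketch of why swapping the default from Windows-1252 to code page 936 makes the text readable. The sample string is an assumption chosen for the demonstration; it is not taken from the attached file.

```python
# The same four bytes, decoded with two different code pages.
# "cp936" is Python's name for the Simplified Chinese Windows code page.
raw = "中文".encode("cp936")       # bytes as a Chinese Windows system would store them

as_cp1252 = raw.decode("cp1252")   # what an import defaulting to Windows-1252 shows
as_cp936 = raw.decode("cp936")     # what the author of the file intended

# raw       is b"\xd6\xd0\xce\xc4"
# as_cp1252 is "ÖÐÎÄ"  (each byte mapped to a Western European character)
# as_cp936  is "中文"
```

The bytes themselves are identical in both cases; only the code page used to interpret them changes, which is why hardcoding either value cannot be right for every file.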

I added some traces in ImportExcel::Read() in sc/source/filter/excel/read.cxx and had only these:
- case Z_Biff5TPre, nOpcode: 516 (0x0204 = EXC_ID3_LABEL)
- case Z_Biff5TPre, nOpcode: 10 (0x000A)
- case Z_Biff5T, nOpcode: 516
- case Z_Biff5T, nOpcode: 10

So indeed, if we never enter "case 0x42:  Codepage(); break;", I don't know how we can retrieve the right codepage/encoding.
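
A minimal sketch of the same check outside LibreOffice, assuming a plain (non-compound) BIFF stream in which every record is a little-endian 16-bit id, a 16-bit length, and then the payload, and in which the record with id 0x42 carries a 16-bit code page number. The function name find_codepage is hypothetical, and stopping at the first EOF record (0x000A) is an assumption that fits a single-sheet file like the one traced above.

```python
import struct

CODEPAGE_ID = 0x0042  # record that carries the code page, when present
EOF_ID = 0x000A       # end of the current substream


def find_codepage(stream: bytes):
    """Return the code page from the first 0x42 record, or None if there is none."""
    pos = 0
    while pos + 4 <= len(stream):
        # Record header: id and payload length, both little-endian uint16.
        rec_id, rec_len = struct.unpack_from("<HH", stream, pos)
        body = stream[pos + 4 : pos + 4 + rec_len]
        if rec_id == CODEPAGE_ID and len(body) >= 2:
            return struct.unpack_from("<H", body)[0]
        if rec_id == EOF_ID:
            break  # assumption: only the first substream is inspected
        pos += 4 + rec_len
    return None
```

For a file whose records are only 0x0204 and 0x000A, as in the trace, this returns None, which matches the observation that the import code never reaches the code page handler.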

I tried to take a look at the XLS specs at https://msdn.microsoft.com/en-us/library/office/cc313154%28v=office.12%29.aspx but found no hints :-(

Of course, there may be some code we could call to show a dialog box proposing an encoding, but if Excel 5 can display the file correctly, the encoding information is certainly stored somewhere.

Eike: thought you might be interested/have some idea
Comment 7 Eike Rathke 2015-10-06 10:47:46 UTC
Old Excel versions did not store the text encoding; they assumed the encoding to be the same code page the operating system was running in, so there is actually no way for us to obtain an encoding from the file in these cases, other than to guess from the content or let the user choose. But we only know that we are in this situation once we reach the actual content without having encountered a code page record, and calling a dialog from within the filter code is pretty much a no-go. Given the legacy nature and rare occurrence of such documents, I'm not convinced that implementing a heavy workaround would be worth the effort.
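
To make the "guess from content" option concrete, here is a rough Python sketch. It is not LibreOffice code; the function name guess_encoding, the candidate list, and its order are all assumptions made for the illustration. The idea is to try strict decoding with a few multi-byte code pages and to fall back to Windows-1252 when none of them fits.

```python
def guess_encoding(samples, candidates=("cp936", "cp932", "cp949", "cp950"),
                   fallback="cp1252"):
    """Return the first candidate that strictly decodes every non-ASCII sample."""
    non_ascii = [s for s in samples if any(b >= 0x80 for b in s)]
    if not non_ascii:
        return fallback  # pure ASCII decodes identically in all of these code pages
    for enc in candidates:
        try:
            for s in non_ascii:
                s.decode(enc)  # strict mode: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return enc
    return fallback
```

A guess like this is weak: many byte sequences are valid in several East Asian code pages at once, and Western text containing two adjacent accented characters can decode "successfully" as CJK. That weakness is one reason a content-based guess alone would probably not be enough and a user choice would still be needed.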

Btw, for us people working on the shell command line it would be helpful to not have spaces or parentheses in file names if several files start with the same prefix where file name completion stops at a space or parenthesis ...

@Julien:
When analyzing .xls files you might be interested in the mso-dumper from http://cgit.freedesktop.org/libreoffice/contrib/mso-dumper
but that seems to hang with "the\ old\ \(and\ common\)\ style\ excel\ file.xls", probably because it is not a compound document file format. Patches on gerrit welcome ;) if that plain BIFF5 thingy fits at all..
Comment 8 Julien Nabet 2015-10-07 21:02:22 UTC
(In reply to Eike Rathke from comment #7)
...
> @Julien:
> When analyzing .xls files you might be interested in the mso-dumper from
> http://cgit.freedesktop.org/libreoffice/contrib/mso-dumper
> but that seems to hang with "the\ old\ \(and\ common\)\ style\ excel\
> file.xls", probably because it is not a compound document file format.
> Patches on gerrit welcome ;) if that plain BIFF5 thingy fits at all..

I wanted to give it a look, so I downloaded it from GitHub (it didn't work from cgit.freedesktop.org) and got this after running "make":
cd test/doc && ./test.py
......................
----------------------------------------------------------------------
Ran 22 tests in 4.828s

OK
cd test/emf && ./test.py
.
----------------------------------------------------------------------
Ran 1 test in 0.047s

OK
pep8 --ignore=E501 msodumper/msometa.py
pep8 --ignore=E501 doc-dump.py msodumper/doc{dirstream,record,sprm,stream}.py test/doc/test.py
doc-dump.py:12:1: E402 module level import not at top of file
msodumper/doc{dirstream,record,sprm,stream}.py:1:1: E902 IOError: [Errno 2] No such file or directory: 'msodumper/doc{dirstream,record,sprm,stream}.py'
test/doc/test.py:12:1: E402 module level import not at top of file
test/doc/test.py:13:1: E402 module level import not at top of file
test/doc/test.py:14:1: E402 module level import not at top of file
Makefile:2: recipe for target 'check' failed

It seems quite a different world from LO :-)
Comment 9 Maxim Monastirsky 2015-12-09 22:20:04 UTC

*** This bug has been marked as a duplicate of bug 35208 ***
Comment 10 Kevin Suo 2020-12-30 03:39:48 UTC

*** This bug has been marked as a duplicate of bug 132796 ***