Bug 32219 - KDE4 filepicker saves documents as Word6/Excel5 format even if MS Office 97/2000/XP is selected
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 3.3.0 RC1 (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Luboš Luňák
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-08 02:00 UTC by Gökçen Eraslan
Modified: 2011-10-31 06:51 UTC

See Also:
Crash report or crash signature:


Attachments
Document containing Turkish alphabet characters in ODS format, which is rendered fine (8.12 KB, application/vnd.oasis.opendocument.spreadsheet)
2010-12-08 02:00 UTC, Gökçen Eraslan
Document containing Turkish alphabet characters in ODT format, which is also rendered fine (8.51 KB, application/vnd.oasis.opendocument.text)
2010-12-08 02:01 UTC, Gökçen Eraslan
Document containing Turkish alphabet characters in XLS format, which is rendered incorrectly (5.50 KB, application/vnd.ms-excel)
2010-12-08 02:01 UTC, Gökçen Eraslan
Document containing Turkish alphabet characters in DOC format, which is rendered incorrectly (6.50 KB, application/msword)
2010-12-08 02:02 UTC, Gökçen Eraslan
A screenshot describing the problem (39.98 KB, image/png)
2010-12-12 03:30 UTC, Gökçen Eraslan
A screenshot describing the problem (133.21 KB, image/png)
2010-12-12 08:11 UTC, Gökçen Eraslan

Description Gökçen Eraslan 2010-12-08 02:00:36 UTC
Created attachment 40905 [details]
Document containing Turkish alphabet characters in ODS format, which is rendered fine

When I save an ODT or ODS document containing Turkish characters (like ş, ı or ğ), the file is saved correctly, and when I load it again all letters are displayed fine.

But when I save the same documents in MS Office 97 or 95 formats (like xls or doc) and reopen them, all Turkish-only characters are displayed as '?' symbols.

Steps to reproduce: Open the attached ods/odt files, save them in xls/doc format, reopen them, and check whether all characters are rendered correctly.
Comment 1 Gökçen Eraslan 2010-12-08 02:01:22 UTC
Created attachment 40906 [details]
Document containing Turkish alphabet characters in ODT format, which is also rendered fine
Comment 2 Gökçen Eraslan 2010-12-08 02:01:56 UTC
Created attachment 40907 [details]
Document containing Turkish alphabet characters in XLS format, which is rendered incorrectly
Comment 3 Gökçen Eraslan 2010-12-08 02:02:21 UTC
Created attachment 40908 [details]
Document containing Turkish alphabet characters in DOC format, which is rendered incorrectly
Comment 4 Netanel_H 2010-12-11 11:37:41 UTC
May be related to:
http://www.openoffice.org/issues/show_bug.cgi?id=114482 - Creating new Hebrew document in Word 95 format produces corrupted file.
Comment 5 Gökçen Eraslan 2010-12-12 03:30:40 UTC
Created attachment 41031 [details]
A screenshot describing the problem

I'm attaching a screenshot describing the problem.
Comment 6 Andras Timar 2010-12-12 04:35:07 UTC
Your sample documents are in Excel 5 and Word 6 format. These old, deprecated formats are not supported very well. Try to use Word 97 and Excel 97 formats, those have Unicode support and should work well.

For the Excel bug see http://qa.openoffice.org/issues/show_bug.cgi?id=32785
It says "nobody has implemented the CODEPAGE record in BIFF5", therefore only Latin1 (windows-1252) code page is fully supported in Excel 95 format.
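
The '?' symptom is consistent with what happens when text is forced into a code page that lacks the characters. A minimal Python sketch, assuming only Python's built-in codecs (cp1252 for windows-1252, cp1254 for Turkish); it illustrates the encoding limitation and is not LibreOffice's export code. The helper names are made up for this illustration:

```python
# Illustration only: uses Python's standard codecs, not LibreOffice's BIFF5 writer.
# Function names here are hypothetical.

TURKISH_SAMPLE = "şığ"  # the Turkish-only letters mentioned in the report


def encode_lossy(text, codepage):
    """Encode text, substituting '?' for characters the code page lacks."""
    return text.encode(codepage, errors="replace")


def survives_round_trip(text, codepage):
    """Return True if text comes back unchanged after encode + decode."""
    return encode_lossy(text, codepage).decode(codepage) == text


if __name__ == "__main__":
    for cp in ("cp1252", "cp1254"):
        print(cp, encode_lossy(TURKISH_SAMPLE, cp), survives_round_trip(TURKISH_SAMPLE, cp))
```

With windows-1252 the three letters cannot be represented and are replaced by '?', while the Turkish code page keeps them intact.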
Comment 7 Gökçen Eraslan 2010-12-12 04:47:31 UTC
(In reply to comment #6)
> Your sample documents are in Excel 5 and Word 6 format. 

No, they shouldn't be. If they are, then this is another bug, since I'm sure I selected the Word/Excel 97 format while saving these files.

>These old, deprecated
> formats are not supported very well. Try to use Word 97 and Excel 97 formats,
> those have Unicode support and should work well.
> 
> For the Excel bug see http://qa.openoffice.org/issues/show_bug.cgi?id=32785
> It says "nobody has implemented the CODEPAGE record in BIFF5", therefore only
> Latin1 (windows-1252) code page is fully supported in Excel 95 format.

Anyway, can you please try the steps I gave to reproduce the bug? Just open the ODT file, save it in Word 97 format, and then open it again. If you see the same characters as in the ODT file, I'll close this as INVALID.
Comment 8 Caolán McNamara 2010-12-12 05:40:32 UTC
That output .doc is undeniably Word 95. We should enhance getScriptClass in sw/source/filter/ww8/writerwordglue.cxx to do a better job of splitting the UnicodeScript_kLatinExtendedA and UnicodeScript_kLatinExtendedB ranges across the Microsoft encodings that support the various parts of them.

This isn't new, however; it has always been thus for Word 95.

So, while I can improve the Word 95 output at least, the question is more how come it is Word 95 format in the first place.

Opening up the .odt and (under gnome) using file->save and typing in /tmp/turkish.doc gives a word 97 file by default which does work.

So trivial as it sounds, how exactly did you save this, what are the exact steps.
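
The idea of splitting Latin Extended text across several Microsoft encodings can be sketched roughly as follows. This is a hypothetical Python illustration using Python's own codecs; it is not the logic in writerwordglue.cxx, and the candidate list, its order, and the function names are assumptions made for the example:

```python
# Hypothetical sketch: choose, per character, the first Windows code page
# that can represent it, then group consecutive characters into runs.
# The candidate list and its order are assumptions, not LibreOffice's.

CANDIDATE_CODEPAGES = ["cp1252", "cp1254", "cp1250"]


def pick_codepage(ch):
    """Return the first candidate code page that can encode ch, else None."""
    for codepage in CANDIDATE_CODEPAGES:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            continue
        return codepage
    return None


def split_runs(text):
    """Split text into (codepage, substring) runs sharing one code page."""
    runs = []
    for ch in text:
        codepage = pick_codepage(ch)
        if runs and runs[-1][0] == codepage:
            runs[-1] = (codepage, runs[-1][1] + ch)
        else:
            runs.append((codepage, ch))
    return runs


if __name__ == "__main__":
    print(split_runs("aşığ"))
```

A real exporter would also need to write the chosen encoding into the file's character run properties; the sketch only shows the selection step.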
Comment 9 Gökçen Eraslan 2010-12-12 06:02:03 UTC
(In reply to comment #8)
...
> Opening up the .odt and (under gnome) using file->save and typing in
> /tmp/turkish.doc gives a word 97 file by default which does work.
> 
> So trivial as it sounds, how exactly did you save this, what are the exact
> steps.

I think I've found the issue, it's a bug in KDE4 file dialog: 

When I follow File -> Save As, select the Microsoft Word 97/2000/XP filter and click OK, the question asked is:

This document may contain formatting or content that cannot be saved in the Microsoft Word 6.0 file format. Do you want to save the document in this format anyway?

and since the file is saved in Word 6 format, there are encoding problems.

I think anybody using the KDE4 interface can reproduce this bug. When I run oowriter with OOO_FORCE_DESKTOP=gnome, I can save the file and the characters are rendered fine.
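
The workaround above amounts to starting the application with one extra environment variable. A small Python sketch of that, assuming `oowriter` is on the PATH as in the comment; the function names are hypothetical and nothing is launched unless `launch_writer()` is called explicitly:

```python
import os
import subprocess


def forced_desktop_env(desktop="gnome", base_env=None):
    """Return a copy of the environment with OOO_FORCE_DESKTOP set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OOO_FORCE_DESKTOP"] = desktop
    return env


def launch_writer(command="oowriter", desktop="gnome"):
    """Start the given command with the forced desktop integration."""
    # Assumes the command exists on this system; raises FileNotFoundError otherwise.
    return subprocess.Popen([command], env=forced_desktop_env(desktop))
```

Calling `launch_writer()` would be equivalent to running `OOO_FORCE_DESKTOP=gnome oowriter` from a shell.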
Comment 10 Gökçen Eraslan 2010-12-12 06:07:10 UTC
(In reply to comment #9)
> (In reply to comment #8)
> ...
> > Opening up the .odt and (under gnome) using file->save and typing in
> > /tmp/turkish.doc gives a word 97 file by default which does work.
> > 
> > So trivial as it sounds, how exactly did you save this, what are the exact
> > steps.
> 
> I think I've found the issue, it's a bug in KDE4 file dialog: 

By the way, I'm using KDE 4.5.3; this may be related to the KDE version.
Comment 11 Caolán McNamara 2010-12-12 06:14:18 UTC
If we take the real problem to be that the wrong filter is selected, then indeed this sounds like a KDE-specific issue. Putting it back into the pool then; as I don't have KDE libs etc. installed, it's best if someone responsible for that area has a look. Though we should still improve the encoding export stuff :-)
Comment 12 Gökçen Eraslan 2010-12-12 08:11:29 UTC
Created attachment 41043 [details]
A screenshot describing the problem
Comment 13 Luboš Luňák 2010-12-15 06:46:24 UTC
Fixed for 3.3.