Created attachment 40905 [details] Document containing Turkish alphabet characters in ODS format, which is rendered fine When I save an ODT or ODS document with some Turkish characters (like ş, ı or ğ) in it, the file is saved correctly, and when I load it again all letters are fine. But when I save the same documents in MS Office 97 or 95 formats (like xls or doc) and open them, all Turkish-only characters are displayed as '?' symbols. Steps to reproduce: open the attached ods/odt files, save them in xls/doc format, open them again, and check whether all characters are rendered correctly.
Created attachment 40906 [details] Document containing Turkish alphabet characters in ODT format, which is also rendered fine
Created attachment 40907 [details] Document containing Turkish alphabet characters in XLS format, which is rendered incorrectly
Created attachment 40908 [details] Document containing Turkish alphabet characters in DOC format, which is rendered incorrectly
May be related to http://www.openoffice.org/issues/show_bug.cgi?id=114482 - Creating new Hebrew document in Word 95 format produces corrupted file.
Created attachment 41031 [details] A screenshot describing the problem I'm attaching a screenshot describing the problem.
Your sample documents are in Excel 5 and Word 6 format. These old, deprecated formats are not supported very well. Try to use the Word 97 and Excel 97 formats; those have Unicode support and should work well. For the Excel bug see http://qa.openoffice.org/issues/show_bug.cgi?id=32785 It says "nobody has implemented the CODEPAGE record in BIFF5", therefore only the Latin1 (windows-1252) code page is fully supported in the Excel 95 format.
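The '?' symbols are what one would expect when Turkish-only letters are forced through windows-1252. A minimal Python sketch of that effect follows; it only illustrates the code page limitation using Python's standard codecs and is not the exporter's actual code. The variable names are made up for the illustration.

```python
# Turkish-only letters from the report; none of them exist in windows-1252.
turkish_only = "ş ı ğ"

# Letters shared with Western European languages do exist in windows-1252.
shared = "ö ü ç"

# Encoding with errors="replace" substitutes '?' for every unmappable character,
# which matches the symptom described in the report.
as_cp1252 = turkish_only.encode("cp1252", errors="replace")

# The Turkish code page (windows-1254) can represent all three letters.
as_cp1254 = turkish_only.encode("cp1254")

print(as_cp1252.decode("cp1252"))            # Turkish-only letters are lost
print(as_cp1254.decode("cp1254"))            # round-trips unchanged
print(shared.encode("cp1252").decode("cp1252"))
```

This also shows why only the Turkish-only characters break while ö, ü and ç survive the export.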
(In reply to comment #6) > Your sample documents are in Excel 5 and Word 6 format. No, they shouldn't be. If they are, then this is another bug, since I'm sure I selected the Word/Excel 97 format while saving these files. > These old, deprecated > formats are not supported very well. Try to use Word 97 and Excel 97 formats, > those have Unicode support and should work well. > > For the Excel bug see http://qa.openoffice.org/issues/show_bug.cgi?id=32785 > It says "nobody has implemented the CODEPAGE record in BIFF5", therefore only > Latin1 (windows-1252) code page is fully supported in Excel 95 format. Anyway, can you please try what I described for reproducing the bug? Just open the ODT file, save it in Word 97 format and then open it again. If you see the characters just like in the ODT file, I'll close this as INVALID.
That output .doc is undeniably Word 95. We should enhance getScriptClass in sw/source/filter/ww8/writerwordglue.cxx to better handle splitting up the UnicodeScript_kLatinExtendedA and UnicodeScript_kLatinExtendedB ranges into the Microsoft encodings that support the various parts of them. This isn't new, however; it's always been this way for Word 95. So, while I can at least improve the Word 95 output, the question is more how it came to be in Word 95 format in the first place. Opening up the .odt and (under GNOME) using File -> Save and typing in /tmp/turkish.doc gives a Word 97 file by default, which does work. So, trivial as it sounds, how exactly did you save this? What are the exact steps?
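The idea of splitting the Latin Extended ranges across several Microsoft encodings can be sketched as follows. This is only an illustration in Python, not the logic of getScriptClass: the function name `pick_codepage`, the candidate list and its order are assumptions chosen for the example.

```python
from typing import Optional

# Hypothetical candidate list: Western, Central European, Turkish, Baltic.
# The order is an assumption; the first code page that fits wins.
CANDIDATE_CODEPAGES = ["cp1252", "cp1250", "cp1254", "cp1257"]


def pick_codepage(run: str) -> Optional[str]:
    """Return the first candidate code page that can encode the whole run."""
    for codepage in CANDIDATE_CODEPAGES:
        try:
            run.encode(codepage)
        except UnicodeEncodeError:
            continue  # this code page lacks at least one character
        return codepage
    return None  # no candidate fits; a real exporter needs a fallback


if __name__ == "__main__":
    for sample in ["abc", "ő", "şığ", "ש"]:
        print(repr(sample), "->", pick_codepage(sample))
```

A single range such as Latin Extended-A ends up spread over several code pages (ő fits windows-1250, ğ fits windows-1254), which is why treating the whole range as one script is not enough for an 8-bit format.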
(In reply to comment #8) ... > Opening up the .odt and (under gnome) using file->save and typing in > /tmp/turkish.doc gives a word 97 file by default which does work. > > So trivial as it sounds, how exactly did you save this, what are the exact > steps. I think I've found the issue: it's a bug in the KDE4 file dialog. When I follow File -> Save As, select the Microsoft Word 97/2000/XP filter and click OK, the question asked is: "This document may contain formatting or content that cannot be saved in the Microsoft Word 6.0 file format. Do you want to save the document in this format anyway?" Since the file is then saved in Word 6 format, there are encoding problems. I think anybody using the KDE4 interface can reproduce this bug. When I run oowriter with OOO_FORCE_DESKTOP=gnome, I can save the file and the characters are rendered fine.
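For anyone who wants to script the workaround mentioned above, a small Python sketch follows. The environment variable OOO_FORCE_DESKTOP and the oowriter command come from the comment; the helper names `forced_desktop_env` and `launch_writer` are hypothetical, and the sketch assumes oowriter is on the PATH.

```python
import os
import shutil
import subprocess
from typing import Dict, Optional


def forced_desktop_env(desktop: str = "gnome") -> Dict[str, str]:
    """Return a copy of the current environment with OOO_FORCE_DESKTOP set."""
    env = dict(os.environ)  # copy, so the parent environment stays untouched
    env["OOO_FORCE_DESKTOP"] = desktop
    return env


def launch_writer(desktop: str = "gnome") -> Optional[subprocess.Popen]:
    """Start oowriter with the forced desktop integration, if it is installed."""
    executable = shutil.which("oowriter")  # assumed command name, as in the comment
    if executable is None:
        return None  # oowriter not found on PATH
    return subprocess.Popen([executable], env=forced_desktop_env(desktop))
```

Calling `launch_writer()` should start Writer with the GNOME file dialog, which is the setup in which the Word 97 filter was reported to be applied correctly.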
(In reply to comment #9) > (In reply to comment #8) > ... > > Opening up the .odt and (under gnome) using file->save and typing in > > /tmp/turkish.doc gives a word 97 file by default which does work. > > > > So trivial as it sounds, how exactly did you save this, what are the exact > > steps. > > I think I've found the issue, it's a bug in KDE4 file dialog: By the way I'm using KDE 4.5.3, this may be related to the KDE version.
If we take the real problem to be that the wrong filter is selected, then indeed this sounds like a KDE-specific issue. Putting it back into the pool then, as I don't have the KDE libs etc. installed; best if someone responsible for that area has a look. Though we should still improve the encoding export stuff :-)
Created attachment 41043 [details] A screenshot describing the problem
Fixed for 3.3.