Created attachment 54885 [details] RTF document without code page Some RTF generators generate documents without defining an ANSI code page. WordPad and Word Viewer open such documents in the default code page configured in Windows' regional settings (Language for non-Unicode programs). LibreOffice opens them in some other code page.
Created attachment 54886 [details] How it looks in LibreOffice, WordPad when "Language for non-Unicode programs"=Lithuanian
Created attachment 54887 [details] ASCII filter options One possible "fix" (one that would work on both Windows and Linux) would be to offer an "RTF Filter Options" dialog for selecting the correct character set, like the existing "ASCII filter options" dialog.
Just using the regional settings won't help on Linux, where the charset is nowadays UTF-8, so a filter dialog is probably the better choice.
Hi Miklos, Are you aware of this problem? Please feel free to reassign if you cannot handle this bug. Best regards. JBF
(In reply to comment #0) > Created attachment 54885 [details] > LibreOffice opens them in some other code page. I can confirm this bug. I'm using Slovenian regional settings in Windows. There's no problem with OpenOffice 3.3, and I think everything was OK in older versions of LibreOffice. In LibreOffice 3.5 some RTF files are opened in an Asian or some other code page.
@Aurimas Fišeras: If I open the RTF file with WordPad under Ubuntu 11.10 with Wine, I get the same result as you (and I) get with LibreOffice 3.5. Best regards. JBF
*** Bug 48023 has been marked as a duplicate of this bug. ***
The problem also exists for Turkish characters in RTF files. OpenOffice and MS Office are fine.
(In reply to comment #0) > Some RTF generators generate documents without defining ANSI code page. It looks like setting the ANSI code page alone cannot fix the file (see Bug 48446). LO ignores this information anyway; only \fcharsetN is honored. Also, Microsoft's own products seem to fail to follow their own specification: if you define \ansicpgN, MS Word 2010, MS Office WordViewer v.11, and MS WordPad in W7 all ignore it and use the Language for non-Unicode programs setting. If you set \cpgN, both MS Word 2010 and MS Office WordViewer v.11 ignore it, while WordPad takes it into account.
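To check which of these declarations a given file actually carries, a small script can scan the RTF source for the control words. The following is only a minimal sketch: the helper name `declared_encodings`, the two sample strings, and the partial `\fcharsetN` to code page table are illustrative assumptions, and it does not reproduce what LibreOffice or the Microsoft products do internally.

```python
import re

# Assumed, partial mapping of \fcharsetN values to Windows code pages.
FCHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI (Western)
    161: 1253,  # Greek
    162: 1254,  # Turkish
    186: 1257,  # Baltic
    204: 1251,  # Russian
    238: 1250,  # Eastern European
}


def declared_encodings(rtf: str) -> dict:
    """Report the \\ansicpgN value and the code pages implied by \\fcharsetN."""
    ansicpg = re.search(r"\\ansicpg(\d+)", rtf)
    fcharsets = [int(n) for n in re.findall(r"\\fcharset(\d+)", rtf)]
    return {
        "ansicpg": int(ansicpg.group(1)) if ansicpg else None,
        "font_codepages": [FCHARSET_TO_CODEPAGE.get(n) for n in fcharsets],
    }


# Hypothetical headers: one declares its encoding, the other does not.
with_cp = r"{\rtf1\ansi\ansicpg1257{\fonttbl{\f0\fcharset186 Arial;}}\f0 text}"
without_cp = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 text}"

print(declared_encodings(with_cp))
print(declared_encodings(without_cp))
```

A file like the second sample gives the reader nothing to go on, which is the situation this bug is about.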
(In reply to comment #6) > @Aurimas Fišeras: If I open the RTF file with WordPad under Ubuntu 11.10 with > Wine, I get the same thing than you (and me) with LibreOffice 3.5. What is the "Language for non-Unicode programs" setting under your Wine? ;) You will have the same result under any Windows where this setting is not set to the language for which this file was created.
This comment is just a copy from bug 48023: In Writer-Options->Language Settings->Languages I have 'Locale setting'=>Russian and Western=>Russian; once I replace Russian with any other language and reopen the RTF file, it shows improper characters (checked in LibreOffice 3.4.6). So:
* The LO 3.4.x RTF import filter honored the locale set in the Writer options and used it when no encoding was specified in the RTF file.
* The LO 3.5.x RTF import filter doesn't take the locale settings into account at all.
Created attachment 60552 [details] Fix Lithuanian default text encoding This patch fixes the default text encoding problem only for the Lithuanian language. See bug 48023 for details. A full fix for this bug would be to add default text encodings for all other languages.
Hi Aurimas, Thanks for your patch. I tested it and here is what I see: Before your patch, when I start LO with LC_ALL=lt_LT, the first three characters of the document are "àèæ" (which looks correct). After applying your patch, I get "ąčę", which looks incorrect. Are you sure your patch improves the situation? :-) Thanks, Miklos
(In reply to comment #13) > Hi Aurimas, > > Thanks for your patch. I tested it and here is what I see: > > Before your patch, when i start LO with LC_ALL=lt_LT, the first three > characters of the document is "àèæ" (which looks correct). > > After applying your patch, I get: "ąčę", which looks incorrect. Are you sure > your patch improves the situation? :-) > Yes, I'm sure. See https://en.wikipedia.org/wiki/Lithuanian_alphabet
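The two renderings discussed here are the same three bytes read under two different code pages. A minimal sketch, assuming the document stores those characters as the single bytes 0xE0 0xE8 0xE6 (inferred from the two renderings quoted above, not read from the attachment itself):

```python
# Assumed raw bytes for the first three characters of the sample document.
raw = bytes([0xE0, 0xE8, 0xE6])

as_western = raw.decode("cp1252")  # Windows Western European
as_baltic = raw.decode("cp1257")   # Windows Baltic, used for Lithuanian

print(as_western)  # the rendering seen before the patch
print(as_baltic)   # the rendering seen after the patch
```

Under cp1252 the bytes read as "àèæ"; under cp1257 they read as "ąčę", which are the first accented letters of the Lithuanian alphabet.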
Oh, okay, thanks for confirming. (Needless to say I know ~nothing about the Lithuanian language.) I'll push your patch to master in a bit, then.
Aurimas Fišeras committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=a8c05ae840f2673803d9784600be9a7b734076fc fdo#44211 (RTF) return default text encoding for Lithuanian
Patch is in master, marking as resolved.
Why is this patch only for one language? Is a general patch available? We're having the same issue with Turkish, and there may be other affected languages: (In reply to comment #8) > Problem exists for Turkish RTF characters either. Openoffice & MSOffice is > fine.
Hi omeringen, > Why is this patch is just only for just one language ? Any general patch > available ? To the best of my knowledge there is no general solution, since we try to guess an encoding based on locale info, which is never perfect. (LO 3.4 and earlier didn't have such a general solution, either.) > We're having same issues with Turkish language either and there > might be some of other problematic languages : Did you actually test daily builds? Turkish was already fixed with bug 48023, but that original fix didn't contain anything for Lithuanian, which is what the fix for this bug added. Please, - reopen this bug if you have a Lithuanian sample that is imported incorrectly (and was imported correctly in LO 3.4) - reopen bug 48023 if you have a ru/uk/tr sample that is imported incorrectly (and was imported correctly in LO 3.4) Thanks, Miklos
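For readers following along, the locale-based guess described in this thread amounts to a lookup table with a fallback. The sketch below is illustrative only and is not the LibreOffice code: the function name `guess_default_encoding`, the table, and the cp1252 fallback are assumptions, and the ru/uk/tr entries are the usual Windows ANSI code pages for those languages rather than values taken from the patches.

```python
# Hypothetical table: language code -> assumed default Windows code page.
DEFAULT_CODEPAGE_BY_LANGUAGE = {
    "lt": "cp1257",  # Lithuanian (Baltic)
    "ru": "cp1251",  # Russian (Cyrillic)
    "uk": "cp1251",  # Ukrainian (Cyrillic)
    "tr": "cp1254",  # Turkish
}


def guess_default_encoding(locale_name: str, fallback: str = "cp1252") -> str:
    """Guess an encoding for an RTF file that declares none, from the locale."""
    # Accept forms such as "lt_LT", "lt-LT" or "tr_TR.UTF-8"; keep the language part.
    language = locale_name.replace("-", "_").split("_")[0].split(".")[0].lower()
    return DEFAULT_CODEPAGE_BY_LANGUAGE.get(language, fallback)


print(guess_default_encoding("lt_LT"))
print(guess_default_encoding("tr_TR.UTF-8"))
print(guess_default_encoding("en_US"))
```

Any language missing from the table falls back to the Western code page, which is why each affected language needs its own entry and why the guess can never be perfect.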