Created attachment 78276 Example file (windows-1250 encoded).
Problem description: LibreOffice is unable to open Office 2003 XML files that are not encoded in UTF-8. I attached a valid file with windows-1250 (Polish) encoding. The same file converted to UTF-8 opens without problems.
Steps to reproduce:
1. Save a file in 2003 XML format.
2. Open it with Notepad++ (or something similar).
3. Change the encoding.
4. Edit the XML header in the file and change the declared encoding.
5. Try to open it in LibreOffice (General I/O error).
Current behavior: Fails to open.
Expected behavior: The file should open normally.
Last known working version was OpenOffice 3.0.
Operating System: Windows 7
Version: 4.0.1.2 release
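Steps 3 and 4 can also be scripted when a test file is needed. A minimal Python sketch, assuming the saved 2003 XML text has an XML declaration with an encoding attribute; `to_windows_1250` is a hypothetical helper, not part of any tool mentioned in this report:

```python
import re

def to_windows_1250(xml_text: str) -> bytes:
    """Rewrite the declared encoding to windows-1250 (step 4) and encode the
    text itself in that code page (step 3)."""
    # Replace the value of the encoding attribute in the XML declaration.
    declared = re.sub(
        r'(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2',
        r'\1"windows-1250"',
        xml_text,
        count=1,
    )
    # Characters outside the code page raise UnicodeEncodeError.
    return declared.encode("cp1250")

sample = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<?mso-application progid="Excel.Sheet"?>\n'
          '<Workbook><Data>Zażółć gęślą jaźń</Data></Workbook>\n')
data = to_windows_1250(sample)
```

Unlike the attachment (see the later comment that it is really UTF-8 inside), the bytes produced here are genuinely windows-1250, so both variants of the problem can be tested.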
Created attachment 78277 Example file (UTF-8 encoded); this one works.
Are the two files identical, just in different encodings? I don't get an I/O error, but one opens in a spreadsheet with just A1 filled in, while the other opens in Writer showing the raw XML. Let us know if these are just the same file saved in two different encodings.
I attached both files if you need to check. The UTF-8 file was saved as UTF-8 without BOM via Notepad++; the second was saved in ANSI (windows-1250) encoding. The files additionally differ in the XML header.
The UTF-8 file has the header: <?xml version="1.0" encoding="UTF-8"?>
The windows-1250 file has the header: <?xml version="1.0" encoding="windows-1250"?>
Internet Explorer opens both files and displays the characters correctly, so I assume that the XML is OK. Regards, Michal
Another suggestion: if a file contains <?mso-application progid="Excel.Sheet"?>, it should be opened in LO Calc, not Writer, even if you only choose LO (rather than Calc) as the program to open it with. Regards, Michal
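A sketch of the suggested detection, assuming an ASCII-compatible encoding (UTF-8 or a single-byte code page such as windows-1250) so the processing instruction can be matched on raw bytes; UTF-16 files would need decoding first. The function name and the progid-to-module table are hypothetical and do not reflect how LibreOffice's type detection is actually implemented:

```python
import re
from typing import Optional

# Processing instruction Office 2003 XML files use to name their target
# application, e.g. <?mso-application progid="Excel.Sheet"?>.
_MSO_PI = re.compile(rb'<\?mso-application\s+progid\s*=\s*(["\'])([^"\']+)\1\s*\?>')

# Hypothetical mapping from progid to the module that should open the file.
_MODULE_FOR_PROGID = {
    b"Excel.Sheet": "calc",
    b"Word.Document": "writer",
}

def guess_module(head: bytes) -> Optional[str]:
    """Return the module suggested by the mso-application PI found in the
    first bytes of a file, or None if there is no recognised PI."""
    match = _MSO_PI.search(head)
    if match is None:
        return None
    return _MODULE_FOR_PROGID.get(match.group(2))
```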
Most important: yes, the files are identical in content; the only difference is the XML header. Regards, Michal
Okay, I can confirm this behavior, but I think it's an enhancement request, as I can't find documentation saying we actually support windows-1250 encoding. If you find documentation that says it's supposed to be supported, please feel free to change this, but for now I'm marking it as:
New (confirmed)
Enhancement (not a bug in any feature; you just want the ability to support windows-1250 encoding)
Low (as this is the only bug I can find related to this encoding, it doesn't seem like many people use it, so this seems an appropriate setting)
Thanks!
Joel, I don't think it's a problem with the encoding at all... windows-1250 is the default code page on all Polish Windows versions up to XP (I think Vista replaced it with UTF-8, but I'm not sure), so it's really not rare. If I convert the file to ISO-8859-2 (the ISO standard for Central Europe), I can't open it either. It seems that the problem is rather related to UTF-8 vs. non-UTF-8 files (but only those that contain national characters). LO seems to have problems opening files that have any XML header other than <?xml version="1.0" encoding="UTF-8"?> or <?xml version="1.0"?>. As I said in an earlier comment, this worked perfectly with OpenOffice 3.0, so if someone once added this encoding, why has it been removed now? Regards, Michal
Michal, the "Example file (windows-1250 encoded)" is in UTF-8 too; only the header is different. Anyway, I think I could reproduce your problem when I followed the steps you described. I got "General Error. General input/output error." Removing support for legacy encodings might have been a side effect of some code optimizations and/or program startup optimizations.
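Mislabeled files like this one (UTF-8 bytes behind a windows-1250 declaration) can be spotted with a simple heuristic: the header names a non-UTF-8 encoding, yet the bytes contain non-ASCII data that is valid UTF-8. A minimal sketch under that assumption; the function names are hypothetical, and genuine windows-1250 text that happens to be valid UTF-8 would be misreported (rare for Polish text, but possible):

```python
import codecs
import re
from typing import Optional

# Encoding name from an XML declaration at the very start of the data.
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*(["\'])([A-Za-z0-9._-]+)\1')

def declared_encoding(data: bytes) -> Optional[str]:
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECL.match(data)
    return match.group(2).decode("ascii") if match else None

def looks_like_utf8_mislabel(data: bytes) -> bool:
    """True if a non-UTF-8 encoding is declared but the non-ASCII bytes
    form valid UTF-8."""
    enc = declared_encoding(data)
    if enc is None or enc.lower().replace("_", "-") in ("utf-8", "utf8"):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so probably really in the declared encoding
    return any(byte >= 0x80 for byte in data)
```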
Andras, great that you can reproduce it; I hope this will be fixed. Personally, I think that disabling non-UTF-8 files in the XML parser is a bad idea. Just an example from my company: our ERP system creates some reports as 2003 XML files. These files open in MSO 2007 and 2010 without problems, but LO crashes when opening them. All files are generated in windows-1250 because the database is also encoded in windows-1250. I believe I'm not alone with this problem. If you need to optimize, maybe a good solution would be to support only those code pages that are installed on the system? Most systems have UTF-8 plus 1-3 code pages installed. Regards, Michal
Kohei, you fixed similar issues in the past. Could you please have a look? Interestingly, it works in AOO 3.4.1 but not in Go-OO 3.2.1
Created attachment 78355 Example file (windows-1250 encoded).
*** Bug 64676 has been marked as a duplicate of this bug. ***
This is for the legacy Office 2003 XML format? IIRC this was implemented with some XSLT filters, which we rewrote (Peter did, anyhow) to use libxslt and libxml2 instead of some Java monster (though I may misremember). It is possible that this is related... In general, using non-UTF-8 encodings is (IMHO) a bad idea wherever you see it, but of course we should try to look into this; patches appreciated, etc.
Works fine on Linux. I guess libxslt cannot convert the character set because we do not distribute iconv.dll.
(In reply to comment #14)
> Works fine on Linux. I guess libxslt cannot convert the character set
> because we do not distribute iconv.dll.

libxslt is built against iconv (default for WIN/MSC).
libxml2 is built with iconv=0 sax1=1 (for WIN/MSC).
Is there any inconsistency there?
*** Bug 65005 has been marked as a duplicate of this bug. ***
It appears that libxml2 has built-in support for just a few standard encodings like UTF-8/UTF-16/ISO 8859-*, and everything else is handled through an optional iconv dependency. LO does not bundle libiconv on Windows; on Linux the bundled libxml2 picks up iconv from the system, and on Mac we use the system libxml2, which should have iconv support. This problem only affects XSLT-based filters, which are not very popular (mainly the legacy MSO 2003 XML formats and XHTML). Hmm... I don't think that adding support for obscure encodings used in obscure formats, only on Windows, is a good use of resources. As a workaround, there is the "old" XSLT import filter based on Saxon, which presumably (since this is a regression) supported more encodings, now available as an extension from https://github.com/dtardon/xslt2-transformer
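The design described above (the parser decodes a few encodings itself and delegates the rest to an external converter such as iconv) is not specific to libxml2. As an illustration only, and not LibreOffice code: Python's bundled expat parser decodes UTF-8, UTF-16, ISO-8859-1 and US-ASCII natively and asks Python's own codecs for other single-byte encodings, so a correctly declared windows-1250 file parses there without iconv:

```python
import xml.etree.ElementTree as ET

# Genuine windows-1250 bytes with a matching declaration.
doc = ('<?xml version="1.0" encoding="windows-1250"?>'
       '<Data>Łódź</Data>').encode("cp1250")

# expat does not know windows-1250 itself; pyexpat builds a byte-to-character
# table from the matching Python codec (this works for single-byte encodings only).
root = ET.fromstring(doc)
print(root.text)
```

The later fix in this report takes the analogous route for libxml2, with ICU instead of iconv as the converter.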
Another possibility is to convert the XML to UTF-8 with an external tool. (Of course, that implies the user knows it is the encoding that causes the failure. Patches for detecting that situation and improving the error message are welcome.)
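A minimal sketch of such an external conversion, assuming the file has an XML declaration naming an ASCII-compatible encoding that Python's codecs know (such as windows-1250 or ISO-8859-2). `xml_to_utf8` and `convert_file` are hypothetical helper names:

```python
import re
from pathlib import Path

# XML declaration with an encoding attribute: group 1 is everything up to the
# opening quote, group 2 the quote character, group 3 the encoding name.
_DECL = re.compile(rb'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])([A-Za-z0-9._-]+)\2')

def xml_to_utf8(data: bytes) -> bytes:
    """Re-encode XML bytes from their declared encoding to UTF-8 and
    rewrite the declaration to match."""
    match = _DECL.match(data)
    if match is None:
        # No declared encoding (or a BOM in front): by the XML rules the
        # document is already UTF-8 or UTF-16, so leave it unchanged.
        return data
    encoding = match.group(3).decode("ascii")
    rest = data[match.end():].decode(encoding)  # assumes an ASCII-compatible encoding
    return match.group(1) + b'"UTF-8"' + rest.encode("utf-8")

def convert_file(src: Path) -> Path:
    """Write <name>.UTF-8.xml next to src and return its path."""
    dst = src.with_name(src.name + ".UTF-8.xml")
    dst.write_bytes(xml_to_utf8(src.read_bytes()))
    return dst
```

Reading the encoding from the declaration, instead of hard-coding one code page, lets the same helper handle windows-1250, ISO-8859-2 and similar files.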
Yes, because people have too much free time on their hands which they can use to convert their valid documents between encodings to pleasure your shitty software.
Urmas - you have been politely warned once about language, insulting, etc...
(In reply to comment #19)
> Yes, because people have too much free time on their hands which they can
> use to convert their valid documents between encodings to pleasure your
> shitty software.

<sarcasm>Thank you for your constructive response. This is exactly the reaction I expected from you.</sarcasm>

In case you have not noticed, this project is open source. If you have problems with our decision not to waste time on something that we consider a marginal problem, you are free to fix it yourself and send a patch. Or you can go over to xmlsoft.org and try to convince Daniel, in your typical diplomatic way, to add internal support for cp1250 to libxml2. Btw., this is perfectly valid behavior for an XML processor. The only encodings required by XML 1.1 are UTF-8 and UTF-16 (see section 2.2 of the standard).
(In reply to comment #17)
> as a workaround there is the "old" XSLT import filter based
> on Saxon which presumably (since this is a regression) supported
> more encodings, now available as an extension from
>
> https://github.com/dtardon/xslt2-transformer

Great, I asked dtardon to upload a pre-compiled version to the extension center. This would be the second bug I'm working around with an extension (the other is with encodings in BIFF5 / Excel 95). The next step will be to deploy the extension company-wide.
It seems doomed; xslt2-transformer did not help. To relieve poor users' pain, I've written a small AutoHotkey script to convert the files (it reads them as windows-1251 and writes UTF-8 without a BOM):

    #NoEnv
    If %0%
        Loop %0%
            ConvertFiles(%A_Index%)
    Else {
        FileSelectFile srcFilesList, M,, Files to convert, XML files (*.xml)
        Loop Parse, srcFilesList, `n
        {
            If A_Index = 1
                srcDir=%A_LoopField%\
            Else
                ConvertFiles(srcDir . A_LoopField)
        }
    }
    Exit

    ConvertFiles(srcMask) {
        ; static, so the answers survive across calls (one call per selected file)
        static ReplaceSilently := 0, AskedOnce := 0
        Loop %srcMask%
        {
            LoopFileDir=
            If A_LoopFileDir
                LoopFileDir=%A_LoopFileDir%\
            dstName=%LoopFileDir%%A_LoopFileName%.UTF-8.xml
            srcName=%A_LoopFileFullPath%
            FileEncoding CP1251
            FileRead srcXML, %srcName%
            FileEncoding UTF-8-RAW
            StringReplace srcXML, srcXML, encoding="Windows-1251", encoding="UTF-8"
            IfExist %dstName%
            {
                If Not ReplaceSilently
                {
                    MsgBox 35, Saving processed file, File already exists`, replace?`n`n%dstName%
                    IfMsgBox Cancel
                        Exit
                    IfMsgBox No
                        continue
                    ; IfMsgBox Yes
                    If Not AskedOnce
                    {
                        AskedOnce=1
                    }
                    Else
                    {
                        AskedOnce=-1
                        MsgBox 36, Saving processed files, Replace all files without further prompts?
                        IfMsgBox Yes
                            ReplaceSilently=1
                    }
                }
                FileDelete %dstName%
            }
            FileAppend %srcXML%, %dstName%
        }
    }
*** Bug 69163 has been marked as a duplicate of this bug. ***
*** Bug 59788 has been marked as a duplicate of this bug. ***
*** Bug 50012 has been marked as a duplicate of this bug. ***
*** Bug 71782 has been marked as a duplicate of this bug. ***
*** Bug 71831 has been marked as a duplicate of this bug. ***
It is possible to build libxml2 with ICU support as an alternative to iconv. Let's see if I can make this work.
Good to hear that :-) Then I'll move this to MAB4.2 instead, as MAB4.0 is already closed.
If this were really a "most annoying" bug, it would hardly have been WONTFIX'ed in the first place.
I do not wanna contest your opinion but I see quite a bunch of duplicates from many different users, maybe the WONTFIX was done because of technical incompatibilities.
(In reply to comment #32)
> I do not wanna contest your opinion but I see quite a bunch of duplicates
> from many different users, maybe the WONTFIX was done because of technical
> incompatibilities.

The same solution was possible at that time, but while this was only breaking the XSLT filters (which are effectively unmaintained anyway), nobody cared enough to check whether there might actually be a solution. Now this has started breaking other filters too, and that is unacceptable :-)
*** Bug 81461 has been marked as a duplicate of this bug. ***
David Tardon committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=7515b1a90fac9e31733c0fdcc1156adadf0e6f99 fdo#63756 build libxml2 with ICU support The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
David Tardon committed a patch related to this issue. It has been pushed to "libreoffice-4-3": http://cgit.freedesktop.org/libreoffice/core/commit/?id=23b4b764ade82cf3a5835a7b7f35fb5e45cd6cc9&h=libreoffice-4-3 fdo#63756 build libxml2 with ICU support It will be available in LibreOffice 4.3.1.
The fix breaks the installation of Java-based extensions on Windows. The reason is that URE/bin/uno.exe cannot load URE/bin/javavmlo.dll, which is linked against URE/bin/libxml2.dll, which in turn is linked against program/icuucd53.dll, and of course the URE binaries don't have the program directory on their path.
Michael Stahl committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=057613c6864204ac5c09260e93a8f14cc9768b90 icu: un-break installation of Java extensions on Windows (rel. fdo#63756)
Michael Stahl committed a patch related to this issue. It has been pushed to "libreoffice-4-3": http://cgit.freedesktop.org/libreoffice/core/commit/?id=3012156bab9dc0504a61fa7062f8e7cbd677bad4&h=libreoffice-4-3 icu: un-break installation of Java extensions on Windows (rel. fdo#63756) It will be available in LibreOffice 4.3.1.