Created attachment 165991 [details] Original .XLS report Many report builders save their files with an .XLS extension although the content is actually HTML. LibreOffice now opens such a file, but does not handle its Unicode or CP-1251 text correctly.
Created attachment 165992 [details] Screenshot opening original file and its content
Created attachment 165993 [details] Original .XLS report translated to cp1251
Created attachment 165994 [details] Screenshot opening translated file and its content
I confirm the behavior in current 7.1. Mike, what do you think about it? Should LO know that it isn't really an XLS file but an HTML report? NOTOURBUG?
(In reply to Roman Kuznetsov from comment #4) > Mike, what do you think about it? Should LO knows that it isn't really a XLS > file but it's a HTML report? NOTOURBUG? LibreOffice knows it's HTML. Otherwise, it would not import its structure correctly. The so-called "builders" are so cute - they naturally consider a proper HTML structure (with head/body, a meta tag declaring the encoding, etc.) rocket science, relying on some magic auto-detection of the encoding by the software. The "HTML" lacks everything, including even the <html> and <body> tags. So LibreOffice detects HTML, sees the absent metadata (encoding info), and just assumes cp-1252. As an enhancement to encourage those "builders" (the brilliant samples of shitcode) to keep generating those awful reports, we could try to use something like what was implemented recently for tdf#60145 when the absence of metadata is detected.
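To make the failure mode described above concrete, here is a minimal Python sketch. It is not LibreOffice's import code; the helper names `declared_charset` and `decode_html` are hypothetical, and the cp1252 fallback simply mirrors the behavior described in the comment. It shows why a headerless CP-1251 report turns into Latin-looking garbage, and why a declared charset avoids that.

```python
import re
from typing import Optional

# Hypothetical helper: look for a charset declared in a <meta> tag.
_META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_\-]+)',
                           re.IGNORECASE)

def declared_charset(data: bytes) -> Optional[str]:
    """Return the charset named in a <meta> tag, or None if there is none."""
    match = _META_CHARSET.search(data)
    return match.group(1).decode("ascii") if match else None

def decode_html(data: bytes, fallback: str = "cp1252") -> str:
    """Decode with the declared charset, else with the assumed fallback."""
    charset = declared_charset(data) or fallback
    try:
        return data.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: behave as if nothing had been declared.
        return data.decode(fallback, errors="replace")

# A report fragment with no metadata, saved as CP-1251.
raw = "Отчет".encode("cp1251")
print(decode_html(raw))  # prints "Îò÷åò": each Cyrillic byte is read as cp1252

# The same bytes preceded by a declaration decode correctly.
print(decode_html(b'<meta charset="windows-1251">' + raw))
```

The bytes are identical in both cases; only the assumed encoding differs, which is why the structure of the report imports fine while the text does not.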
I understand that it is "shitcode". But it exists in the wild... However, the open dialog contains an option to select the encoding, but it does not work. That is a problem.
(In reply to Andrew from comment #6) > However open dialogue contain option to select encoding but that it not > work. It is problem. It doesn't contain options for encoding, only for language. The language is used to decide which locale to use to detect numbers (i.e., which decimal/thousands separators, currency, etc. to use). It has nothing to do with encoding.
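As an illustration of what the language option affects, here is a small Python sketch of locale-dependent number detection. The `SEPARATORS` table and `parse_number` function are hypothetical, hand-written stand-ins for real locale data, not anything taken from LibreOffice; the point is only that the same cell text yields different numbers depending on the chosen language, while the byte-to-character decoding is untouched.

```python
from decimal import Decimal

# Hypothetical, hand-written stand-in for locale data:
# language tag -> (decimal separator, thousands separator)
SEPARATORS = {
    "en-US": (".", ","),
    "de-DE": (",", "."),
}

def parse_number(text: str, language: str) -> Decimal:
    """Interpret cell text as a number using the separators of `language`."""
    decimal_sep, group_sep = SEPARATORS[language]
    # Drop grouping separators first, then normalize the decimal separator.
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)

print(parse_number("1.234,56", "de-DE"))  # 1234.56
print(parse_number("1.234,56", "en-US"))  # 1.23456
```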
Hmm... OK. This is not so clear from the text of the dialog. Thanks. I think this can be treated as an improvement request.
Anyway, introducing a generic method to detect the encoding of texts (which should be used by various filters when they fail to recognize the encoding themselves, presumably based on ICU as the fix for tdf#60145 is) is a valid enhancement request...
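A rough idea of what such a shared fallback could look like, sketched in Python. This does not use ICU and is not the tdf#60145 implementation: the function name `guess_encoding` is hypothetical, and the last step is a deliberately crude byte-count heuristic that only chooses between cp1251 and cp1252. A real detector would rely on proper statistical models.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Guess an encoding for bytes that carry no usable encoding metadata."""
    # 1. A byte-order mark is unambiguous.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. Legacy 8-bit text with non-ASCII bytes rarely passes strict UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Crude heuristic (an assumption of this sketch): Cyrillic text in
    #    cp1251 is dominated by bytes >= 0xC0, while Western European text
    #    in cp1252 is mostly ASCII letters with an occasional accent.
    #    Markup made of ASCII tag names would skew this count.
    high = sum(1 for b in data if b >= 0xC0)
    ascii_letters = sum(1 for b in data if 65 <= b <= 90 or 97 <= b <= 122)
    return "cp1251" if high > ascii_letters else "cp1252"

sample = "Отчёт за месяц".encode("cp1251")
print(guess_encoding(sample))                         # cp1251
print(sample.decode(guess_encoding(sample)))          # Отчёт за месяц
print(guess_encoding("café au lait".encode("cp1252")))  # cp1252
```

A filter would call something like this only after its own format-specific checks (for HTML, the meta declaration) have found nothing.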
(In reply to Andrew from comment #8) Improving the wording of the dialog is also a valid enhancement request (kompi: a hint ;-) - I'd make that a separate request)
(In reply to Mike Kaganski from comment #9) > Anyway, introducing a generic method to detect encoding for texts (which > should be used by various filters when they fail to recognize the encoding > themselves, presumably based on ICU as tdf#60145 fix does) is a valid > enhancement request... Moving to NEW and changing to enhancement