Created attachment 69146 [details]
Import error description
Imported numbers over a link have the wrong format:
numbers with 3 decimals: 5.123 are converted to 5123
numbers with 2 decimals: 5.12 are converted to 5.12
Numbers from an English webpage are imported with a decimal point into a German Calc sheet instead of a comma. I know that you can choose the language of the import, but that made no difference.
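The pattern above (5.123 becomes 5123, 5.12 stays 5.12) is what one would expect if the dot is read as a thousands separator under a German locale. The Python sketch below only illustrates that assumption; it is not LibreOffice's import code, and `parse_number` is a hypothetical helper.

```python
import re


def parse_number(text, decimal_sep, group_sep):
    """Return a float if `text` is a number under the given separators, else None.

    Hypothetical helper: grouping is accepted only in blocks of exactly
    three digits, which is the assumption being illustrated here.
    """
    pattern = (
        r"^[+-]?(\d{1,3}(" + re.escape(group_sep) + r"\d{3})+|\d+)"
        r"(" + re.escape(decimal_sep) + r"\d+)?$"
    )
    if not re.match(pattern, text):
        return None  # not a number in this convention, so it would stay text
    normalized = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(normalized)


# German convention: "," is the decimal separator, "." groups thousands.
german_3 = parse_number("5.123", decimal_sep=",", group_sep=".")
german_2 = parse_number("5.12", decimal_sep=",", group_sep=".")

# English convention: "." is the decimal separator, "," groups thousands.
english_3 = parse_number("5.123", decimal_sep=".", group_sep=",")
english_2 = parse_number("5.12", decimal_sep=".", group_sep=",")
```

Under the German convention "5.123" parses as 5123 while "5.12" is rejected and would remain text; under the English convention both parse as written.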
More or less [Reproducible] with reporter's sample and "LibreOffice 22.214.171.124" German UI / German Locale [Build-ID: 58f22d5] on German WIN7 Home Premium (64bit).
But when I open reporter's sample with UI language and Locale = English, everything seems ok after I have updated the links.
So currently it's not a general unsolvable problem. Works like a charm with 3.7.0 Master, and also with 126.96.36.199 rc if I switch to English (USA) locale.
Nothing new, same behavior already in OOo 3 and OOo 2.
This one is related to
"Bug 47109 - 'Insert external data' form HTML table imports US dates 31/12/2012 with Text FORMATTING"
"Bug 53103 - FILEOPEN INSERT 'Link to external data' 'Select the language to use for import' does not make a difference with dates/numbers"
"Bug 53177 - CONFIGURATION "Use 'English (USA)' locale for numbers" messes with dates and "insert link to external data""
But of course it would be more comfortable to have a temporary locale setting for the import, so I think this is a
Request for Enhancement
Menu 'Insert -> External Data' offers the possibility to select Language and "Special Number Recognition", but not to modify Locale settings for the import. So strings that look like numbers will often be imported as STRINGS, not as numbers, because the decimal separator of the LibO locale does not match the decimal separator on the HTML page.
That problem can be solved by modifying the Locale setting before import, but that also affects other open documents, and the problem reappears the next time the document is updated with a LibO locale different from the External Data locale.
Allow a document-specific locale setting that is used only for data updates.
This might have an ODF compatibility impact, so maybe that locale can't be saved with the document but has to be asked for on every Data Update (which would be risky: how can the user know the locale of the data he wants to import?).
That's not a trivial issue, too many questions might annoy users for standard import situations.
*** Bug 53177 has been marked as a duplicate of this bug. ***
I think the main problem is that if I set the language option to English only for the current document, the number format still remains German.
So an English document should have an English number format, independent of the UI settings.
System: Mac OS X 10.7.4
System Locale is Dutch (or whatever non-English), LO will follow this.
Using Version 188.8.131.52 (Build ID: da8c1e6) and the preferences (tools -> options):
- UI US English (that is not the point, see 53177)
- Preferences -> Language Settings -> Languages -> Locale Setting
[Default - Dutch (Netherlands)]
- Preferences -> Load/Save -> HTML Compatibility
[v] Use English (USA) locale for numbers
No problem when opening the attachment #69146 [details].
Now go back to the preferences:
- Preferences -> Load/Save -> HTML Compatibility
[ ] Use English (USA) locale for numbers
Now refresh the link, in the Macintosh version:
- Edit -> Links..
Select the link and press [Update]
Now the spreadsheet has the broken numbers as reported.
So it seems impossible to import numbers with a decimal dot when using a decimal comma in your locale unless you select:
- Preferences -> Load/Save -> HTML Compatibility
[v] Use English (USA) locale for numbers (this is the point!)
What nobody seems to get is that this solution behaves wrong. It messes with dates as well. Also it is questionable if such a setting should exist. A question on import about number- and date-format is a lot more robust.
So I do not understand why bug 53177 is closed as being a duplicate of this one. It is not! The problem described in this bug does not occur when using the "HTML Compatibility" setting.
Joaquin, you are right: the problem is that "Use English (USA) locale for numbers" is a global option, but it should be a document option. In one spreadsheet I want to import English HTML, in another one German, so my customers have to switch the option every time.
The import language can be done in two different menus:
1. "Use English (USA) locale for numbers"
2. menu Insert-LinkToExternalData
then fill in the link http://www.n2yo.com/satellites/satlinker.php
then a menu ImportOptions pops up with language and DetectSpecialNumbers option.
These language settings are completely messed up. Why should I want to convert imported numbers in another language than the document's language????
When I import from e.g. a Swedish homepage, it's most likely that the numbers on this page are in Swedish format too - what else???
The solution is easy: make all these language settings document dependent. If I open an English document, I ALWAYS want English numbers and dates. It's an error that I have German numbers in an English document, as long as I activate LanguageSettings-Languages-LocaleSetting-English.
I also think that document related settings might cover most applications (but see my odf concerns), but of course it also might be that someone needs sheet related or even table related settings. Does anybody know applications with such needs?
There might be a need for paragraph dependent language settings in documents with two languages, like dictionaries or translations. But these special cases are not covered by the current LO version either. Websites with two languages are a horror for any webmaster because Google spiders don't want to index them.
I don't know the restrictions of the open document format, I'm not familiar with this. Rainer, perhaps you can put this Bug or Enhancement in the right category? It's more than an import problem, it's dealing with fundamental language treatment.
It should not be global and it should not be a document setting since the problem is the source, not the document. I can link to datasources all over the globe in the same document. Also it should not be a locale because people mess with locales.
The easiest solution, in my opinion, is a choice per external link, so you can select the decimal separator ("." or ",") and the date (e.g. USA, "rest of the world", ISO, that import seems to work in LO as long as it does not swap month and day).
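A per-link choice like this could be sketched roughly as follows. This is only an illustration in Python under stated assumptions: `LinkImportOptions`, `parse_number` and `parse_date` are hypothetical names, not part of LibreOffice, and only purely numeric dates are handled.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LinkImportOptions:
    """Hypothetical per-link settings, chosen once when the link is inserted."""
    decimal_sep: str = "."   # "." or ","
    date_order: str = "YMD"  # "DMY", "MDY" or "YMD" (ISO-like)


def parse_number(text: str, options: LinkImportOptions) -> Optional[float]:
    """Accept digits with an optional fraction using the link's decimal separator."""
    pattern = r"[+-]?\d+(?:" + re.escape(options.decimal_sep) + r"\d+)?"
    stripped = text.strip()
    if not re.fullmatch(pattern, stripped):
        return None  # leave the cell as text
    return float(stripped.replace(options.decimal_sep, "."))


def parse_date(text: str, options: LinkImportOptions) -> Optional[date]:
    """Read three numeric fields in the order configured for this link."""
    match = re.fullmatch(r"(\d+)[-/.](\d+)[-/.](\d+)", text.strip())
    if match is None:
        return None
    fields = dict(zip(options.date_order, (int(g) for g in match.groups())))
    try:
        return date(fields["Y"], fields["M"], fields["D"])
    except (KeyError, ValueError):
        return None  # impossible date or unknown order: do not guess


us_page = LinkImportOptions(decimal_sep=".", date_order="MDY")
dutch_page = LinkImportOptions(decimal_sep=",", date_order="DMY")
```

With these two option sets, "12/31/2012" from the US page and "31-12-2012" from the Dutch page both resolve to the same date, and a day/month swap is rejected instead of silently accepted when the swapped value is not a valid date.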
There already is a choice about the used locale when inserting a link but selecting the locale is not good enough.
That is caused by:
- The USA using month and day swapped (as opposed to all other English and other locales)
- People messing with decimal separators and dates putting them in different than their locale dictates.
Also this locale selection when inserting a link does not work as expected (bug 53103).
Note: There is another reason why the locale is not a good choice. People are messy. Most countries in the world use DMY and/or ISO for dates:
Just like Mexico already seems to have adopted the decimal dot and, unofficially, Letter instead of A4, there is some confusion about the official date (fecha) format here and there:
Letter used instead of the official A4:
So maybe the description of this bug should be:
"allow (source related) date and number setting for 'Insert - External Data'"
Joaquin: with every import there must be a conversion. The point is that the target formats are not well-defined. If I import English HTML into an English document, then the problem at the moment is that the English doc may have English text, but numbers in another language.
Your argument is that US English has another date format, and there is only lang="en" in HTML. It should be no problem to choose the exact language of the HTML source manually. If the document's target language is set to English (US), the import converter knows what to do.
Your second argument, that websites might have wrong formats, is not LO's problem. Certainly you can create a converter with all possible options, but that is a lot of work and too confusing for users.
But it's not an import converter problem. The converter does not know exactly the target document's formats, because they can be changed in too many places. So whether the converter works right depends on each user's global options.
Satlinker: Anyway the actual global setting "Use English (USA) locale for numbers" is
- misleading, because it messes with dates
- just not good enough because it does not allow for the "other English" as opposed to a locale with a decimal ",".
I will try to create a document with a few HTML sources to see how the html "lang=" or the omission of this setting behaves without "Use English (USA) locale for numbers".
This already proves difficult. There is hardly any page out there that uses the setting and, if so, uses "en-us" when using a US date and decimal ".". None of the bigger investment companies do, only Yahoo! I guess the HTML "lang=" setting is something that should not be trusted.
Created attachment 69445 [details]
External HTML sources with different "lang=" settings
This attachment contains links to 4 html sources that, at the moment, have the following language setting and really used decimal separator vs date format:
- lang=nl, document has decimal separator ",", date: DD-MM-YYYY
- no HTML-lang setting (but document uses English, decimal dot, DD-month)
- lang=en but document uses US dates (in other text-strings) and decimal dot
So this is incorrect, the document should have used "lang=en-us"
- lang=en-us, document uses decimal dot and dates like "Fri, Nov 2, 2012"
(not shown when importing "HTML Tables")
I don't see any influence by the HTML setting on the import behaviour of numbers. Also I would say that trusting the HTML source is useless since the language setting is usually absent or wrong (also some sites don't use plain HTML, take a look at Google Finance).
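For anyone who wants to check what a page declares before comparing it with the separators it really uses, a minimal Python sketch follows. It relies only on the standard `html.parser` module; `LangSniffer` and `declared_lang` are hypothetical helper names, and only the `lang` attribute on the `<html>` element is inspected.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Remember the lang attribute of the first <html> start tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_lang(html_text):
    """Return the declared language of the page, or None if it is absent."""
    sniffer = LangSniffer()
    sniffer.feed(html_text)
    return sniffer.lang


dutch = declared_lang('<html lang="nl"><body><td>1,5</td></body></html>')
missing = declared_lang("<html><body><td>1.5</td></body></html>")
```

A result of `None` corresponds to the "no HTML-lang setting" case in the list above; a declared value still says nothing about whether the page content actually follows it.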
Like the original report stated, the following combination gives me all the numbers:
- Non US locale (e.g. Dutch, German, ..)
- "Use English (USA) locale for numbers"
- numbers with a decimal "," are imported as text (use the VALUE function)
- decimal dot numbers are imported as numbers
See bug 53177 for the "date behaviour" and the related new additional global language setting.
The locale selector dialog has been implemented for Link to External Data as well, just that it currently has no effect.
*** This bug has been marked as a duplicate of bug 53103 ***