Description:
When trying to import data from a file with only one table, three tables are shown.

Steps to Reproduce:
1. Create "table.html" with the following contents:
<table>
<tr><th>Asset</th><th>Balance</th></tr>
<tr><td>USD</td><td>10</td></tr>
<tr><td>EUR</td><td>20</td></tr>
</table>
2. Sheet -> Link to External Data
3. Browse to that file
4. Press Enter twice

Actual Results:
Three tables are shown

Expected Results:
Only one table should be shown

Reproducible: Always

User Profile Reset: No

Additional Info:
If the table also has an id, four tables will be shown:
<table id="buggy">
<tr><th>Asset</th><th>Balance</th></tr>
<tr><td>USD</td><td>10</td></tr>
<tr><td>EUR</td><td>20</td></tr>
</table>

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Created attachment 139086
Link to External Data spurious tables
Created attachment 139782
table.html for testing
Repro.

Arch Linux 64-bit
LibreOffice 3.3.0 OOO330m19 (Build:6)
tag libreoffice-3.3.0.4

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: c6a23023150c164a19236139fa413d43006ce21c
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: kde4;
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on February 11th 2018
The current workflow is explained at https://help.libreoffice.org/Calc/Inserting_External_Data_in_Table_WebQuery
Maybe the design team could chime in.
The additional "HTML_all" and "HTML_tables" ranges are needed for programmatic reasons. HTML_all encompasses the entire text content, including content that is not in tables. HTML_tables encompasses all tables of the document. With a single table these may be identical to the HTML_1 range, but in general they don't have to be. The HTML_# ranges are numbered in the order they are encountered, including nested tables. Tables with id="foo" are added as HTML__foo so they can be linked to by ID instead of by occurrence. The chosen table range name is remembered as part of the link, so that on refresh the expected data is pulled in again. The additional names are not a bug and won't be changed.
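A minimal Python sketch of the naming rules described above, assuming they work exactly as stated in that comment. It is a hypothetical model, not LibreOffice's actual (C++) import code: `TableNameCollector`, `range_names` and `ONE_TABLE` are made-up names, and the order of the returned names, offering HTML_all even without tables, and offering HTML_tables only when a table exists are all assumptions. Fed the table.html from the description it yields three names, and four once the table gets id="buggy", which matches what the reporter sees.

```python
from html.parser import HTMLParser


class TableNameCollector(HTMLParser):
    """Hypothetical model of the range-naming rules from comment #5.

    Only mirrors the rules as stated in the comment; this is not
    LibreOffice's actual HTML import code.
    """

    def __init__(self):
        super().__init__()
        self.count = 0       # tables seen so far, nested tables included
        self.numbered = []   # HTML_1, HTML_2, ... in document order
        self.id_names = []   # HTML__<id> for tables that carry an id

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # Number tables in the order their start tags appear, so a nested
        # table gets the number right after the table that encloses it.
        self.count += 1
        self.numbered.append(f"HTML_{self.count}")
        table_id = dict(attrs).get("id")
        if table_id:
            # A table with an id is additionally reachable by that id.
            self.id_names.append(f"HTML__{table_id}")


def range_names(html: str) -> list[str]:
    parser = TableNameCollector()
    parser.feed(html)
    parser.close()
    # Assumption: HTML_all (the whole text content) is always offered.
    names = ["HTML_all"]
    if parser.count:
        # Assumption: HTML_tables is only offered when a table exists.
        names.append("HTML_tables")
    # Assumption: the real dialog's ordering is unknown; this one is arbitrary.
    return names + parser.numbered + parser.id_names


ONE_TABLE = """<table>
<tr><th>Asset</th><th>Balance</th></tr>
<tr><td>USD</td><td>10</td></tr>
<tr><td>EUR</td><td>20</td></tr>
</table>"""

if __name__ == "__main__":
    print(range_names(ONE_TABLE))
    # ['HTML_all', 'HTML_tables', 'HTML_1']
    print(range_names(ONE_TABLE.replace("<table>", '<table id="buggy">', 1)))
    # ['HTML_all', 'HTML_tables', 'HTML_1', 'HTML__buggy']
```

Under these rules a nested table simply takes the next number, so a table inside HTML_1 would become HTML_2.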
(In reply to Eike Rathke from comment #5)
> The additional names are not a bug and won't be changed.

Content and tables don't make much sense and could be hidden. If that's not possible, we could make the relation a bit clearer with a tree like:

HTML content
- HTML tables
-- Table 1 (no ID)
-- Table 2 (Foobar)
(In reply to Heiko Tietze from comment #6)
> Content and tables don't make much sense and could be hidden.

They do make sense if there is text that is not in a table, or if there are several tables you want to import at once.
> The additional "HTML_all" and "HTML_tables" ranges are needed for programmatic reasons.

Does that mean they are not intended for the user?
(In reply to Dan Dascalescu from comment #8)
> Does that mean they are not intended for the user?

No, read my previous comment.
What use case would there be for a user to select "HTML_all" when importing data into a spreadsheet, if "HTML_all encompasses the entire text content, including content that is not in tables"? It sounds to me like the best you could hope to get from HTML_all is an unstructured text dump.
Created attachment 140042
Play with this attached HTML document.