Bug 114996 - Link to External Data shows extraneous tables when only one exists
Summary: Link to External Data shows extraneous tables when only one exists
Status: CLOSED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): Inherited From OOo
Hardware: All (OS: All)
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Calc-External-Datalink
Reported: 2018-01-14 07:16 UTC by Dan Dascalescu
Modified: 2018-02-21 18:40 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Link to External Data spurious tables (44.77 KB, image/png)
2018-01-14 07:17 UTC, Dan Dascalescu
table.html for testing (129 bytes, text/html)
2018-02-11 13:24 UTC, Buovjaga
Play with this attached HTML document. (795 bytes, text/html)
2018-02-21 18:40 UTC, Eike Rathke

Description Dan Dascalescu 2018-01-14 07:16:57 UTC
Description:
When trying to import data from a file with only one table, three tables are shown.

Steps to Reproduce:
1. Create "table.html" with the following contents:
<table>
  <tr><th>Asset</th><th>Balance</th></tr>
  <tr><td>USD</td><td>10</td></tr>
  <tr><td>EUR</td><td>20</td></tr>
</table>

2. Sheet -> Link to External Data
3. Browse to that file
4. Press Enter twice

Actual Results:  
Three tables are shown

Expected Results:
Only one table should be shown


Reproducible: Always


User Profile Reset: No



Additional Info:
If the table also has an id, four tables will be shown:

<table id="buggy">
  <tr><th>Asset</th><th>Balance</th></tr>
  <tr><td>USD</td><td>10</td></tr>
  <tr><td>EUR</td><td>20</td></tr>
</table>



User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Comment 1 Dan Dascalescu 2018-01-14 07:17:15 UTC
Created attachment 139086
Link to External Data spurious tables
Comment 2 Buovjaga 2018-02-11 13:24:11 UTC
Created attachment 139782
table.html for testing
Comment 3 Buovjaga 2018-02-11 13:28:12 UTC
Repro.

Arch Linux 64-bit
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: c6a23023150c164a19236139fa413d43006ce21c
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on February 11th 2018
Comment 4 Buovjaga 2018-02-11 13:31:24 UTC
The current workflow is explained in https://help.libreoffice.org/Calc/Inserting_External_Data_in_Table_WebQuery
Maybe the design team could chime in.
Comment 5 Eike Rathke 2018-02-15 14:29:35 UTC
The additional "HTML_all" and "HTML_tables" ranges are needed for programmatic reasons. HTML_all encompasses the entire text content, including content that is not in tables. HTML_tables encompasses all tables of the document. These may be identical to the one HTML_1 range, but don't have to be. The HTML_# ranges are numbered in the order they are encountered, including nested tables. Tables with id="foo" are added as HTML__foo so they can be linked to by ID instead of by occurrence. The chosen table range name is remembered as the link, so a refresh pulls in the expected data. The additional names are not a bug and won't be changed.
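
The naming scheme described above can be illustrated with a minimal Python sketch. It is not LibreOffice's import filter code; it only mimics the described rules with the standard-library html.parser module. The class name TableRangeNamer and the function range_names are hypothetical, and the order in which the names are listed here is an assumption, not necessarily the order shown in the dialog.

```python
from html.parser import HTMLParser


class TableRangeNamer(HTMLParser):
    """Hypothetical helper that mimics the range names described in comment 5."""

    def __init__(self):
        super().__init__()
        self.table_count = 0
        # Two ranges are always present: the whole text content and all tables.
        self.names = ["HTML_all", "HTML_tables"]

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # Tables are numbered in the order their start tags are encountered,
        # so a nested table gets the next number after its parent.
        self.table_count += 1
        self.names.append(f"HTML_{self.table_count}")
        # A table with an id gets an extra name with a double underscore.
        table_id = dict(attrs).get("id")
        if table_id:
            self.names.append(f"HTML__{table_id}")


def range_names(html_text):
    """Return the list of range names for an HTML document (sketch only)."""
    parser = TableRangeNamer()
    parser.feed(html_text)
    return parser.names
```

Under these assumptions, the reporter's table.html yields three names (HTML_all, HTML_tables, HTML_1), and the variant with id="buggy" yields a fourth (HTML__buggy), which matches the counts in the original report.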
Comment 6 Heiko Tietze 2018-02-15 14:48:03 UTC
(In reply to Eike Rathke from comment #5)
> The additional names are not a bug and won't be changed.

Content and tables do not make much sense and could be hidden. If that's not possible, we could make the relation a bit clearer with a tree like:

HTML content
- HTML tables
-- Table 1 (no ID)
-- Table 2 (Foobar)
Comment 7 Eike Rathke 2018-02-16 16:04:19 UTC
(In reply to Heiko Tietze from comment #6)
> Content and tables do not make much sense and could be hidden.
They do make sense if there is text that is not in a table or if there are several tables you want to import at once.
Comment 8 Dan Dascalescu 2018-02-16 19:39:02 UTC
> The additional "HTML_all" and "HTML_tables" ranges are needed for programmatic reasons.

Does that mean they are not intended for the user?
Comment 9 Eike Rathke 2018-02-19 16:55:00 UTC
(In reply to Dan Dascalescu from comment #8)
> Does that mean they are not intended for the user?
No, read my previous comment.
Comment 10 Dan Dascalescu 2018-02-20 02:32:18 UTC
What use cases would there be for a user to select "HTML_all" when trying to import data into a spreadsheet, if "HTML_all encompasses the entire text content, including content that is not in tables"?

It sounds to me like the best you could hope to get from HTML_all would be an unstructured text dump.
Comment 11 Eike Rathke 2018-02-21 18:40:38 UTC
Created attachment 140042
Play with this attached HTML document.