Bug 127484 - use HTML caption tag to name externally linked tables from URL
Summary: use HTML caption tag to name externally linked tables from URL
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 6.3.1.2 release (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Andreas Heinisch
URL:
Whiteboard: target:7.2.0
Keywords:
Depends on:
Blocks: Calc-External-Datalink
Reported: 2019-09-10 22:32 UTC by stragu
Modified: 2021-06-01 12:41 UTC

See Also:
Crash report or crash signature:


Attachments

Description stragu 2019-09-10 22:32:00 UTC
Description:
Currently, using a website URL in "Sheet > Link to External Data..." only shows a list of generic identifiers of the type "HTML_1", which makes it difficult to identify the right table when there are many available.

The HTML <caption> tag could help tell the tables apart.

Steps to Reproduce:
1. Open Calc
2. Go to "Sheet > Link to External Data..."
3. Paste a URL in "URL of External Data Source", for example https://en.wikipedia.org/wiki/QS_World_University_Rankings
4. Press Enter on the keyboard
5. Use defaults in the "Import Options" dialog and click "OK".

Actual Results:
The "Available Tables/Ranges" list shows tables named "HTML_<positional_number>" (see attached screenshot).

Expected Results:
Naming the tables in the list with whatever is between the corresponding <caption></caption> – if found when parsing – could help identify the right one. For example, the tables could be named with the pattern:

HTML_<positional_number> - <first 40 characters of caption>...
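To illustrate the proposed pattern, here is a minimal sketch in Python. It is not LibreOffice code and says nothing about how the import filter actually parses HTML; the names CaptionCollector and table_names are hypothetical, and adding the trailing "..." only when the caption was actually truncated is my assumption.

```python
from html.parser import HTMLParser


class CaptionCollector(HTMLParser):
    """Hypothetical helper: record the caption (or None) of every <table>."""

    def __init__(self):
        super().__init__()
        self.captions = []       # one entry per <table>, in document order
        self._open_tables = []   # stack of indices into captions (nested tables)
        self._in_caption = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.captions.append(None)
            self._open_tables.append(len(self.captions) - 1)
        elif tag == "caption" and self._open_tables:
            self._in_caption = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "caption" and self._in_caption and self._open_tables:
            # Collapse runs of whitespace inside the caption text.
            text = " ".join("".join(self._buffer).split())
            index = self._open_tables[-1]
            if text and self.captions[index] is None:
                self.captions[index] = text
            self._in_caption = False
        elif tag == "table" and self._open_tables:
            self._open_tables.pop()
            self._in_caption = False

    def handle_data(self, data):
        if self._in_caption:
            self._buffer.append(data)


def table_names(html, max_len=40):
    """Build 'HTML_<n> - <caption>' names; fall back to 'HTML_<n>'."""
    parser = CaptionCollector()
    parser.feed(html)
    parser.close()
    names = []
    for position, caption in enumerate(parser.captions, start=1):
        name = f"HTML_{position}"
        if caption:
            # Assumption: only append "..." when the caption was cut short.
            suffix = "..." if len(caption) > max_len else ""
            name = f"{name} - {caption[:max_len]}{suffix}"
        names.append(name)
    return names
```

Tables without a caption keep the plain "HTML_<positional_number>" name, so the positional numbers stay the same as today.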


Reproducible: Always


User Profile Reset: No



Additional Info:
Tested with:

Version: 6.3.1.2
Build ID: 1:6.3.1~rc2-0ubuntu0.18.04.1~lo1
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-AU (en_AU.UTF-8); UI-Language: en-GB
Calc: threaded
Comment 1 Buovjaga 2020-04-21 16:02:05 UTC
Sounds good -> NEW

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/caption
Comment 2 Commit Notification 2021-05-29 21:35:02 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/9bf4e0d8f538cb0b51d2e803156301a956deaac3

tdf#127484 - Use HTML caption tag to name externally linked tables

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 3 stragu 2021-06-01 12:20:09 UTC
Thank you so much for the work, Andreas!

Verified as fixed in:

Version: 7.2.0.0.alpha1+ / LibreOffice Community
Build ID: e718f0e703c0fb33a0b1b5efe7b13b02c25f3335
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-05-30_21:49:59
Calc: threaded

I mentioned it with a screenshot in the Release Notes: https://wiki.documentfoundation.org/ReleaseNotes/7.2#General_improvements_2

As it is very much related to this, and as you've recently been looking at the code, you might be interested in this enhancement request I just created for the same dialogue: Bug 142600
Comment 4 Andreas Heinisch 2021-06-01 12:41:28 UTC
Hi!

Thank you for the verification! I will have a look at the enhancement once someone approves it.