There are unit tests for the "Data Provider" feature. They pass, but the same import does not work when you do it manually through the dialog.

The test is
https://opengrok.libreoffice.org/xref/core/sc/qa/unit/dataproviders_test.cxx
The test files are
https://opengrok.libreoffice.org/xref/core/sc/qa/unit/data/dataprovider/html/test1.html
https://opengrok.libreoffice.org/xref/core/sc/qa/unit/data/dataprovider/xml/test1.xml
Download them; the OpenGrok page has a blue "Download" item.

For the case testHTMLImport:
1. Open a new spreadsheet.
2. Define a database range "testDB" for the range A1:K11 (menu Data > Define Range).
3. Start the Data Provider dialog (menu Data > Data Provider).
4. Select "testDB" from the drop-down list `Database Range`.
5. Select "HTML" from the drop-down list `Data Format`.
6. Click the `Browse` button and select the downloaded file "test1.html".
7. Click the `Apply` button. Error: No import in Preview.
8. Click the `OK` button. Error: No data imported.
BTW, the import via menu Sheet > External Links works. Use locale English (USA) and detect special numbers.

For the case testXMLImport:
1.-4. See above.
5. Select "XML" from the drop-down list `Data Format`.
6. Click the `Browse` button and select the downloaded file "test1.xml".
7.-8. See above.
The test sets "maFieldPaths". There is nothing corresponding in the dialog.
BTW, the import via menu Data > XML Source works. Use the recurring element //book and link it to cell A1, for example.

As the tests themselves do not fail, I guess that something is wrong with the dialog.
I can confirm with
Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 2595f031fa93c1eb89fb4dce6f337de9be813e15
CPU threads: 4; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded
Let us restrict this bug report to HTML. I have split the problems because the underlying code is different: HTML import is handled by a dedicated part of the code, whereas XML import is forwarded to the Orcus library. For the XML import, I have filed bug 169574.
A suggested fix is in https://gerrit.libreoffice.org/c/core/+/194789.
Regina Henschel committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/9187f38b48956acc892fbf3e7fe0d1942fcfb6f2 tdf#169077 dataproviderdlg setID expects mxEditID It will be available in 26.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
The bug is now fixed, but a unit test is still missing. It has to be a Python UI test, and I cannot write one because I work on Windows. So someone else will have to step in here.
Regina Henschel committed a patch related to this issue. It has been pushed to "libreoffice-25-8": https://git.libreoffice.org/core/commit/56e1b57bd921c2dcaa4bb5932b38ea6c93eb49bd tdf#169077 dataproviderdlg setID expects mxEditID It will be available in 25.8.4.
Neil Roberts committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/655ba73e506d75ca9988a428620f543b213774b7 tdf#169077: Add a UITest It will be available in 26.2.0.
(In reply to Commit Notification from comment #4) (... and comment #7)
> Regina Henschel / Neil Roberts committed a patch related to this issue.
> It has been pushed to "master": ...
> It will be available in 26.2.0. ...
> Affected users are encouraged to test the fix and report feedback.

I tried to import HTML with LibreOfficeDev_26.2.0.0.alpha1_Linux_x86-64_deb (2025-12-05_04.47.37) but did not succeed.

Calc spreadsheet with Data > Define Range (see #169514 attachment DataRangeForDataProvider.ods)
https://bugs.documentfoundation.org/attachment.cgi?id=204066

Data > Data Provider
Database Range: DBrange
Data Format: HTML
URL: test local copy of file https://opengrok.libreoffice.org/xref/core/sc/qa/unit/data/dataprovider/html/test1.html
Identifier: tried "content" and "src"
Click Preview: Apply --> no entries
Click OK: no changes to the spreadsheet (File > Reload is done without query)

Version: 26.2.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 0686b1972806fe8b711de5ba64039fb38cd14889
CPU threads: 5; OS: Linux 6.14; UI render: default; VCL: gtk3
Locale: de-DE (de_DE.UTF-8); UI: en-US
Calc: threaded
For the HTML import, the `Identifier` field needs the XPath to the desired <table> element. That could be, e.g.,
//table
or, in case you want the second table of the source,
//table[2]

For example, try with a target database range of 10 columns and 120 rows.
URL https://de.wikipedia.org/wiki/Liste_der_erfolgreichsten_Filme_nach_Einspielergebnis
Identifier //table[5]
(The proposed database range is larger than actually needed.)
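
To illustrate how such an Identifier picks a table, here is a minimal Python sketch. It is not the Calc implementation: it uses Python's `xml.etree.ElementTree` on a small, well-formed, made-up snippet (`SAMPLE` and `table_rows` are hypothetical names), and it assumes that only the first node matching the XPath is used, which has not been checked against the Calc source. Note that in XPath `//table[2]` means "a <table> that is the second <table> child of its parent", which coincides with "the second table of the source" only when the tables are siblings, as they are here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample with two sibling tables.
SAMPLE = """<html><body>
<table><tr><td>a1</td><td>b1</td></tr></table>
<table><tr><td>a2</td><td>b2</td></tr><tr><td>a3</td><td>b3</td></tr></table>
</body></html>"""


def table_rows(markup, xpath):
    """Return the cell texts of the first table matched by xpath."""
    root = ET.fromstring(markup)
    # ElementTree rejects absolute paths on an element, so "//table"
    # is turned into the equivalent relative form ".//table".
    if xpath.startswith("/"):
        xpath = "." + xpath
    matches = root.findall(xpath)
    if not matches:
        return []
    # Assumption: only the first matching node is imported.
    table = matches[0]
    return [
        [(cell.text or "") for cell in row.findall("td")]
        for row in table.findall("tr")
    ]


print(table_rows(SAMPLE, "//table"))     # first table
print(table_rows(SAMPLE, "//table[2]"))  # second sibling table
print(table_rows(SAMPLE, "//table[3]"))  # no match, empty list
```

Real web pages are rarely well-formed XML, so a lenient HTML parser would be needed for them; the sketch only shows how the XPath expression selects a table.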
(In reply to Regina Henschel from comment #9)

Yes, this works with //table[1] through //table[5] for your Wikipedia example. However, with //table only the first table is included. Shouldn't all tables be included then?

But I can't find a way to make it work for the HTML test file. There, <table>, <tr> and <td> can only be found in the text, not as HTML elements. How should the test file be used?

Isn't the implementation of XPath for the Identifier contrary to the documentation?
"Identifier: The target ID for HTML provided data..."
I expected that the HTML attribute "id" should be used to address the items (see also the id="content" and id="src" attributes in the test file). Otherwise we should change the documentation and mention XPath there.

In the Wikipedia example, the entries that additionally contain a <link ...> are skipped (e.g. 1st table, line 4 Titanic, col. 2 Deutscher Titel), and many other entries are missing where the <td> element contains additional HTML elements.
(In reply to Michael Otto from comment #10)
> But I can't find a way for the HTML test file. There can <table>,
> <tr> and <td> only be found in the text, not as HTML elements.
> How should the test file be used?

The entry in the Identifier field has to be
//table
<table> is an HTML element. The implementation of this import is in
https://opengrok.libreoffice.org/xref/core/sc/source/ui/dataprovider/htmldataprovider.cxx
I have not touched that. I have only fixed that the wrong field was used.

> Isn't the implementation of XPath for the Identifier contrary to the
> documentation?
> "Identifier: The target ID for HTML provided data..."
> I expected that the HTML attribute "id" should be used to address the items
> (see also the id="content" and id="src" attributes in the test file).
> Otherwise we should change the documentation and mention XPath there.

Yes, the documentation needs to be improved. I have already added a comment to the "WorkInProgress" version of the Calc Guide for version 26.2. A bug report for the help might be needed as well.

> In the Wikipedia example the entries where a <link ...> is contained
> in addition, are skipped (e.g. 1st table line 4 Titanic col. 2 Deutscher
> Titel) and many other entries are missing where additional HTML elements
> are contained in the <td> element.

Yes, the current HTML import is very simple. I hesitated about whether anything should be fixed at all. There was also bug 139409, where it was discussed whether the entire feature should be removed. But the feature has existed since LibreOffice version 6, which is more than 7 years now. It should therefore work at least to some extent.
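
As a general illustration of why cells with nested elements can come out empty in a simple importer, here is a small Python sketch. It is not a diagnosis of htmldataprovider.cxx, which has not been checked for this; the row content and the helper names `direct_text` and `full_text` are made up for the example. It contrasts reading only the text directly inside a <td> with collecting the text of all nested elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical table row: one plain cell, one cell whose text sits
# inside a nested <a>, and one cell that mixes plain and nested text.
ROW = (
    '<tr><td>1997</td>'
    '<td><a href="#t">Titanic</a></td>'
    '<td>Drama <i>(epic)</i></td></tr>'
)


def direct_text(cell):
    """Text directly inside the cell, before its first child element."""
    return (cell.text or "").strip()


def full_text(cell):
    """Text of the cell including the text of all nested elements."""
    return "".join(cell.itertext()).strip()


cells = ET.fromstring(ROW).findall("td")
direct = [direct_text(c) for c in cells]
full = [full_text(c) for c in cells]

print(direct)  # the linked cell is empty, the mixed cell is truncated
print(full)    # all three cells carry their visible text
```

A reader that only looks at the direct text loses the linked title entirely, whereas walking the nested nodes recovers it.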
(In reply to Regina Henschel from comment #11)
> The entry in the Identifier field has to be
> //table
...
> Yes, the documentation needs to be improved. I have already added a comment
> to the "WorkInProgress" version of the Calc Guide for version 26.2. Might be
> a bugreport for the help is needed as well.

I raised LOCALHELP bug#169996 (Proposal for Identifier with HTML: "//table, //table[2], ... following Xpath")

> Yes, the current HTML import is very simple.
...
> It should therefore work at least to some extent.

Data Provider with format HTML now works in a basic way, so on that basis it is OK for me.