Bug 169574 - Data Provider dialog does not work with format XML
Summary: Data Provider dialog does not work with format XML
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 26.2.0.0 alpha0+ master
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Data-Provider
Reported: 2025-11-20 14:49 UTC by Regina Henschel
Modified: 2026-01-02 14:31 UTC
2 users

See Also:
Crash report or crash signature:


Attachments
FYI only, works in principle but might crash (2.34 KB, patch)
2026-01-02 14:31 UTC, Regina Henschel

Description Regina Henschel 2025-11-20 14:49:08 UTC
This is the XML part of bug 169077. I have split the problems because the underlying code is different: HTML import is handled by LibreOffice's own code, whereas XML import is forwarded to the Orcus library.

Description:
Unit tests exist for the "Data Provider" feature, but the same import does not work when it is done manually via the dialog.

The test is
https://opengrok.libreoffice.org/xref/core/sc/qa/unit/dataproviders_test.cxx

The XML test file is
https://opengrok.libreoffice.org/xref/core/sc/qa/unit/data/dataprovider/xml/test1.xml
Download it; the OpenGrok page has a blue "Download" link.

To reproduce:
1. Open a new spreadsheet.
2. Define a database range "testDB" for the range A1:K11 (menu Data > Define Range).
3. Start the Data Provider dialog (menu Data > Data Provider).
4. Select "testDB" from the drop-down list `Database Range`.
5. Select "XML" from the drop-down list `Data Format`.
6. Click the `Browse` button and select the downloaded file "test1.xml".
7. Click the `Apply` button. Error: nothing is imported in the preview.
8. Click the `OK` button. Error: no data is imported.

The test uses

    ScOrcusImportXMLParam aParam;

    ScOrcusImportXMLParam::RangeLink aRangeLink;
    aRangeLink.maPos = ScAddress(0, 0, 0);
    aRangeLink.maFieldPaths.push_back("/bookstore/book/title"_ostr);
    aRangeLink.maFieldPaths.push_back("/bookstore/book/author"_ostr);
    aRangeLink.maRowGroups.push_back("/bookstore/book"_ostr);
    aParam.maRangeLinks.push_back(aRangeLink);

which adds each field path to "maFieldPaths" individually, sets a row group, and builds an ScOrcusImportXMLParam from them.

In https://opengrok.libreoffice.org/xref/core/sc/source/ui/dataprovider/xmldataprovider.cxx, however, the code is only

    ScOrcusImportXMLParam::RangeLink aRangeLink;
    aRangeLink.maPos = ScAddress(0, 0, 0);
    aRangeLink.maFieldPaths.push_back(OUStringToOString(maID, RTL_TEXTENCODING_UTF8));
    maParam.maRangeLinks.clear();
    maParam.maRangeLinks.push_back(std::move(aRangeLink));

This means that the string from the dialog (stored in maID) is pushed as a single field path: it is not split into individual fields, and no row group is set.

By the way, importing via menu Data > XML Source works. For example, link the recurring element //book to cell A1.
Comment 1 raal 2025-11-20 17:48:58 UTC
Confirm Version: 26.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 177b3d2a88afb2dfd3e89025624d8bf62b36cda4
CPU threads: 4; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded
Comment 2 Regina Henschel 2026-01-02 14:31:32 UTC
Created attachment 204901
FYI only, works in principle but might crash

I have tried to fix the XML case. I changed the handling of the dialog's "Identifier" field so that it expects a semicolon-separated list of entries, for example "/bookstore/book;title;author". This user input is then split into its tokens, which are assigned to the members of ScOrcusImportXMLParam::RangeLink. That works as long as the user enters a syntactically correct string that matches the data to be imported.
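
The splitting described above could look roughly like the following standalone C++ sketch. It is not the code from attachment 204901. `RangeLinkSketch` and `parseIdentifier` are hypothetical names; the struct stands in for ScOrcusImportXMLParam::RangeLink but uses std::string instead of OString and omits maPos. The rule "first token is the row group, every further token is a child name appended to it" is an assumption inferred from the example "/bookstore/book;title;author" and the unit test.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for ScOrcusImportXMLParam::RangeLink. The real struct
// uses OString and also has maPos; only the two path lists are modeled here.
struct RangeLinkSketch
{
    std::vector<std::string> maFieldPaths;
    std::vector<std::string> maRowGroups;
};

// Assumed rule: "rowgroup;field1;field2;..." is split at ';'. The first token
// becomes the row group, and each further token is appended to it as a field path.
RangeLinkSketch parseIdentifier(const std::string& rIdentifier)
{
    std::vector<std::string> aTokens;
    std::istringstream aStream(rIdentifier);
    std::string aToken;
    while (std::getline(aStream, aToken, ';'))
        aTokens.push_back(aToken);

    RangeLinkSketch aLink;
    if (aTokens.empty())
        return aLink; // nothing entered: leave both lists empty

    const std::string& rRowGroup = aTokens.front();
    aLink.maRowGroups.push_back(rRowGroup);
    for (std::size_t i = 1; i < aTokens.size(); ++i)
        aLink.maFieldPaths.push_back(rRowGroup + "/" + aTokens[i]);
    return aLink;
}
```

For "/bookstore/book;title;author" this yields the same row group and field paths that the unit test sets by hand. The sketch performs no validation at all, which is the weak point described in the following paragraph.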

However, if the user enters an unsuitable string, LibreOffice might crash due to assertions in the liborcus library or in MSVC. The attached patch detects some of the possible errors in the user input, but I cannot detect all of them. For example, if the user enters an item as an element name but it is an attribute in the source, this cannot be detected beforehand without reading the source.
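
To illustrate what can and cannot be checked from the string alone, here is a hedged sketch of purely syntactic checks. The specific checks and the function name `checkIdentifier` are assumptions for illustration, not the checks in the attached patch. Passing them says nothing about whether a field is an element or an attribute in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical syntax check for an identifier such as "/bookstore/book;title;author".
// Returns an error message, or std::nullopt if no problem was found.
// Passing this check does NOT guarantee that the paths exist in the source.
std::optional<std::string> checkIdentifier(const std::string& rIdentifier)
{
    std::vector<std::string> aTokens;
    std::istringstream aStream(rIdentifier);
    std::string aToken;
    while (std::getline(aStream, aToken, ';'))
        aTokens.push_back(aToken);

    if (aTokens.size() < 2)
        return "expected a row group followed by at least one field";

    const std::string& rRowGroup = aTokens.front();
    if (rRowGroup.size() < 2 || rRowGroup.front() != '/' || rRowGroup.back() == '/')
        return "row group must be an absolute path such as /bookstore/book";

    std::set<std::string> aSeen;
    for (std::size_t i = 1; i < aTokens.size(); ++i)
    {
        const std::string& rField = aTokens[i];
        if (rField.empty())
            return "empty field name";
        if (rField.find('/') != std::string::npos)
            return "field name must not contain '/'";
        if (!aSeen.insert(rField).second)
            return "duplicate field name: " + rField;
    }
    return std::nullopt;
}
```

A string like "/bookstore/book;title;isbn" passes these checks even if the source has no "isbn" element, so a later failure inside the importer is still possible.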

A try-catch guard does not catch assertion failures. I have tried it, but it is not included in the patch.

Thus the attached patch is not a solution. I have attached it nevertheless to inform other developers that such an attempt will not work.

The "XML Source" tool in menu "Data" avoids this chicken-and-egg problem by first loading the stream and building the XML structure, and only then letting the user select the part to be imported from that structure.
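
The two-phase idea can be shown with a toy model: once the structure of the source is known, a selected path can be checked against it before anything is handed to the importer, including the element-versus-attribute case mentioned above. This sketch does not parse XML and is not how the XML Source dialog is implemented; `XmlNode`, `collectPaths`, `isValidSelection` and the "/@name" notation for attributes are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, already-parsed structure of an XML source.
struct XmlNode
{
    std::string maName;
    std::vector<std::string> maAttributes;
    std::vector<XmlNode> maChildren;
};

enum class NodeKind { Element, Attribute };

// Phase 1: record every path in the structure together with its kind.
// Attributes are recorded as ".../@name".
void collectPaths(const XmlNode& rNode, const std::string& rParentPath,
                  std::map<std::string, NodeKind>& rPaths)
{
    const std::string aPath = rParentPath + "/" + rNode.maName;
    rPaths[aPath] = NodeKind::Element;
    for (const std::string& rAttr : rNode.maAttributes)
        rPaths[aPath + "/@" + rAttr] = NodeKind::Attribute;
    for (const XmlNode& rChild : rNode.maChildren)
        collectPaths(rChild, aPath, rPaths);
}

// Phase 2: a selected path is accepted only if it exists with the expected kind.
bool isValidSelection(const std::map<std::string, NodeKind>& rPaths,
                      const std::string& rPath, NodeKind eExpected)
{
    auto it = rPaths.find(rPath);
    return it != rPaths.end() && it->second == eExpected;
}
```

With a toy structure in which "category" is an attribute of "book" (an invented example, not a claim about test1.xml), a request for an element "/bookstore/book/category" is rejected before import, which a string-only check cannot do.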

The "Insert Sheet from File" dialog works completely differently and has no problems reading the sources. So it might be necessary to change the "read the source" part of the "Data Provider" tool to work the same way as the "Insert Sheet from File" dialog. But that would be a very large change.

On the other hand, the "Insert Sheet from File" approach might additionally solve the problem that the current XML approach, in both the "XML Source" tool and the "Data Provider" tool, can only use local files and not sources from the internet.