Description:
If an XML tag contains accented characters, the XML Source import cannot open the file.

Steps to Reproduce:
1. In Calc, open adil-12202-01-1.xml with Data -> XML Source and pick the file.
2. Look at the Map to Document window.

Actual Results:
Nothing is shown in the Map to Document window.

Expected Results:
The XML tree should be displayed.

Reproducible: Always
User Profile Reset: No

Additional Info:
For comparison, open adil-12202-01-1-with-accent-in-data-only.xml: its XML tree can be viewed in the Map to Document window.
Created attachment 171152: original open data file
Created attachment 171153: XML file without accented characters in tags
A debug build prints this error message:
sc/source/filter/orcus/xmlcontext.cxx:191: Malformed XML error: malformed_xml_error: name must begin with an alphabet, but got this instead '�' (offset=769)
This seems to be related to the orcus library.
Confirmed in:
Version: 7.1.2.2 / LibreOffice Community
Build ID: 8a45595d069ef5570103caea1b71cc9d82b2aae4
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
> seems orcus lib related.
That error message comes from liborcus sax_parser_base.cpp:338. It looks like the library requires all element and attribute names to be in the US-ASCII range, which is stricter than the XML specification: XML 1.0 also allows many non-ASCII characters, including accented Latin letters, in names. I have filed an issue with that project (see https://gitlab.com/orcus/orcus/-/issues/137).
Luboš Luňák committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/6b7c2fa65eb68be520ed4135cc245e33fa22e8bf allow utf-8 in xml names (liborcus) (tdf#141672) It will be available in 7.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Luboš Luňák committed a patch related to this issue. It has been pushed to "libreoffice-7-1": https://git.libreoffice.org/core/commit/be4e23da3fe1bcdc1e1ef6982c5f0b47b5efd257 allow utf-8 in xml names (liborcus) (tdf#141672) It will be available in 7.1.4.
This now works on master thanks to Kohei's upgrade of Orcus to version 0.17.0. *** This bug has been marked as a duplicate of bug 145117 ***