Bug 153444 - FILEOPEN: Excel 2003 XML with encoding iso-8859-15 does not read correctly
Summary: FILEOPEN: Excel 2003 XML with encoding iso-8859-15 does not read correctly
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 6.2.0.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Kohei Yoshida
URL:
Whiteboard: target:7.6.0
Keywords: bibisected, bisected, regression
Depends on:
Blocks: MSO-XML2003
Reported: 2023-02-07 16:14 UTC by Holger
Modified: 2023-02-15 14:11 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
ZIP archive with example file and screenshots (104.68 KB, application/x-zip-compressed)
2023-02-07 16:14 UTC, Holger

Description Holger 2023-02-07 16:14:46 UTC
Created attachment 185181
ZIP archive with example file and screenshots

By default, Microsoft Excel 2003 XML files do not specify an encoding. In that case, UTF-8 is assumed.

However, when an encoding is given in the XML declaration, Microsoft Excel reads the special characters just fine, while LibreOffice does not.

<?xml version="1.0" encoding="ISO-8859-15"?>

Steps to reproduce:
1. Load the attached .xml file into LibreOffice Calc.
2. Observe that special characters such as ÄÖÜ µ ß äöü are not displayed correctly.

The expected behavior is to read the encoding from the XML declaration and use it to decode the rest of the XML file.
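
To illustrate the behavior described above, here is a minimal Python sketch of an importer that looks at the XML declaration first and only then decodes the file. It is not LibreOffice code; the helper names `declared_encoding` and `decode_xml` and the inline sample are made up for this illustration, and the regular expression only handles the simple declaration form shown in this report.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data, e.g. <?xml version="1.0" encoding="ISO-8859-15"?>
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default.

    Hypothetical helper: falls back to UTF-8 when no encoding is declared,
    which is the default the report describes.
    """
    if raw.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        raw = raw[3:]
    match = _XML_DECL_ENCODING.match(raw)
    return match.group(1).decode("ascii").lower() if match else default


def decode_xml(raw: bytes) -> str:
    """Decode the whole file using the encoding it declares for itself."""
    return raw.decode(declared_encoding(raw))


# Stand-in for the attached file: same declaration, same special characters.
sample = (
    '<?xml version="1.0" encoding="ISO-8859-15"?>\n'
    "<Data>ÄÖÜ µ ß äöü</Data>"
).encode("iso-8859-15")

good = decode_xml(sample)                        # honours the declaration
bad = sample.decode("utf-8", errors="replace")   # ignores it and assumes UTF-8
```

In this sketch `good` contains the original characters, while `bad` contains replacement characters, which is comparable to the broken display shown in the attached screenshots.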
Comment 1 raal 2023-02-12 14:45:15 UTC
Confirmed with Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: b052ec2f2fbe0f3044ba824c064a280a5ee9cd7f
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded

Works in Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
Comment 2 raal 2023-02-12 14:57:06 UTC
This seems to have begun at the commit below.
Adding Kohei Yoshida to Cc; could you possibly take a look at this one?
Thanks
 de19941f7613db5dc62e0f0903ad9f523f3d2a16 is the first bad commit
commit de19941f7613db5dc62e0f0903ad9f523f3d2a16
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Mon Dec 18 07:57:01 2017 +0100

    source 152c79ee2be2374334202dc738a8f011e47845c7

https://git.libreoffice.org/core/+/152c79ee2be2374334202dc738a8f011e47845c7
Comment 3 Kohei Yoshida 2023-02-15 00:56:00 UTC
This should fix it: https://gerrit.libreoffice.org/c/core/+/147033
Comment 4 Commit Notification 2023-02-15 03:14:08 UTC
Kohei Yoshida committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e5b55f8e05f0f3172340ee377c7dabfb714dd66c

tdf#153444: map iso-8859-* encoding range

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
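
The commit title above mentions mapping the iso-8859-* encoding range. The actual change lives in the linked commit and is not reproduced here. As an illustration only, the following Python sketch shows one way a declared name such as `ISO-8859-15` could be mapped to a decoder for the whole family rather than for a single hard-coded member. The function name `map_iso_8859` is hypothetical, and Python's codec registry stands in for whatever encoding table the importer really uses.

```python
import codecs
import re
from typing import Optional

# Accepts spellings such as "ISO-8859-15", "iso_8859_1" or "iso8859-2".
_ISO_8859 = re.compile(r"^iso[-_]?8859[-_]?(\d{1,2})$", re.IGNORECASE)


def map_iso_8859(name: str) -> Optional[str]:
    """Map an iso-8859-N name to a Python codec name, or None if unknown.

    Hypothetical helper: the part number is taken from the declared name, so
    every member of the family that the codec registry knows is covered.
    """
    match = _ISO_8859.match(name.strip())
    if match is None:
        return None
    candidate = f"iso8859_{int(match.group(1))}"
    try:
        return codecs.lookup(candidate).name
    except LookupError:
        # Parts that do not exist as codecs are reported as unknown.
        return None


codec = map_iso_8859("ISO-8859-15")
euro = b"\xa4".decode(codec)  # 0xA4 is the euro sign in ISO-8859-15
```

Names outside the family, such as `utf-8`, yield `None` in this sketch, so a caller can fall through to its other encoding checks.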
Comment 5 Kohei Yoshida 2023-02-15 03:14:33 UTC
Fixed now.
Comment 6 Commit Notification 2023-02-15 14:11:52 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/3914491f0717a1842bf9a29a399bb5ef0c2f2db4

tdf#153444: sc_subsequent_filters_test: Add unittest

It will be available in 7.6.0.