Bug 165886 - FILEOPEN XLSX Formula containing curvy quotation marks altered on import
Summary: FILEOPEN XLSX Formula containing curvy quotation marks altered on import
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
3.3.0 release
Hardware: All All
: low trivial
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Cell-Formula XLSX
  Show dependency treegraph
 
Reported: 2025-03-24 04:38 UTC by Aron Budea
Modified: 2025-03-24 05:10 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments
Sample XLSX (8.65 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2025-03-24 04:38 UTC, Aron Budea
Details
Screenshot in Excel (28.80 KB, image/png)
2025-03-24 04:41 UTC, Aron Budea
Details
forum-mso-en4-149289.xlsx (21.95 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2025-03-24 05:10 UTC, Aron Budea
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Aron Budea 2025-03-24 04:38:18 UTC
Created attachment 199984 [details]
Sample XLSX

Excel accepts a number of quotation marks as valid characters in names. For example these curvy quotation marks: ‘ and ’ and “ and ” (but not „).
These aren't treated as regular quotation marks, as far as I could see.
Further reference on quotation marks: https://en.wikipedia.org/wiki/Quotation_mark

In Calc these are invalid characters, and it shows Err:501 for formula containing these (except when inside a string).
Eg. in attached sample XLSX:
- A1: =“
- B1: =OR(D1=0;D1<>““)
- C1: =OR(E1=0;E1<>“)

The problem is that some of these formula get changed on import:
- B1 becomes: =OR(D1=0;D1<>““))
- C1 becomes: =OR(E1=0;E1<>“))

Saving these files back to XLSX makes Excel complain, and attempting recovery removes the affected formula:
"Removed Records: Formula from /xl/worksheets/sheet1.xml part"

Observed in LO 25.8.0.0.alpha0+ (81dfc7afcdc473bd655ff64038e8a449a9999c0c), 3.3.0 / Windows.
Before the commit below in 6.3, the import is bad in a different way, the closing brackets don't get duplicated, but the quotes get dropped, eg. B1 becomes: =OR(D1=0;D1<>)

https://cgit.freedesktop.org/libreoffice/core/commit/?id=7d6f30d04c51088b26815c241a7473c48822c6c3
https://git.libreoffice.org/core/commit/7d6f30d04c51088b26815c241a7473c48822c6c3
author		Eike Rathke <erack@redhat.com>	Tue Jan 29 15:25:52 2019 +0100
committer	Eike Rathke <erack@redhat.com>	Wed Jan 30 10:55:12 2019 +0100

"Resolves: tdf#93951 set remainder as bad string if not parsed as valid"

Adding these quotes at the end of the switch solves the issue (they are in category U_INITIAL_PUNCTUATION and U_FINAL_PUNCTUATION), but I'm not sure if there is impact to consider when working with native ODF files:
https://opengrok.libreoffice.org/xref/core/i18npool/source/characterclassification/cclass_unicode_parser.cxx?r=ff16c4e3f27efc0fc9ed734b19ae778482566cdb#614

While there is likely little practical value in making this particular case work, the following should be considered:
- altering the input this way is wrong in general,
- imported files that were valid should remain valid when exported,
- I have encountered a number of invalid roundtripped spreadsheets where formula get discarded; analyzing them is tedious work, and being able to eliminate trivial cases like this would be helpful when trying to identify relevant issues.
Comment 1 Aron Budea 2025-03-24 04:41:17 UTC
Created attachment 199985 [details]
Screenshot in Excel

Note that in the sample there are defined names for “ and ” but not for ‘ and ’, that's why the results are different in rows A and B compared to C and D.
Comment 2 Aron Budea 2025-03-24 05:10:09 UTC
Created attachment 199986 [details]
forum-mso-en4-149289.xlsx

This is the file the sample is based on, exhibiting the issue in a conditional formatting formula.