Created attachment 199984 [details]
Sample XLSX

Excel accepts a number of quotation marks as valid characters in names, for example these curly quotation marks: ‘ and ’ and “ and ” (but not „). As far as I could see, these aren't treated as regular quotation marks.
Further reference on quotation marks: https://en.wikipedia.org/wiki/Quotation_mark

In Calc these are invalid characters, and it shows Err:501 for formulas containing them (except when they are inside a string).

E.g. in the attached sample XLSX:
- A1: =“
- B1: =OR(D1=0;D1<>““)
- C1: =OR(E1=0;E1<>“)

The problem is that some of these formulas get changed on import:
- B1 becomes: =OR(D1=0;D1<>““))
- C1 becomes: =OR(E1=0;E1<>“))

Saving these files back to XLSX makes Excel complain, and attempting recovery removes the affected formulas:
"Removed Records: Formula from /xl/worksheets/sheet1.xml part"

Observed in LO 25.8.0.0.alpha0+ (81dfc7afcdc473bd655ff64038e8a449a9999c0c), 3.3.0 / Windows.

Before the commit below in 6.3, the import was bad in a different way: the closing brackets didn't get duplicated, but the quotes got dropped, e.g. B1 becomes: =OR(D1=0;D1<>)

https://cgit.freedesktop.org/libreoffice/core/commit/?id=7d6f30d04c51088b26815c241a7473c48822c6c3
https://git.libreoffice.org/core/commit/7d6f30d04c51088b26815c241a7473c48822c6c3
author Eike Rathke <erack@redhat.com> Tue Jan 29 15:25:52 2019 +0100
committer Eike Rathke <erack@redhat.com> Wed Jan 30 10:55:12 2019 +0100
"Resolves: tdf#93951 set remainder as bad string if not parsed as valid"

Adding these quotes at the end of the switch solves the issue (they are in categories U_INITIAL_PUNCTUATION and U_FINAL_PUNCTUATION), but I'm not sure whether there is any impact to consider when working with native ODF files:
https://opengrok.libreoffice.org/xref/core/i18npool/source/characterclassification/cclass_unicode_parser.cxx?r=ff16c4e3f27efc0fc9ed734b19ae778482566cdb#614

While there is likely little practical value in making this particular case work, the following should be considered:
- altering the input this way is wrong in general,
- imported files that were valid should remain valid when exported,
- I have encountered a number of invalid roundtripped spreadsheets where formulas get discarded; analyzing them is tedious work, and being able to eliminate trivial cases like this would be helpful when trying to identify relevant issues.
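
To make the category remark above easier to check, here is a minimal Python sketch (not LibreOffice code) that looks up the Unicode general category of the quotation marks named in this report. It assumes that ICU's U_INITIAL_PUNCTUATION and U_FINAL_PUNCTUATION correspond to the general categories Pi and Pf as reported by Python's unicodedata module; the helper name quote_categories is made up for this example.

```python
import unicodedata

# The four curly quotes from the report, plus the low-9 quote that Excel rejects.
QUOTES = "\u2018\u2019\u201c\u201d\u201e"


def quote_categories(chars):
    """Map each character to its Unicode general category (e.g. 'Pi', 'Pf')."""
    return {ch: unicodedata.category(ch) for ch in chars}


if __name__ == "__main__":
    for ch, cat in quote_categories(QUOTES).items():
        print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}: {cat}")
```

The four curly quotes fall into Pi and Pf, while „ is classified as Ps (open punctuation), so a rule based only on the two categories named above would leave it out.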
Created attachment 199985 [details]
Screenshot in Excel

Note that in the sample there are defined names for “ and ” but not for ‘ and ’, which is why the results are different in rows A and B compared to C and D.
Created attachment 199986 [details]
forum-mso-en4-149289.xlsx

This is the file the sample is based on, exhibiting the issue in a conditional formatting formula.
For the record, OOXML expects names to conform to ST_Xstring (string of characters with support for escaped invalid-XML characters), and ODF expects names to be strings.
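
As a rough illustration of what "escaped invalid-XML characters" means here, the following Python sketch decodes escapes of the form _xHHHH_ (four hex digits). That escape form is my reading of ST_Xstring and should be treated as an assumption; decode_xstring is a hypothetical helper, not an API of LibreOffice or any OOXML library.

```python
import re

# Assumption: an escape is written as _xHHHH_ with exactly four hex digits.
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_xstring(text):
    """Replace each _xHHHH_ escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    print(repr(decode_xstring("a_x000D_b")))
    print(repr(decode_xstring("\u201cname\u201d")))
```

Curly quotation marks are valid XML characters, so they need no escaping and pass through such a decoder unchanged.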
I have a patch that enables parsing these particular characters:
https://gerrit.libreoffice.org/c/core/+/183987

Eike, can you please share your thoughts on whether that is a reasonable approach, or if this has to be solved differently?
Horrible ;) (to allow these as name characters) but a valid approach..

But why restrict to
U+2018 ‘ LEFT SINGLE QUOTATION MARK
U+201C “ LEFT DOUBLE QUOTATION MARK
and
U+2019 ’ RIGHT SINGLE QUOTATION MARK
U+201D ” RIGHT DOUBLE QUOTATION MARK
and not allow other quotation marks? Does Excel restrict them? e.g. there are at least

<‘> U+2018 LEFT SINGLE QUOTATION MARK
<’> U+2019 RIGHT SINGLE QUOTATION MARK
<‚> U+201A SINGLE LOW-9 QUOTATION MARK
<‛> U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
<‹> U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
<›> U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
<«> U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<»> U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
<“> U+201C LEFT DOUBLE QUOTATION MARK
<”> U+201D RIGHT DOUBLE QUOTATION MARK
<„> U+201E DOUBLE LOW-9 QUOTATION MARK
<‟> U+201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
(In reply to Eike Rathke from comment #5)
> But why restrict to
> ...
> and not allow other quotation marks? Does Excel restrict them? e.g. there
> are at least

I wasn't sure about the approach, and didn't know of all the different quotation marks. Plus I thought if it ends up being merged, extending the list later will be straightforward.

> <«> U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
> <»> U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
> <„> U+201E DOUBLE LOW-9 QUOTATION MARK

These were among those I checked in Excel (2013), and for unknown reasons they aren't allowed. On the other hand, Japanese quotation marks are. Let me check the others, I'll amend the patch based on that.
Thanks for the feedback!
I checked the listed quotation marks: none of them are accepted in formulas, apart from the initial four.

Funnily enough, Excel's message when entering an invalid name in Name Manager is:
"The name that you entered is not valid.
Reasons for this can include:
-The name does not begin with a letter or an underscore
-The name contains a space or other invalid characters
-The name conflicts with an Excel built-in name or the name of another object in the workbook"

...I don't think the accepted quotation marks fall into the letter / underscore category.

These are also accepted: 『, 』, 「, 」, 《, 》, 〈, 〉
I have updated the patch with them.
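
The findings above can be summarized in a small Python sketch. It is only an illustration of the observations in this thread, not the code of the patch: the two sets are copied from the comments, and is_accepted_quote and categories are hypothetical helper names. It compares the Unicode general categories of the accepted and rejected characters to show that they overlap.

```python
import unicodedata

# Characters reported in this thread as accepted by Excel (2013).
ACCEPTED = set("‘’“”『』「」《》〈〉")
# Quotation marks from comment #5 that were reported as rejected.
REJECTED = set("‚‛‹›«»„‟")


def is_accepted_quote(ch):
    """Hypothetical helper: plain allow-list lookup, no category logic."""
    return ch in ACCEPTED


def categories(chars):
    """Return the set of Unicode general categories the characters fall into."""
    return {unicodedata.category(ch) for ch in chars}


if __name__ == "__main__":
    print("accepted:", sorted(categories(ACCEPTED)))
    print("rejected:", sorted(categories(REJECTED)))
    print("shared:  ", sorted(categories(ACCEPTED) & categories(REJECTED)))
```

Because both sets contain characters from Pi, Pf and Ps, the general category alone cannot reproduce Excel's behaviour; an explicit list of characters seems to be the only way to match it.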
That's totally weird.. there are 135 Unicode BRACKET characters ...

Octal Decimal Hex HTML Character Unicode
0133 91 0x5B [,[ "[" LEFT SQUARE BRACKET
0135 93 0x5D ],] "]" RIGHT SQUARE BRACKET
0173 123 0x7B {,{ "{" LEFT CURLY BRACKET
0175 125 0x7D },} "}" RIGHT CURLY BRACKET
020105 8261 0x2045 ⁅ "⁅" LEFT SQUARE BRACKET WITH QUILL
020106 8262 0x2046 ⁆ "⁆" RIGHT SQUARE BRACKET WITH QUILL
021451 9001 0x2329 〈 "〈" LEFT-POINTING ANGLE BRACKET
021452 9002 0x232A 〉 "〉" RIGHT-POINTING ANGLE BRACKET
021641 9121 0x23A1 ⎡ "⎡" LEFT SQUARE BRACKET UPPER CORNER
021642 9122 0x23A2 ⎢ "⎢" LEFT SQUARE BRACKET EXTENSION
021643 9123 0x23A3 ⎣ "⎣" LEFT SQUARE BRACKET LOWER CORNER
021644 9124 0x23A4 ⎤ "⎤" RIGHT SQUARE BRACKET UPPER CORNER
021645 9125 0x23A5 ⎥ "⎥" RIGHT SQUARE BRACKET EXTENSION
021646 9126 0x23A6 ⎦ "⎦" RIGHT SQUARE BRACKET LOWER CORNER
021647 9127 0x23A7 ⎧ "⎧" LEFT CURLY BRACKET UPPER HOOK
021650 9128 0x23A8 ⎨ "⎨" LEFT CURLY BRACKET MIDDLE PIECE
021651 9129 0x23A9 ⎩ "⎩" LEFT CURLY BRACKET LOWER HOOK
021652 9130 0x23AA ⎪ "⎪" CURLY BRACKET EXTENSION
021653 9131 0x23AB ⎫ "⎫" RIGHT CURLY BRACKET UPPER HOOK
021654 9132 0x23AC ⎬ "⎬" RIGHT CURLY BRACKET MIDDLE PIECE
021655 9133 0x23AD ⎭ "⎭" RIGHT CURLY BRACKET LOWER HOOK
021660 9136 0x23B0 ⎰,⎰ "⎰" UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION
021661 9137 0x23B1 ⎱,⎱ "⎱" UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION
021664 9140 0x23B4 ⎴,⎴ "⎴" TOP SQUARE BRACKET
021665 9141 0x23B5 ⎵,⎵ "⎵" BOTTOM SQUARE BRACKET
021666 9142 0x23B6 ⎶ "⎶" BOTTOM SQUARE BRACKET OVER TOP SQUARE BRACKET
021736 9182 0x23DE ⏞ "⏞" TOP CURLY BRACKET
021737 9183 0x23DF ⏟ "⏟" BOTTOM CURLY BRACKET
021740 9184 0x23E0 ⏠ "⏠" TOP TORTOISE SHELL BRACKET
021741 9185 0x23E1 ⏡ "⏡" BOTTOM TORTOISE SHELL BRACKET
023554 10092 0x276C ❬ "❬" MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
023555 10093 0x276D ❭ "❭" MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
023560 10096 0x2770 ❰ "❰" HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
023561 10097 0x2771 ❱ "❱" HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
023562 10098 0x2772 ❲ "❲" LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
023563 10099 0x2773 ❳ "❳" LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
023564 10100 0x2774 ❴ "❴" MEDIUM LEFT CURLY BRACKET ORNAMENT
023565 10101 0x2775 ❵ "❵" MEDIUM RIGHT CURLY BRACKET ORNAMENT
023746 10214 0x27E6 ⟦,⟦ "⟦" MATHEMATICAL LEFT WHITE SQUARE BRACKET
023747 10215 0x27E7 ⟧,⟧ "⟧" MATHEMATICAL RIGHT WHITE SQUARE BRACKET
023750 10216 0x27E8 ⟨,⟨,⟨ "⟨" MATHEMATICAL LEFT ANGLE BRACKET
023751 10217 0x27E9 ⟩,⟩,⟩ "⟩" MATHEMATICAL RIGHT ANGLE BRACKET
023752 10218 0x27EA ⟪ "⟪" MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
023753 10219 0x27EB ⟫ "⟫" MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
023754 10220 0x27EC ⟬ "⟬" MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
023755 10221 0x27ED ⟭ "⟭" MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
024603 10627 0x2983 ⦃ "⦃" LEFT WHITE CURLY BRACKET
024604 10628 0x2984 ⦄ "⦄" RIGHT WHITE CURLY BRACKET
024607 10631 0x2987 ⦇ "⦇" Z NOTATION LEFT IMAGE BRACKET
024610 10632 0x2988 ⦈ "⦈" Z NOTATION RIGHT IMAGE BRACKET
024611 10633 0x2989 ⦉ "⦉" Z NOTATION LEFT BINDING BRACKET
024612 10634 0x298A ⦊ "⦊" Z NOTATION RIGHT BINDING BRACKET
024613 10635 0x298B ⦋ "⦋" LEFT SQUARE BRACKET WITH UNDERBAR
024614 10636 0x298C ⦌ "⦌" RIGHT SQUARE BRACKET WITH UNDERBAR
024615 10637 0x298D ⦍ "⦍" LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
024616 10638 0x298E ⦎ "⦎" RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
024617 10639 0x298F ⦏ "⦏" LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
024620 10640 0x2990 ⦐ "⦐" RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
024621 10641 0x2991 ⦑ "⦑" LEFT ANGLE BRACKET WITH DOT
024622 10642 0x2992 ⦒ "⦒" RIGHT ANGLE BRACKET WITH DOT
024623 10643 0x2993 ⦓ "⦓" LEFT ARC LESS-THAN BRACKET
024624 10644 0x2994 ⦔ "⦔" RIGHT ARC GREATER-THAN BRACKET
024625 10645 0x2995 ⦕ "⦕" DOUBLE LEFT ARC GREATER-THAN BRACKET
024626 10646 0x2996 ⦖ "⦖" DOUBLE RIGHT ARC LESS-THAN BRACKET
024627 10647 0x2997 ⦗ "⦗" LEFT BLACK TORTOISE SHELL BRACKET
024630 10648 0x2998 ⦘ "⦘" RIGHT BLACK TORTOISE SHELL BRACKET
024774 10748 0x29FC ⧼ "⧼" LEFT-POINTING CURVED ANGLE BRACKET
024775 10749 0x29FD ⧽ "⧽" RIGHT-POINTING CURVED ANGLE BRACKET
027002 11778 0x2E02 ⸂ "⸂" LEFT SUBSTITUTION BRACKET
027003 11779 0x2E03 ⸃ "⸃" RIGHT SUBSTITUTION BRACKET
027004 11780 0x2E04 ⸄ "⸄" LEFT DOTTED SUBSTITUTION BRACKET
027005 11781 0x2E05 ⸅ "⸅" RIGHT DOTTED SUBSTITUTION BRACKET
027011 11785 0x2E09 ⸉ "⸉" LEFT TRANSPOSITION BRACKET
027012 11786 0x2E0A ⸊ "⸊" RIGHT TRANSPOSITION BRACKET
027014 11788 0x2E0C ⸌ "⸌" LEFT RAISED OMISSION BRACKET
027015 11789 0x2E0D ⸍ "⸍" RIGHT RAISED OMISSION BRACKET
027034 11804 0x2E1C ⸜ "⸜" LEFT LOW PARAPHRASE BRACKET
027035 11805 0x2E1D ⸝ "⸝" RIGHT LOW PARAPHRASE BRACKET
027042 11810 0x2E22 ⸢ "⸢" TOP LEFT HALF BRACKET
027043 11811 0x2E23 ⸣ "⸣" TOP RIGHT HALF BRACKET
027044 11812 0x2E24 ⸤ "⸤" BOTTOM LEFT HALF BRACKET
027045 11813 0x2E25 ⸥ "⸥" BOTTOM RIGHT HALF BRACKET
027046 11814 0x2E26 ⸦ "⸦" LEFT SIDEWAYS U BRACKET
027047 11815 0x2E27 ⸧ "⸧" RIGHT SIDEWAYS U BRACKET
027125 11861 0x2E55 ⹕ "⹕" LEFT SQUARE BRACKET WITH STROKE
027126 11862 0x2E56 ⹖ "⹖" RIGHT SQUARE BRACKET WITH STROKE
027127 11863 0x2E57 ⹗ "⹗" LEFT SQUARE BRACKET WITH DOUBLE STROKE
027130 11864 0x2E58 ⹘ "⹘" RIGHT SQUARE BRACKET WITH DOUBLE STROKE
030010 12296 0x3008 〈 "〈" LEFT ANGLE BRACKET
030011 12297 0x3009 〉 "〉" RIGHT ANGLE BRACKET
030012 12298 0x300A 《 "《" LEFT DOUBLE ANGLE BRACKET
030013 12299 0x300B 》 "》" RIGHT DOUBLE ANGLE BRACKET
030014 12300 0x300C 「 "「" LEFT CORNER BRACKET
030015 12301 0x300D 」 "」" RIGHT CORNER BRACKET
030016 12302 0x300E 『 "『" LEFT WHITE CORNER BRACKET
030017 12303 0x300F 』 "』" RIGHT WHITE CORNER BRACKET
030020 12304 0x3010 【 "【" LEFT BLACK LENTICULAR BRACKET
030021 12305 0x3011 】 "】" RIGHT BLACK LENTICULAR BRACKET
030024 12308 0x3014 〔 "〔" LEFT TORTOISE SHELL BRACKET
030025 12309 0x3015 〕 "〕" RIGHT TORTOISE SHELL BRACKET
030026 12310 0x3016 〖 "〖" LEFT WHITE LENTICULAR BRACKET
030027 12311 0x3017 〗 "〗" RIGHT WHITE LENTICULAR BRACKET
030030 12312 0x3018 〘 "〘" LEFT WHITE TORTOISE SHELL BRACKET
030031 12313 0x3019 〙 "〙" RIGHT WHITE TORTOISE SHELL BRACKET
030032 12314 0x301A 〚 "〚" LEFT WHITE SQUARE BRACKET
030033 12315 0x301B 〛 "〛" RIGHT WHITE SQUARE BRACKET
0177027 65047 0xFE17 ︗ "︗" PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
0177067 65079 0xFE37 ︷ "︷" PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
0177070 65080 0xFE38 ︸ "︸" PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
0177071 65081 0xFE39 ︹ "︹" PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
0177072 65082 0xFE3A ︺ "︺" PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
0177073 65083 0xFE3B ︻ "︻" PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
0177074 65084 0xFE3C ︼ "︼" PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
0177075 65085 0xFE3D ︽ "︽" PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
0177076 65086 0xFE3E ︾ "︾" PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
0177077 65087 0xFE3F ︿ "︿" PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
0177100 65088 0xFE40 ﹀ "﹀" PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
0177101 65089 0xFE41 ﹁ "﹁" PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
0177102 65090 0xFE42 ﹂ "﹂" PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
0177103 65091 0xFE43 ﹃ "﹃" PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
0177104 65092 0xFE44 ﹄ "﹄" PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
0177107 65095 0xFE47 ﹇ "﹇" PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
0177110 65096 0xFE48 ﹈ "﹈" PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
0177133 65115 0xFE5B ﹛ "﹛" SMALL LEFT CURLY BRACKET
0177134 65116 0xFE5C ﹜ "﹜" SMALL RIGHT CURLY BRACKET
0177135 65117 0xFE5D ﹝ "﹝" SMALL LEFT TORTOISE SHELL BRACKET
0177136 65118 0xFE5E ﹞ "﹞" SMALL RIGHT TORTOISE SHELL BRACKET
0177473 65339 0xFF3B [ "[" FULLWIDTH LEFT SQUARE BRACKET
0177475 65341 0xFF3D ] "]" FULLWIDTH RIGHT SQUARE BRACKET
0177533 65371 0xFF5B { "{" FULLWIDTH LEFT CURLY BRACKET
0177535 65373 0xFF5D } "}" FULLWIDTH RIGHT CURLY BRACKET
0177542 65378 0xFF62 「 "「" HALFWIDTH LEFT CORNER BRACKET
0177543 65379 0xFF63 」 "」" HALFWIDTH RIGHT CORNER BRACKET
0350425 119061 0x1D115 𝄕 "𝄕" MUSICAL SYMBOL BRACKET
03400133 917595 0xE005B 󠁛 "" TAG LEFT SQUARE BRACKET
03400135 917597 0xE005D 󠁝 "" TAG RIGHT SQUARE BRACKET
03400173 917627 0xE007B 󠁻 "" TAG LEFT CURLY BRACKET
03400175 917629 0xE007D 󠁽 "" TAG RIGHT CURLY BRACKET
(In reply to Eike Rathke from comment #8)
> That's totally weird.. there are 135 Unicode BRACKET characters ...

I was only looking at characters the Wikipedia article lists as quotation marks. Ultimately I would prefer not to open new cans of worms now, and to let further exotic characters be considered if/when they come up in real-life examples.
That's fine with me.
Aron Budea committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/8397af1bc49897a2d8ebe30c1e960661271503e9

tdf#165886 sc: parse various quote characters

It will be available in 25.8.0.

The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Aron Budea committed a patch related to this issue.
It has been pushed to "libreoffice-25-2":

https://git.libreoffice.org/core/commit/151e5d9eb6dd64fe96fa122f6151d3abb1b07fc4

tdf#165886 sc: parse various quote characters

It will be available in 25.2.4.
Aron Budea committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/93f2c3d9e196ac27e218a0230ac98f5c03f81881

tdf#165886 sc: parse various quote characters

It will be available in 24.8.8.
Aron Budea committed a patch related to this issue.
It has been pushed to "libreoffice-25-2-3":

https://git.libreoffice.org/core/commit/4fd225c3125ea9d5dfe9e7cec1e9697daf26e172

tdf#165886 sc: parse various quote characters

It will be available in 25.2.3.
Aron Budea committed a patch related to this issue.
It has been pushed to "libreoffice-24-8-7":

https://git.libreoffice.org/core/commit/4453fb4b88b0a11252c3624c95164029c61f29f9

tdf#165886 sc: parse various quote characters

It will be available in 24.8.7.