See gerrit patch https://gerrit.libreoffice.org/21745 which updates our documentation for Calc's CLEAN function.
Per http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1018842_715980110, this function should remove every character that "belongs to [UNICODE] class Cc (Other - Control), or to Unicode class Cn (Other - Not Assigned)".
This misses many characters, see e.g.
One problem is that Excel and, e.g., Apple's Numbers (not ODF software) aren't conformant either. For import / export reasons we should either duplicate the function and create a CLEAN_ADD or CLEAN_EXCEL (or whatever naming), or leave all the code as it is and add a CLEAN_ODF function.
I believe you.
Please also implement an additional CLEAN function compatible with MSO Excel / OOXML, which only keeps characters with 0x20 <= c (see lcl_ScInterpreter_IsPrintable and page 2123 of Ecma Office Open XML Part 1 - Fundamentals And Markup Language Reference.pdf in http://www.ecma-international.org/publications/standards/Ecma-376.htm ECMA-376 4th edition Part 1)
including the mapping
Using http://www.unicode.org/Public/UNIDATA/UnicodeData.txt as reference, we can grep for ';(Cc|Cn);'.
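We find that the matching chars, all of class Cc, are (Cn code points are unassigned and therefore have no entries of their own in UnicodeData.txt):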
0x00 to 0x1F (inclusive)
0x7F to 0x9F (inclusive)
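The ranges above can be cross-checked without grepping UnicodeData.txt. The following is only a sketch, assuming Python's standard unicodedata module as a stand-in for the data file; category_ranges is a hypothetical helper name, not anything from the patch.

```python
import unicodedata


def category_ranges(category, limit=0x110000):
    """Return inclusive (first, last) code point ranges whose general
    category equals `category`. Hypothetical helper for illustration."""
    ranges = []
    start = None
    for cp in range(limit):
        if unicodedata.category(chr(cp)) == category:
            if start is None:
                start = cp  # a new run begins here
        elif start is not None:
            ranges.append((start, cp - 1))  # the run ended at the previous code point
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges


if __name__ == "__main__":
    for first, last in category_ranges("Cc"):
        print(f"0x{first:02X} to 0x{last:02X} (inclusive)")
```

Calling category_ranges("Cn") lists the unassigned code points as well, but that result depends on the Unicode version bundled with the Python build, so it is not shown here.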
Using the above info, we can update the CLEAN isPrintable() function.
I also added CLEAN.OOXML using a new isPrintable_OOXML() function that returns true only if c > 0x1f.
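To illustrate the difference between the two checks, here is a minimal sketch in Python rather than the actual C++ from the patch. The names is_printable_odf, is_printable_ooxml and clean are hypothetical, and the ODF variant covers only the Cc ranges listed above, not Cn.

```python
def is_printable_odf(c: int) -> bool:
    """Assumed ODF-style check: reject 0x00-0x1F and 0x7F-0x9F (class Cc)."""
    return not (c <= 0x1F or 0x7F <= c <= 0x9F)


def is_printable_ooxml(c: int) -> bool:
    """Assumed OOXML-style check: true only if c > 0x1F."""
    return c > 0x1F


def clean(text: str, is_printable) -> str:
    """Drop every character that the given predicate considers non-printable."""
    return "".join(ch for ch in text if is_printable(ord(ch)))


if __name__ == "__main__":
    sample = "a\x07b\x7fc\x85d"
    print(repr(clean(sample, is_printable_odf)))
    print(repr(clean(sample, is_printable_ooxml)))
```

With this sample, the ODF-style check strips 0x07, 0x7F and 0x85, while the OOXML-style check strips only 0x07.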
Using the following fods to verify, I found that the CHAR function is not suitable for testing CLEAN, since CHAR returns 0xfffd for arguments in the range 0x80 to 0xFF inclusive, and arguments of 0x100 and above return Err:502.
Changing CHAR to UNICHAR, we can then successfully test the range 0x80 to 0x9F.
I'll attempt to use gerrit to start the patch process.
A polite ping, still working on this bug?
I lost focus after not being able to figure out how to update 2 files.
(In reply to Taylor Lee from comment #5)
> I lost focus after not being able to figure out how to update 2 files.
It would be great to see you continue :) If you need some tips, you could visit the #libreoffice-dev IRC channel during EU office hours: https://wiki.documentfoundation.org/Website/IRC
Eike is in there, nickname: erAck
This bug has been in ASSIGNED status for more than 3 months without any
activity. Resetting it to NEW.
Please assign it back to yourself if you're still working on this.