We have had build failures when compiling in the Turkish locale - ie. export LANG=tr_TR.UTF-8 This is because in this locale - toupper(i) is != 'I' and toupper(ı) == 'I' cf. wikipedia: https://en.wikipedia.org/wiki/Turkish_alphabet#Letters bug#99587 - is a first cut at fixing one instance of this but - I think we can conclude on this basis that any use of 'toupper' or 'tolower' in the code is a bug. Instead we should use our own ASCII comparison functions here - ideally using the Ascii variants in sal/ OString / OUString which cope with this. This sort of thing looks plausibly safe in comparison - albeit ugly: char easytolower( char in ) { if( in<='Z' && in>='A' ) return in-('Z'-'z'); return in; } $ git grep toupper $ git grep tolower Thanks ! =)
Oh ye of little faith ;-) the demo app: /* intended for ~/a.c */ #include <stdio.h> #include <ctype.h> #include <string.h> #include <locale.h> #include <stdlib.h> int main(int argc, char **argv) { setlocale(0,getenv("LANG")); fprintf(stderr, "%d -> %d\n", 'I', tolower('I')); return 0; } $ gcc ~/a.c && LANG=tr_TR.UTF-8 ./a.out 73 -> 73 $ gcc ~/a.c && LANG=C ./a.out 73 -> 105 =)
Thought you might be interested in this other tdf since it also concerns a problem with I and i in Turkish.
Perhaps the best way to fix this is a global / manual replace using: #include <rtl/character.hxx> and using: inline sal_uInt32 toAsciiLowerCase(sal_uInt32 code) inline sal_uInt32 toAsciiUpperCase(sal_uInt32 code) in place of tolower / toupper.
I am starting to apply the fix suggested by Michael and try to make it less dangerous for Turks.
isupper() and islower() are equally evil. Instead use inline bool isAsciiLowerCase(sal_uInt32 code) inline bool isAsciiUpperCase(sal_uInt32 code) from rtl/character.hxx
Apparently, replacing tolower & toupper to toAsciiLowerCase & toAsciiUpperCase respectively breaks other things and the build fails for any language. It may be related to the islower & isupper functions that Eike Rathke mentioned.
Krishna Keshav committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=26d314d2e25945941d49a4872d7ffa27cfc2fdc8 tdf#99589 tolower / toupper - dangerous to Turks ... It will be available in 5.2.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Krishna Keshav committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=1fcd8dfb70124acc935c24e066dfd3e2144baec9 tdf#99589 tolower / toupper - dangerous to Turks ... It will be available in 5.2.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
apurvapriyadarshi committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=2b99f832dc611355eedace357cd50d976723d1dd tdf#99589 tolower / toupper - dangerous to Turks ... It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
krishna keshav committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=7a11763823eef04e9602c7de3fff44415061d7cb tdf#99589 tolower / toupper - dangerous to Turks ... It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Seems solved
Is it really solved ? I do: git grep 'toupper' and still see rather a number of suspicious hits =) Same for 'tolower' Of course - each one needs auditing to see if it is a real use =)
There are a few missing replacements. Currently working on them (for all tolower, toupper, islower and isupper functions).
I have completed the basic replacement of the naughty functions in .cxx files: tolower -> rtl::toAsciiLowerCase toupper -> rtl::toAsciiUpperCase islower -> rtl::isAsciiLowerCase isupper -> rtl::isAsciiUpperCase There are also several different similar functions that does similar things on characters, namely: toLower, ToLower, toUpper, ToUpper, etc. Should we check for their behavior if they cause similar problems? In addition, the first set of functions also appear in .c, .py and .java files. Should we also modify them?
I am now starting a full build with locale tr_TR.UTF-8 and will post the results if it is successful.
Gökhan Gurbetoğlu committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=9b9e5cfd2fa629b2e1dc4a193e48a4a4e8d34126 tdf#99589 - tolower / toupper - dangerous to Turks ... It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Gökhan Gurbetoğlu committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=ea5a5b1dbb669415586520c2b0c526b133aa07e4 tdf#99589 - tolower / toupper - dangerous to Turks ... It will be available in 5.3.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Please note that replacing calls to isupper, toupper, etc. with calls to rtl::isAsciiUpperCase, rtl::toAsciiUpperCase, etc. is non-trivial. The rtl functions all take a UTF-32 code unit parameter of type sal_uInt32, and assert that it is in the valid UTF-32 range 0--0x10FFFF. When the input to the original isupper, toupper, etc. was of type 'char' or 'unsigned char', and the value may have been outside the ASCII range of 0--0x7F (i.e., may have been negative), blindly converting to rtl::isAsciiUpperCase, rtl::toAsciiUpperCase, etc. may cause those asserts to fire now (as those small negative char values are implicitly converted to large sal_uInt32 values).
(In reply to Stephan Bergmann from comment #18) > Please note that replacing calls to isupper, toupper, etc. with calls to > rtl::isAsciiUpperCase, rtl::toAsciiUpperCase, etc. is non-trivial. > > The rtl functions all take a UTF-32 code unit parameter of type sal_uInt32, > and assert that it is in the valid UTF-32 range 0--0x10FFFF. When the input > to the original isupper, toupper, etc. was of type 'char' or 'unsigned > char', and the value may have been outside the ASCII range of 0--0x7F (i.e., > may have been negative), blindly converting to rtl::isAsciiUpperCase, > rtl::toAsciiUpperCase, etc. may cause those asserts to fire now (as those > small negative char values are implicitly converted to large sal_uInt32 > values). I did not consider this when doing the patch. I tried to locate the functions and replace them accordingly as Michael Meeks suggested at comment #3. What do you suggest to fix these?
One idea that comes to mind is to extend include/rtl/character.hxx: For every declaration of a function taking such a 'sal_uInt32' parameter, add deleted definitions (using SAL_DELETED_FUNCTION, as character.hxx is URE API) of overloads taking 'char' and 'unsigned char' parameters. That will prevent dangerous uses of those functions from even compiling.
A polite ping, still working on this bug ?
(In reply to jan iversen from comment #21) > A polite ping, still working on this bug ? Pong! I will not be contributing to this issue at the moment.
As said in comment
abdulwd committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=47f338c1cfca96db8f0384c3498755c4cd6ff2c4 tdf#99589 tolower changed to rtl::toAsciiLowerCase It will be available in 5.4.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Julien Nabet committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=f8e77c0894f5eb4495edb79a387f0deb63830226 Revert "tdf#99589 tolower changed to rtl::toAsciiLowerCase" It will be available in 5.4.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Does the revert mean there are still work to be done ?
(In reply to jan iversen from comment #26) > Does the revert mean there are still work to be done ? To see whether this issue is done, check whether there are any remaining uses of islower, isupper, tolower, toupper in C++ code (and I spot exactly one such use on current master).
in our own C++ code, there are 2 problematic areas left: * rsc has a couple calls to tolower/toupper, one of which is for a command line flag -I these can be replaced with toAsciiLowerCase like comment #3 says * setup_native/source/win32/wintools/ there are also some tolower calls in here, one of which is a command line flag -I because these are MSI DLLs, it is not possible to use sal library and toAsciiLowerCase here. instead, find some header in there and add the "easytolower" function to it, and replace tolower calls with that. also, this MSI code can only be built on Windows platform.
Stephan points out that the "rsc/source/rscpp" stuff is C not C++ so you can't use toAsciiLowerCase there, put another "easytolower" there too i guess.
dilekuzulmez committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=33cbc99fea5a3e4b87e883fd60ec97d4503109f3 tdf#99589 tolower / toupper - dangerous to Turks ... It will be available in 5.4.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
(In reply to Stephan Bergmann from comment #20) > One idea that comes to mind is to extend include/rtl/character.hxx: For > every declaration of a function taking such a 'sal_uInt32' parameter, add > deleted definitions (using SAL_DELETED_FUNCTION, as character.hxx is URE > API) of overloads taking 'char' and 'unsigned char' parameters. That will > prevent dangerous uses of those functions from even compiling. meanwhile done as <https://cgit.freedesktop.org/libreoffice/core/commit/?id=7778d9f51bd1f4d086cafe95995406c3157afb89> "Prevent calls to rtl/character.hxx functions with (signed) char arguments"
(In reply to Stephan Bergmann from comment #27) > To see whether this issue is done, check whether there are any remaining > uses of islower, isupper, tolower, toupper in C++ code (and I spot exactly > one such use on current master). The only remaining calls to islower, isupper, tolower, toupper are in C code (which cannot use rtl/character.hxx) in rsc/source/rscpp/cpp{3,5}.c.