Some PHP string functions (such as strtoupper) are locale-dependent. But it is still not clear to me whether the locale matters when I know for certain that a particular string consists only of ASCII (0-127) characters. Am I guaranteed that strtoupper('abc..xyz') will always return ABC..XYZ regardless of locale? Do PHP string functions work the same in the ASCII range independently of locale?
While the answer for strtoupper is important to me, the question is more general and applies to the whole string function library.
I want to be sure that the user-selected locale (on a multi-language site) will not break my core functionality, which has nothing to do with internationalization.
Do PHP string functions work the same in ASCII range independent from locale?
No, I’m afraid not. The primary counterexample is the dreaded Turkish dotted-I:
setlocale(LC_CTYPE, "tr_TR");
echo strtoupper('hi!'); // -> 'H\xDD!' ('Hİ!' in ISO-8859-9)
In the worst case you may have to provide your own locale-independent string handling. Calling setlocale to revert to C or some other locale is kind of a fix, but the POSIX process-level locale model is a really bad fit for modern client/server apps, since the locale is shared by the whole process rather than set per request.
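As a rough sketch of what "your own locale-independent string handling" could look like for case conversion: the helpers below are hypothetical (the names ascii_strtoupper and ascii_strtolower are made up here, not part of PHP) and assume PHP 7 or later for the type declarations. They rely on strtr, which maps bytes one-to-one and does not consult the locale. As far as I know, PHP 8.2 and later also make strtoupper and strtolower themselves locale-insensitive, so this mainly matters on older versions.

```php
<?php
// Hypothetical helpers (not part of PHP): ASCII-only case conversion.
// strtr() translates bytes one-to-one and ignores the current locale,
// so every byte outside a-z / A-Z passes through unchanged.
function ascii_strtoupper(string $s): string
{
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

function ascii_strtolower(string $s): string
{
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

// The Turkish locale may not be installed; setlocale() then returns false
// and the locale is left as it was. The helpers behave the same either way.
setlocale(LC_CTYPE, 'tr_TR');
echo ascii_strtoupper('hi!'), "\n"; // HI!
```

The trade-off is that these helpers deliberately do nothing for non-ASCII text, which is what you want for identifiers, header names and similar "core" strings, but not for user-visible text.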
PHP string functions treat one byte as one character. In the ASCII range (0-127) that is fine, because each character is exactly one byte.
To safely handle multiple languages using UTF-8, use the mb_*() functions or a UTF-8 library, or wait until 2030, when PHP 6 is released.
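To illustrate the byte-versus-character difference, here is a small sketch. It assumes the mbstring extension is loaded, and it passes the encoding explicitly so the result does not depend on any ini setting. The sample string is just an example.

```php
<?php
// Assumes the mbstring extension is available.
$s = "h\xC3\xA9llo"; // "héllo" in UTF-8; the é occupies two bytes

echo strlen($s), "\n";             // 6 (counts bytes)
echo mb_strlen($s, 'UTF-8'), "\n"; // 5 (counts characters)

// mb_strtoupper() uses Unicode case mapping rather than the process locale.
$upper = mb_strtoupper($s, 'UTF-8');
echo $upper, "\n"; // HÉLLO (bytes: "H\xC3\x89LLO")
```

Passing 'UTF-8' on every call is verbose, but it avoids relying on the internal encoding configured elsewhere.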