Are string functions ASCII-safe in PHP?

Posted by: admin July 12, 2020

Questions:

Some PHP string functions (such as strtoupper) are locale-dependent. It is still not clear to me whether the locale matters when I know for certain that a particular string consists only of ASCII (0-127) characters. Can I be guaranteed that strtoupper('abc..xyz') will always return ABC..XYZ regardless of locale? Do PHP string functions behave the same in the ASCII range regardless of locale?

While the answer about strtoupper is important to me, the question is more generally about the whole string function library.

I want to be sure that a user-selected locale (on a multi-language site) will not break my core functionality, which has nothing to do with internationalization.

Answers:

Do PHP string functions work the same in ASCII range independent from locale?

No, I’m afraid not. The primary counterexample is the dreaded Turkish dotted-I:

setlocale(LC_CTYPE, "tr_TR");
echo strtoupper('hi!');

-> 'H\xDD!' ('Hİ!' in ISO-8859-9)

In the worst case you may have to provide your own locale-independent string handling. Calling setlocale to switch back to C (or some other known locale) around the affected calls is a fix of sorts, but the locale is process-wide state under the POSIX model, which makes it a really bad fit for modern client/server apps where one process may serve many users.
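Both workarounds mentioned above can be sketched roughly as follows. The helper names ascii_upper, ascii_lower and with_c_ctype are hypothetical, made up for this illustration; only strtr, setlocale and strtoupper are real PHP functions. The first two helpers map bytes through explicit ASCII tables, so they never consult the locale. The third temporarily switches LC_CTYPE to C and restores it afterwards, which still touches process-wide state and so is not safe in threaded setups. As far as I know, PHP 8.2 and later made strtoupper locale-insensitive, so this matters mainly on older versions.

```php
<?php
// Hypothetical helpers: byte-wise ASCII case mapping that ignores the locale.
// Bytes outside a-z / A-Z (including 0x80-0xFF) pass through unchanged.
function ascii_upper($s)
{
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

function ascii_lower($s)
{
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

// Hypothetical wrapper: run $fn with LC_CTYPE set to "C", then restore the
// previous setting. The locale is process-wide, so this is a stopgap only.
function with_c_ctype(callable $fn)
{
    $previous = setlocale(LC_CTYPE, '0'); // '0' only queries the current value
    setlocale(LC_CTYPE, 'C');
    try {
        return $fn();
    } finally {
        if ($previous !== false) {
            setlocale(LC_CTYPE, $previous);
        }
    }
}

echo ascii_upper('hi!'), "\n"; // prints HI!
echo with_c_ctype(function () {
    return strtoupper('hi!');
}), "\n"; // prints HI!
```

The strtr version is the safer of the two for core logic such as identifiers or protocol keywords, since its result does not depend on anything outside the function.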

Answer:

PHP string functions treat one byte as one character. In the ASCII range 0-127 that is fine.

To safely handle multiple languages using UTF-8, use the mb_*() functions or a UTF-8 library, or wait until 2030 when PHP6 is released.
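To illustrate the byte-versus-character point, here is a small sketch that assumes UTF-8 input and that the mbstring extension is available (hence the function_exists guard). The sample string is written with explicit byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// "café" encoded as UTF-8: the é takes two bytes (0xC3 0xA9).
$s = "caf\xC3\xA9";

// strlen counts bytes, not characters.
echo strlen($s), "\n"; // prints 5

if (function_exists('mb_strlen')) {
    // The mb_* functions decode the string using the given encoding.
    echo mb_strlen($s, 'UTF-8'), "\n";     // prints 4
    echo mb_strtoupper($s, 'UTF-8'), "\n"; // prints CAFÉ (as UTF-8 bytes)
}
```

Passing the encoding explicitly avoids relying on the configured internal encoding, which may differ between servers.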