php – Exotic names for methods, constants, variables and fields – Bug or Feature?

Posted by: admin April 23, 2020


After some confusion in the comments to a linked question, I thought I would turn this into a question of its own. According to the PHP manual, a valid class name should match [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*. But apparently this is not enforced, nor does it apply to anything else:

define('π', pi());

class ␀ {
    private $␀ = TRUE;
    public function ␀()
    {
        return $this->␀;
    }
}

$␀ = new ␀;
var_dump($␀);

works fine (even though my IDE cannot show ␀). Can some erudite person clear this up for me? Can we use any Unicode? And if so, since when? Not that I would actually want to use anything but A-Za-z_ but I’m curious.

Clarification: I am not after a regex to validate class names, nor do I know whether PHP internally uses the regex the manual suggests. What confused me (and apparently the other people in the linked question) is why things like $☂ = 1 can be used in PHP at all. PHP6 was supposed to be the Unicode release, but PHP6 is on hiatus. If there is no Unicode support, why can I do this?

Answers:

This question starts out by mentioning class names, but then goes on to an example that includes exotic names for methods, constants, variables, and fields. There are actually different rules for these. Let’s start with the case-insensitive ones.

Case-insensitive identifiers (class and function/method names)

The general guideline here is to use only printable ASCII characters. The reason is that these identifiers are normalized to their lowercase version, and this conversion is locale-dependent. Consider the following PHP file, encoded in ISO-8859-1:

function func_á() { echo "worked"; }
func_Á();

Will this script work? Maybe. It depends on what tolower(193) will return, which is locale-dependent:

$ LANG=en_US.iso88591 php a.php
$ LANG=en_US.utf8 php a.php

Fatal error: Call to undefined function func_Á() in /home/glopes/a.php on line 3

Therefore, it’s not a good idea to use non-ASCII characters. However, even ASCII characters may give trouble in some locales. See this discussion. It’s likely that this will be fixed in the future by doing a locale-independent lowercasing that only works with ASCII characters.

In conclusion, if we use multi-byte encodings for these case-insensitive identifiers, we’re looking for trouble. It’s not just that we can’t take advantage of the case insensitivity. We might actually run into unexpected collisions because all the bytes that compose a multi-byte character are individually turned into lowercase using locale rules. It’s possible that two different multi-byte characters map to the same modified byte stream representation after applying the locale lowercase rules to each of the bytes.
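The collision described above can be illustrated with a small simulation. This is only a sketch: simulated_bytewise_lower() is a hypothetical stand-in for the engine's locale-aware lowercasing, and its mapping table is an assumption modeled on a Windows-1252-style locale (where byte 0x8A lowercases to 0x9A). The real behavior depends on the C library and the active locale.

```php
<?php
// Hypothetical stand-in for a locale-aware, byte-wise tolower().
// Assumed mapping (Windows-1252 style): A-Z -> a-z, 0x8A -> 0x9A,
// and 0xC0-0xDE (except 0xD7) -> 0xE0-0xFE.
function simulated_bytewise_lower(string $name): string
{
    $out = '';
    foreach (str_split($name) as $byte) {
        $code = ord($byte);
        if ($code >= 0x41 && $code <= 0x5A) {
            $code += 0x20;
        } elseif ($code === 0x8A) {
            $code = 0x9A;
        } elseif ($code >= 0xC0 && $code <= 0xDE && $code !== 0xD7) {
            $code += 0x20;
        }
        $out .= chr($code);
    }
    return $out;
}

$a = "func_\xC3\x8A"; // "func_Ê" (U+00CA) in UTF-8
$b = "func_\xC3\x9A"; // "func_Ú" (U+00DA) in UTF-8

var_dump($a === $b); // bool(false): the names differ
var_dump(simulated_bytewise_lower($a) === simulated_bytewise_lower($b)); // bool(true): they collide
```

Under this assumed mapping, two distinct UTF-8 names end up with the same lowercased byte stream, which is the kind of accidental collision a case-insensitive lookup would then hit.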

Case-sensitive identifiers (variables, constants, fields)

The problem is less serious here, since these identifiers are case sensitive. However, they are just interpreted as bytestreams. This means that if we use Unicode, we must consistently use the same byte representation; we can’t mix UTF-8 and UTF-16; we also can’t use BOMs.

In fact, we must stick to UTF-8. Outside of the ASCII range, UTF-8 uses lead bytes from 0xc2 to 0xf4 (0xc0 to 0xfd in the original, pre-RFC 3629 definition) and trail bytes in the range 0x80 to 0xbf, all of which are in the allowed range per the manual. Now let’s say we use the character “Ġ” in a UTF-16BE encoded file. This will translate to 0x01 0x20, so the second byte will be interpreted as a space.
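To make the byte-level difference concrete, here is a minimal sketch that checks each byte of “Ġ” against the manual's label character set. The helper name bytes_outside_label_range() is made up for this illustration, and the byte strings are written out by hand so that no extension is needed.

```php
<?php
// Hypothetical helper: lists the bytes that fall outside the manual's
// label character set [a-zA-Z0-9_\x80-\xff].
function bytes_outside_label_range(string $bytes): array
{
    $bad = [];
    foreach (str_split($bytes) as $byte) {
        if (!preg_match('/^[a-zA-Z0-9_\x80-\xff]$/', $byte)) {
            $bad[] = sprintf('0x%02x', ord($byte));
        }
    }
    return $bad;
}

$utf8    = "\xC4\xA0"; // "Ġ" (U+0120) encoded as UTF-8
$utf16be = "\x01\x20"; // "Ġ" (U+0120) encoded as UTF-16BE

print_r(bytes_outside_label_range($utf8));    // empty array: both bytes are allowed
print_r(bytes_outside_label_range($utf16be)); // 0x01 and 0x20 are not allowed
```

The UTF-8 form passes because both of its bytes are at or above 0x80, while the UTF-16BE form contains a control byte and a space.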

Having multi-byte characters read as if they were single-byte characters is, of course, no Unicode support at all. PHP does have some multi-byte support in the form of the compilation switch “--enable-zend-multibyte” (as of PHP 5.4, multibyte support is compiled in by default, but disabled; you can enable it with zend.multibyte=On in php.ini). This allows you to declare the encoding of the script:

declare(encoding='ISO-8859-1');
// code here

It will also handle BOMs, which are used to auto-detect the encoding and do not become part of the output. There are, however, a few downsides:

  • Performance hit, both memory and CPU. It stores a representation of the script in an internal multi-byte encoding, which takes more space (it also seems to keep the original version in memory), and it spends some CPU converting the encoding.
  • Multi-byte support is usually not compiled in, so it’s less tested (more bugs).
  • Portability issues between installations that have the support compiled in and those that don’t.
  • Refers only to the parsing stage; does not solve the problem outlined for case-insensitive identifiers.

Finally, there is the problem of lack of normalization: the same character may be represented with different sequences of Unicode code points (independently of the encoding). This may lead to bugs that are very difficult to track down.
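As a rough illustration of the normalization problem, the sketch below builds two names that render identically as “café” but use different code points. The variable names are chosen for this example only, and variable variables are used so that the bytes of each identifier are explicit.

```php
<?php
$composed   = "caf\xC3\xA9";  // "café" with U+00E9 (precomposed é)
$decomposed = "cafe\xCC\x81"; // "café" with "e" followed by U+0301 (combining acute)

// Create one variable per spelling; PHP compares the names byte by byte.
${$composed}   = 'composed';
${$decomposed} = 'decomposed';

var_dump($composed === $decomposed); // bool(false)
var_dump(${$composed});              // string(8) "composed"
var_dump(${$decomposed});            // string(10) "decomposed"
```

Both names look the same in most editors, yet they refer to two separate variables. If the intl extension is available, its Normalizer class could be used to bring names into a single form before comparing them.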


Your character is encoded as 0xe2 0x90 0x80 (in UTF-8) or something like that, so it matches your regexp when the bytes are not interpreted as Unicode (that is, when working on single bytes).
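The byte-wise match can be checked directly. This is a small sketch, assuming the source file is UTF-8 so that ␀ (U+2400) is stored as the bytes e2 90 80. It runs the manual's pattern once in byte mode and once with the u modifier, which makes PCRE treat the subject as code points instead.

```php
<?php
$pattern = '^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$';
$name    = "\xE2\x90\x80"; // UTF-8 bytes of U+2400

echo bin2hex($name), "\n";                        // e29080
var_dump(preg_match('/' . $pattern . '/', $name));  // int(1): every byte is in \x80-\xff
var_dump(preg_match('/' . $pattern . '/u', $name)); // int(0): U+2400 is outside U+0080-U+00FF
```

In byte mode the name is accepted only because each individual byte happens to fall in the allowed range, not because PHP understands the character.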


From the official documentation:

The class name can be any valid label, provided it is not a PHP reserved word. A valid class name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$.


From my understanding, the current versions of PHP have some Unicode support, but it is inconsistent. As others have suggested, this was going to be addressed in PHP6, which was canceled (not postponed). At the end of the day, some “exotic” characters will work and others won’t; and obviously, as you suggested, it is better to stick with A-Za-z0-9_.

At the same time, I have heard rumors that the Unicode discussion was recently restarted, presumably from scratch, as the original proposal for UTF-16 in PHP6 involved tons of effort with very little return.

Side note: From what I have read, the next major PHP release will be PHP 5.4, which might feature horizontal reuse (traits), array shorthand, a built-in HTTP server, and some other much-needed functionality.

http://www.mail-archive.com/[email protected]/msg35720.html