regex – How to replace all non-alphabetic characters with UTF-8 support in PHP

Posted by: admin, July 12, 2020

Question:

I want to remove all non-alphabetic characters from a string. The problem is that I don't know the range of letters, because it is a UTF-8 string.

The text can be in any language, for example ENGLISH, ՀԱՅԵՐԵՆ (Armenian), ქართული (Georgian), УКРАЇНСЬКИЙ (Ukrainian) or РУССКИЙ (Russian).

I usually do something like this:

$str = preg_replace('/[^a-zA-Z]/', '', $str);

or

$str = preg_replace('/[^\w]/u', '', $str);

but both of them also remove the non-Latin letters.
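The failure of the first pattern can be reproduced with a small sketch (the sample string is made up for illustration). Without the u modifier PCRE works byte by byte, and every UTF-8 byte of a Cyrillic letter lies outside a-zA-Z, so the whole word disappears along with the punctuation.

```php
<?php
// Made-up sample mixing Latin and Cyrillic letters.
$str = 'Hello, Привет!';

// ASCII-only class, no u modifier: each byte of "Привет" is outside a-zA-Z,
// so the Cyrillic word is deleted together with ", " and "!".
$asciiOnly = preg_replace('/[^a-zA-Z]/', '', $str);

echo $asciiOnly, PHP_EOL; // Hello
```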

Any ideas?

Answer:

UPDATE: For Unicode text, the pattern looks like this: [^\p{L}\s]+ (used with the u modifier; whitespace is kept).

It removes every character that is neither a letter from any script nor whitespace.
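A minimal sketch of the UPDATE pattern in use; the helper name keepLettersAndSpaces and the sample strings are hypothetical. The u modifier makes PCRE treat the pattern and subject as UTF-8 code points, so \p{L} matches letters in Latin, Cyrillic, Armenian or Georgian alike.

```php
<?php
/**
 * Hypothetical helper: keeps letters from any script plus whitespace and
 * removes everything else (digits, punctuation, symbols).
 * Returns null if preg_replace() fails (e.g. on invalid UTF-8 input).
 */
function keepLettersAndSpaces(string $str): ?string
{
    // \p{L} = any Unicode letter, \s = whitespace.
    return preg_replace('/[^\p{L}\s]+/u', '', $str);
}

echo keepLettersAndSpaces('Hello, wörld! Привет123?'), PHP_EOL; // Hello wörld Привет
echo keepLettersAndSpaces('ՀԱՅԵՐԵՆ! ქართული-2020'), PHP_EOL;     // ՀԱՅԵՐԵՆ ქართული
```

Because spaces are kept, the words stay separated; drop \s from the class if they should be glued together.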

  • \P{L}+ – matches any run of non-letter characters (spaces included)
  • \p{P}+ – matches punctuation only, so replacing it removes just the punctuation
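To compare the two patterns listed above, here is a short sketch that runs both on the same made-up input. \P{L}+ deletes spaces and digits too, while \p{P}+ only touches characters in the Unicode punctuation category, which includes "," and "!" but not digits or spaces.

```php
<?php
$str = 'Hello, Привет 42!';

// \P{L}+ : every run of non-letters (spaces, digits, punctuation) is removed.
$lettersOnly = preg_replace('/\P{L}+/u', '', $str);

// \p{P}+ : only punctuation is removed; letters, digits and spaces survive.
$noPunctuation = preg_replace('/\p{P}+/u', '', $str);

echo $lettersOnly, PHP_EOL;   // HelloПривет
echo $noPunctuation, PHP_EOL; // Hello Привет 42
```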


Answer:

Use the Unicode character properties:

$str = preg_replace('/\P{L}+/u', '', $str);
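One detail worth handling when the input is supposed to be UTF-8: with the u modifier, preg_replace() returns null instead of a string if the subject is not valid UTF-8. A minimal sketch of a wrapper that fails loudly in that case; the function name stripNonLetters and the choice of exception are assumptions, not part of the answer above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: removes every non-letter with \P{L}+ and throws
 * instead of silently returning null when preg_replace() fails.
 */
function stripNonLetters(string $str): string
{
    $result = preg_replace('/\P{L}+/u', '', $str);
    if ($result === null) {
        // With the u modifier, malformed UTF-8 makes preg_replace() return null;
        // preg_last_error() then reports PREG_BAD_UTF8_ERROR.
        throw new InvalidArgumentException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

echo stripNonLetters('ENGLISH, ՀԱՅԵՐԵՆ, ქართული'), PHP_EOL; // ENGLISHՀԱՅԵՐԵՆქართული
```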

Answer:

The Unicode property for letters is \pL and for non-letters it is \PL (the braces can be omitted for one-letter property names):

$str = preg_replace('/\PL+/u', '', $str);
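A small sketch checking that the short form \PL behaves exactly like \P{L}, plus one caveat that applies to both: combining marks (Unicode category M) are not letters, so in decomposed text an accent stored as a separate code point is stripped. Adding \pM to the kept set preserves it. The sample strings are made up, and the "\u{...}" escape requires PHP 7 or later.

```php
<?php
$input = 'ENGLISH, ՀԱՅԵՐԵՆ, ქართული, УКРАЇНСЬКИЙ, РУССКИЙ';

// \PL is shorthand for \P{L}; both remove the same characters.
$short = preg_replace('/\PL+/u', '', $input);
$long  = preg_replace('/\P{L}+/u', '', $input);
var_dump($short === $long); // bool(true)

// 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form of "é").
$decomposed = "Cafe\u{0301}";

// The accent is category Mn, not L, so \PL removes it.
echo preg_replace('/\PL+/u', '', $decomposed), PHP_EOL; // Cafe

// Keeping letters AND marks leaves the decomposed string unchanged.
echo preg_replace('/[^\pL\pM]+/u', '', $decomposed), PHP_EOL; // "Cafe" + U+0301, displayed as Café
```

If the input may arrive in decomposed form, normalizing it to NFC first (for example with Normalizer::normalize() from the intl extension) is another way to keep accented letters intact.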