
php – How to remove chinese characters in a string

Posted by: admin July 12, 2020


Is there an easy way to remove Chinese characters from a string? I found this regexp, but it doesn’t work as expected:


echo str_replace(preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data1), '', $data1);
echo str_replace(preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data2), '', $data2);

It works for $data1 but not for $data2.

Answers:

Try this code (online version @ Ideone.com):


echo preg_replace("/[\x{4e00}-\x{9fa5}]+/u", '', $data1), "\n";
echo preg_replace("/[\x{4e00}-\x{9fa5}]+/u", '', $data2);

// Better use this (credits to Kobi's answer below)
preg_replace("/\p{Han}+/u", '', $data);

I have removed the ^ from the regular expression so we don’t need str_replace() anymore.

Your old regexp matched all non-Chinese characters, so preg_replace() left only the Chinese characters in the returned string. To obtain the final result, you then had to replace those Chinese characters in the original string with an empty string.

preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data1) // returns 疯狂的管道
str_replace('疯狂的管道', '', $data1); // gives us Test

For $data2, the regexp again stripped all non-Chinese characters. But this time the Chinese characters do not form one contiguous sequence in the original string!

preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data2) // returns 睡眠帮手背景乐

This concatenated string does not occur anywhere in $data2 as a single substring, so str_replace() finds nothing to replace and the approach fails.
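To make the failure concrete, here is a small sketch. The question never shows the real value of $data2, so $sample is a made-up string with two separate runs of Chinese characters; the variable names are only for illustration.

```php
<?php
// Hypothetical input: the original $data2 is not shown in the question.
$sample = '睡眠帮手 Test 背景乐';

// Step 1 of the original approach: strip everything that is NOT in the range.
$onlyChinese = preg_replace('/[^\x{4e00}-\x{9fa5}]+/u', '', $sample);
// $onlyChinese is now the two runs glued together: 睡眠帮手背景乐

// Step 2: str_replace() looks for that glued string as one substring.
$broken = str_replace($onlyChinese, '', $sample);
// There is no such substring, so $broken is identical to $sample.

// Matching the Chinese characters directly removes each run in place.
$fixed = preg_replace('/[\x{4e00}-\x{9fa5}]+/u', '', $sample);
// $fixed is ' Test ' (the surrounding spaces are kept).

var_dump($onlyChinese, $broken === $sample, $fixed);
```

Note that the spaces around "Test" survive; wrap the result in trim() if you don't want them.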


You can use a Unicode character property (Han should work for you):

preg_replace("/\p{Han}+/u", '', $data);

Working example: http://ideone.com/uEiIV5
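If you need this in more than one place, it could be wrapped in a small function. This is only a sketch: removeHan() is a hypothetical name (not a PHP built-in), the error handling is my own choice, and the type declarations assume PHP 7 or later.

```php
<?php
// removeHan() is a hypothetical helper name, not part of PHP.
function removeHan(string $text): string
{
    // \p{Han} matches any character in the Han script; /u enables UTF-8 mode.
    $result = preg_replace('/\p{Han}+/u', '', $text);

    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
    if ($result === null) {
        throw new RuntimeException(
            'preg_replace() failed with error code ' . preg_last_error()
        );
    }

    return $result;
}

echo removeHan('Test 疯狂的管道'), "\n"; // prints "Test " (trailing space kept)
echo removeHan('你好，world'), "\n";     // prints "，world"
```

The second call shows one thing to watch for: \p{Han} only covers the ideographs themselves, so fullwidth punctuation such as "，" is left in the string and has to be handled separately if you want it gone.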


This one should also do the job