PHP – How to remove Chinese characters from a string

Posted by: admin, July 12, 2020

Questions:

Is there an easy way to remove Chinese characters from a string? I found the regexp below, but it doesn’t work as expected:

<?php
$data1='疯狂的管道Test';
$data2='睡眠帮手-背景乐Test';

echo str_replace(preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data1),'',$data1)
."<br>\n".
str_replace(preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data2),'',$data2);
exit;

It works for $data1 but not for $data2.

Answer:

Try this code (online version @ Ideone.com):

<?php
$data1='疯狂的管道Test';
$data2='睡眠帮手-背景乐Test';

echo preg_replace("/[\x{4e00}-\x{9fa5}]+/u", '', $data1), "\n";
echo preg_replace("/[\x{4e00}-\x{9fa5}]+/u", '', $data2);

// Better use this (credits to Kobi's answer below)
echo "\n", preg_replace("/\p{Han}+/u", '', $data1);

I have removed the ^ from the regular expression, so we don’t need str_replace() anymore.

Your old regexp matched all non-Chinese characters, so preg_replace() left only the Chinese characters in the returned string. To obtain the final result, you then had to replace those Chinese characters in the original string with an empty string:

preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data1) // returns 疯狂的管道
str_replace('疯狂的管道', '', $data1); // gives us Test

For the second string, the regexp again matched all non-Chinese characters. But this time the Chinese characters do not form one contiguous sequence: the hyphen splits them into two runs.

preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data2) // returns 睡眠帮手背景乐

This returned string does not occur anywhere in $data2 (the hyphen is missing from it), so str_replace() finds nothing to replace and the approach fails.
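The failure can be reproduced step by step. This is a minimal sketch that assumes the file is saved as UTF-8; the variable names $chineseOnly, $viaStrReplace and $direct are illustrative and do not come from the original code.

```php
<?php
$data2 = '睡眠帮手-背景乐Test';

// Step 1 of the original approach: strip everything that is NOT in the range.
// This glues the two separate Chinese runs into one string.
$chineseOnly = preg_replace('/[^\x{4e00}-\x{9fa5}]+/u', '', $data2);

// Step 2: str_replace() searches for that glued string as one contiguous needle.
// It is not a substring of $data2 (the hyphen sits in the middle), so nothing changes.
$viaStrReplace = str_replace($chineseOnly, '', $data2);

// Removing the Chinese runs directly treats each run on its own.
$direct = preg_replace('/[\x{4e00}-\x{9fa5}]+/u', '', $data2);

var_dump($chineseOnly === '睡眠帮手背景乐'); // glued runs, hyphen and "Test" removed
var_dump($viaStrReplace === $data2);         // input came back unchanged
var_dump($direct === '-Test');               // only the Chinese runs were removed
```

Note that the hyphen survives in the direct version; whether it should be removed as well depends on the use case.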

Answer:

You can use a Unicode character property (Han should work for you):

preg_replace("/\p{Han}+/u", '', $data);

Working example: http://ideone.com/uEiIV5
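One practical difference between the script property and the fixed range is that \p{Han} also covers ideographs outside 4e00-9fa5. The sketch below illustrates this with a made-up sample string: U+3400 (an ideograph from CJK Extension A) followed by U+4E2D. The sample and the variable names are assumptions for illustration only, and the "\u{...}" string escapes require PHP 7 or later.

```php
<?php
// U+3400 lies outside 4e00-9fa5; U+4E2D lies inside it.
$input = "\u{3400}\u{4E2D}Test";

// The fixed range only removes the character that falls inside it.
$byRange = preg_replace('/[\x{4e00}-\x{9fa5}]+/u', '', $input);

// The script property removes both ideographs.
$byScript = preg_replace('/\p{Han}+/u', '', $input);

var_dump($byRange === "\u{3400}Test"); // the Extension A character is left behind
var_dump($byScript === 'Test');        // both ideographs are gone
```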

Answer:

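This one should also do the job in PHP. It is written with \x{...} escapes and the /u modifier, because PCRE does not accept \uXXXX escapes, and without the ^ so that it matches the Chinese characters themselves:
/[\x{4E00}-\x{9FFF}]+/u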
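To compare the two escape styles in PHP, the sketch below runs a \uXXXX pattern and a \x{XXXX} pattern against the second string from the question. It assumes a UTF-8 source file; the names $jsStyle and $pcreStyle are illustrative. The @ operator is there only to silence the compile warning in this demonstration.

```php
<?php
$data2 = '睡眠帮手-背景乐Test';

// JavaScript-style escape: PCRE rejects \u, so the pattern does not compile
// and preg_replace() signals the error by returning null.
$jsStyle = @preg_replace('/[\u4E00-\u9FFF]+/', '', $data2);

// PCRE-style escape combined with the /u (UTF-8) modifier.
$pcreStyle = preg_replace('/[\x{4E00}-\x{9FFF}]+/u', '', $data2);

var_dump($jsStyle === null);      // the \u pattern produced an error result
var_dump($pcreStyle === '-Test'); // the \x{...} pattern removed the Chinese runs
```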