My string contains a UTF-8 non-breaking space (bytes 0xC2 0xA0), and I want to replace it with something else.
When I use
$str=preg_replace('~\xc2\xa0~', 'X', $str);
it works OK.
But when I use
$str=preg_replace('~\x{C2A0}~siu', 'W', $str);
the non-breaking space is not found (and not replaced).
Why? What is wrong with the second regexp?
The format \x{C2A0} is correct, and I also used the u flag.
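Actually, the PHP documentation about escape sequences is misleading here. With the \xc2\xa0 syntax, the pattern matches the two bytes 0xC2 and 0xA0, which together are the UTF-8 encoding of the character. With the \x{c2a0} syntax and the u flag, the hex number is taken as a Unicode code point (U+C2A0) and matched as that code point's UTF-8-encoded character.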
A non-breaking space is U+00A0 in Unicode, but it is encoded as C2 A0 in UTF-8. So if you use the pattern ~\x{00a0}~siu, it will work as expected.
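A small sketch of the code-point interpretation described above. Only the u flag matters here (s and i do not affect this pattern). It assumes PHP 7.0 or later, because it uses the "\u{...}" double-quoted string escape to show how each code point is encoded in UTF-8; the variable names are illustrative.

```php
<?php
// NBSP is code point U+00A0; in UTF-8 it is the two bytes C2 A0.
$str = "a\xc2\xa0b";

// With the u flag, \x{...} is a Unicode code point, not a byte pair.
// \x{00A0} is the NBSP itself, so it matches.
$fixed = preg_replace('~\x{00A0}~u', 'W', $str);     // "aWb"

// \x{C2A0} is a different code point (U+C2A0, a Hangul syllable)
// whose UTF-8 encoding is EC 8A A0, so nothing in $str matches.
$unchanged = preg_replace('~\x{C2A0}~u', 'W', $str); // still "a\xc2\xa0b"

// The "\u{...}" string escape (PHP 7+) shows the same mapping.
echo bin2hex("\u{00A0}"), "\n"; // c2a0
echo bin2hex("\u{C2A0}"), "\n"; // ec8aa0
```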
Answer:
I've aggregated the previous answers so people can copy/paste the following code and choose their favorite method:
$some_text_with_non_breaking_spaces = "\xc2\xa0\xc2\xa0some text with 2 non breaking spaces at the beginning";
echo 'Qty non-breaking space : ' . substr_count($some_text_with_non_breaking_spaces, "\xc2\xa0") . '<br>';
echo $some_text_with_non_breaking_spaces . '<br>';
# Method 1 : regular expression
$clean_text = preg_replace('~\x{00a0}~siu', ' ', $some_text_with_non_breaking_spaces);
# Method 2 : convert to hex -> replace -> convert back to binary
$clean_text = hex2bin(str_replace('c2a0', '20', bin2hex($some_text_with_non_breaking_spaces)));
# Method 3 : my favorite
$clean_text = str_replace("\xc2\xa0", " ", $some_text_with_non_breaking_spaces);
echo 'Qty non-breaking space : ' . substr_count($clean_text, "\xc2\xa0"). '<br>';
echo $clean_text . '<br>';
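A quick way to check that the regex method and the plain str_replace method agree. This is a minimal sketch on a test string built with the "\u{a0}" escape (PHP 7.0+ assumed), printing to the console instead of HTML.

```php
<?php
// Two NBSPs (UTF-8 bytes C2 A0) at the start of the string.
$text = "\u{a0}\u{a0}some text with 2 non breaking spaces at the beginning";

// Method 1: regex in UTF-8 mode, matching code point U+00A0.
$m1 = preg_replace('~\x{00a0}~u', ' ', $text);

// Method 3: plain byte-sequence replacement, no regex engine involved.
$m3 = str_replace("\xc2\xa0", ' ', $text);

echo substr_count($text, "\xc2\xa0"), "\n"; // 2
echo substr_count($m1, "\xc2\xa0"), "\n";   // 0
var_dump($m1 === $m3);                      // bool(true)
```

Method 3 avoids the regex engine entirely, which is why it is a common choice when the only goal is swapping one fixed byte sequence.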
Answer:
The two codes do different things, in my opinion: the first, \xc2\xa0, replaces TWO bytes, \xc2 and \xa0, with X.
In UTF-8, this two-byte sequence happens to be the encoding of the code point U+00A0.
Does \x{00A0} work? That should be the code-point representation of \xc2\xa0.
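To illustrate the bytes-versus-characters distinction: without the u flag PCRE works on bytes, so the NBSP counts as two units; with u it is one character. This is a minimal sketch; the variable names are illustrative.

```php
<?php
$nbsp = "\xc2\xa0"; // one character (U+00A0), two bytes in UTF-8

echo strlen($nbsp), "\n"; // 2 (strlen counts bytes)

// Byte mode: '.' (with s) matches each byte separately.
echo preg_match_all('~.~s', $nbsp), "\n";  // 2

// UTF-8 mode: the two bytes form a single character.
echo preg_match_all('~.~su', $nbsp), "\n"; // 1

// Both patterns therefore consume the whole NBSP.
$a = preg_replace('~\xc2\xa0~', 'X', "a{$nbsp}b");  // byte pair
$b = preg_replace('~\x{00A0}~u', 'X', "a{$nbsp}b"); // code point
var_dump($a === $b); // bool(true)
```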
Answer:
This variant, ~\x{c2a0}~siu, did not work.
The variant \x{00A0} works. I also tried a different approach; here is the result:
I converted the string to hex and replaced the no-break space bytes 0xC2 0xA0 (c2a0) with a space, 0x20 (20).
Code:
$hex = bin2hex($item);
$_item = str_replace('c2a0', '20', $hex);
$item = hex2bin($_item);
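One caveat with the hex round trip: str_replace searches the hex string at every offset, not only at byte boundaries, so "c2a0" can match across two unrelated bytes. The sketch below wraps the three lines above in a hypothetical helper, replace_nbsp_via_hex, to show both the intended case and a corrupted one.

```php
<?php
// Hypothetical wrapper around the bin2hex -> str_replace -> hex2bin steps.
function replace_nbsp_via_hex(string $item): string
{
    $hex   = bin2hex($item);                   // two hex digits per byte
    $_item = str_replace('c2a0', '20', $hex);  // matches at ANY offset
    return hex2bin($_item);
}

// Byte-aligned match: works as intended.
var_dump(replace_nbsp_via_hex("a\xc2\xa0b") === "a b");      // bool(true)

// "Total*\n" is 54 6f 74 61 6c 2a 0a; its hex contains "c2a0" straddling
// the bytes 6c 2a 0a, so a string with no NBSP at all gets corrupted.
var_dump(replace_nbsp_via_hex("Total*\n") === "Totab\n");    // bool(true)
```

Replacing the byte sequence directly, for example str_replace("\xc2\xa0", ' ', $item), has no such alignment problem.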
Answer:
/\x{00A0}/, /\xC2\xA0/ and the hex2bin(str_replace(bin2hex())) approach all worked and didn't work at the same time. When I printed the result to the screen, it was all good, but when I tried to save it to a file, the file was blank!
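I ended up using iconv('UTF-8', 'ISO-8859-1//IGNORE', $str); instead. Note that this converts the entire string to ISO-8859-1: the NBSP is not replaced but re-encoded as the single byte 0xA0, and characters with no Latin-1 equivalent are discarded.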