php – How get each character from a word with special encoding

Posted by: admin, July 12, 2020

Questions:

I need to get an array of all the characters in a word, but the word contains letters with special encoding, such as á. When I execute the following code:

$word = 'withá';

$word_arr = array();
for ($i=0;$i<strlen($word);$i++) {
    $word_arr[] = $word[$i];
}

or

$word_arr = str_split($word);

I get:

array(6) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t"
[3]=> string(1) "h" [4]=> string(1) "Ã" [5]=> string(1) "¡" }

How can I obtain each character as follows?

array(5) { [0]=> string(1) "w" [1]=> string(1) "i" [2]=> string(1) "t"
[3]=> string(1) "h" [4]=> string(1) "á" }

How to & Answers:

Because it is a UTF-8 string, just do

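$word = 'withá';
// utf8_decode() converts UTF-8 to ISO-8859-1, where each character is one byte
$word = utf8_decode($word);
$word_arr = array();
// strlen() and $word[$i] work on bytes, which now correspond to characters
for ($i = 0; $i < strlen($word); $i++) {
    $word_arr[] = $word[$i];
}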

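The reason is that, even though the string looks right in your script, á is stored as a two-byte UTF-8 sequence, and strlen(), str_split() and $word[$i] all work on bytes rather than characters (which is why the mb functions work as well). utf8_decode() converts the string from UTF-8 to ISO-8859-1, where every character is exactly one byte, so the byte-wise loop then yields one element per character. Note that this only works for characters that exist in ISO-8859-1 (anything else becomes ?), that the resulting elements are ISO-8859-1 rather than UTF-8, and that utf8_decode() is deprecated as of PHP 8.2.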
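
A minimal sketch of the byte-versus-character distinction, assuming the mbstring extension is loaded. It uses mb_convert_encoding() in place of utf8_decode() (a substitution, not part of the answer above), and writes á as its two UTF-8 bytes so the result does not depend on how the file is saved. The names $latin1 and $chars are illustrative only:

```php
<?php
// "\xC3\xA1" is the UTF-8 byte sequence for "á".
$word = "with\xC3\xA1";

echo strlen($word), "\n";             // 6: strlen() counts bytes
echo mb_strlen($word, 'UTF-8'), "\n"; // 5: mb_strlen() counts characters

// Same conversion utf8_decode() performs: UTF-8 to ISO-8859-1.
$latin1 = mb_convert_encoding($word, 'ISO-8859-1', 'UTF-8');

// Each byte is now one character; convert each piece back to UTF-8
// so the array can be used alongside the rest of a UTF-8 application.
$chars = array();
foreach (str_split($latin1) as $byte) {
    $chars[] = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
}

var_dump($chars);
```

Converting each piece back to UTF-8 is a design choice for this sketch; without it, the last element is the single ISO-8859-1 byte 0xE1, which will display incorrectly on a UTF-8 page.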

Answer:

I think mb_split will do it for you: http://www.php.net/manual/en/function.mb-split.php

If you’re using special encodings, you probably want to read up on how PHP handles multibyte encoding in general…

EDIT: Nope, I can’t figure out how to make mb_split do it myself, but looking around SO I found some other questions that were answered with preg_split. I tested this and it seems to do exactly what you want (note the u modifier, which makes the pattern match on UTF-8 characters rather than on bytes):

preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

I’d still strongly suggest you read up on multibyte characters in PHP though. It’s kind of a mess, IMHO.
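
As a rough illustration of how the pattern modifier affects preg_split on a UTF-8 string, the sketch below splits the same word with and without the u modifier. The variable names $bytes and $chars are illustrative, and á is written as its two UTF-8 bytes so the example does not depend on the file's encoding:

```php
<?php
// "\xC3\xA1" is the UTF-8 byte sequence for "á".
$word = "with\xC3\xA1";

// Without the u modifier the empty pattern matches between every byte.
$bytes = preg_split('//', $word, -1, PREG_SPLIT_NO_EMPTY);

// With the u modifier the subject is treated as UTF-8, so the empty
// pattern matches only between whole characters.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), "\n"; // 6
echo count($chars), "\n"; // 5
```

With the u modifier, preg_split returns false if the input is not valid UTF-8, so it is worth checking the return value when the string comes from an untrusted source.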

Here are some good links:
http://www.joelonsoftware.com/articles/Unicode.html
and
http://akrabat.com/php/utf8-php-and-mysql/
and plenty more can be found…

Answer:

As found on: http://www.php.net/manual/en/function.str-split.php#107658

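    // Split a UTF-8 string into chunks of $l characters (single characters if $l is 0).
    function str_split_unicode($str, $l = 0) {
        if ($l > 0) {
            $ret = array();
            $len = mb_strlen($str, "UTF-8");
            // Step through the string $l characters (not bytes) at a time.
            for ($i = 0; $i < $len; $i += $l) {
                $ret[] = mb_substr($str, $i, $l, "UTF-8");
            }
            return $ret;
        }
        // The u modifier makes the empty pattern match between UTF-8 characters.
        return preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY);
    }

    $word = 'withá';
    $word = str_split_unicode($word);
    var_dump($word);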
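
On PHP 7.4 and later the mbstring extension provides mb_str_split(), which does the same job without a helper. A minimal sketch, assuming mbstring is loaded; the preg_split fallback and the name $chars are illustrative and not part of the snippet above:

```php
<?php
// "\xC3\xA1" is the UTF-8 byte sequence for "á".
$word = "with\xC3\xA1";

if (function_exists('mb_str_split')) {
    // PHP 7.4+: split into chunks of 1 character, reading the input as UTF-8.
    $chars = mb_str_split($word, 1, 'UTF-8');
} else {
    // Older PHP: fall back to a UTF-8-aware regular expression split.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
}

var_dump($chars);
```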

Answer:

You should use the multibyte functions for all multibyte charsets! I guess mb_split is the counterpart, although it splits on a regular-expression pattern rather than into single characters:

http://php.net/manual/en/function.mb-split.php