I have the following address line: Praha 5, Staré Město,
I need to run utf8_decode() on this string before I can write it to a PDF file (using the domPDF library).
However, the output of PHP's utf8_decode() for the above address line appears incorrect (or rather, incomplete).
The following code:
<?php echo utf8_decode('Praha 5, Staré Město,'); ?>
Produces this:
Praha 5, Staré M?sto,
Any idea why ě is not getting decoded?
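Answer:
utf8_decode converts the string from UTF-8 to ISO-8859-1, a.k.a. “Latin-1”.
The Latin-1 encoding cannot represent the letter “ě” (U+011B), so utf8_decode substitutes “?” for it. It’s that simple.
“Decode” is a total misnomer: the function performs the same conversion as iconv('UTF-8', 'ISO-8859-1', $string), except that iconv reports an error on an unrepresentable character instead of substituting “?”.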
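To see which characters of a string will be lost in a conversion to Latin-1, one option is to look for code points above U+00FF. The following is only a minimal sketch: the helper name charsOutsideLatin1 is made up for illustration, it assumes PHP 7 or later (for the "\u{...}" escapes and type declarations), and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): list the characters of a UTF-8
// string that have no ISO-8859-1 equivalent, i.e. code points above U+00FF.
function charsOutsideLatin1(string $utf8): array
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    if (preg_match_all('/[^\x{00}-\x{FF}]/u', $utf8, $matches) === false) {
        return []; // the subject was not valid UTF-8
    }
    return array_values(array_unique($matches[0]));
}

// Escapes are used so the example does not depend on the file's encoding.
$address = "Praha 5, Star\u{E9} M\u{11B}sto,";
print_r(charsOutsideLatin1($address));
```

For the address in the question, “é” (U+00E9) fits into Latin-1 and is not reported, while “ě” (U+011B) is.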
Answer:
The problem is your PHP file's encoding. Save the file as UTF-8, and then there is no need to use utf8_decode at all. If the data 'Praha 5, Staré Město,' comes from a database, it is better to change the database charset to UTF-8.
Answer:
You don’t need that. (@Rajeev: this string is automatically detected as UTF-8 encoded; echo mb_detect_encoding('Praha 5, Staré Město,'); will always return UTF-8.)
You should rather see:
https://code.google.com/p/dompdf/wiki/CPDFUnicode
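Encoding detection is a guess, so before handing text to the PDF library it may be more useful to validate that a string really is well-formed UTF-8. A minimal sketch, assuming PHP 7 or later; isValidUtf8 is a made-up name, and the check relies on PCRE rejecting malformed subjects when the /u modifier is set:

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
function isValidUtf8(string $s): bool
{
    // With /u, PCRE validates the subject first; on malformed UTF-8
    // preg_match() returns false instead of 1.
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("M\u{11B}sto")); // UTF-8 encoded "Město"
var_dump(isValidUtf8("M\xECsto"));    // the same word as ISO-8859-2 bytes
```

The first call reports true and the second false, because the lone byte 0xEC is not a complete UTF-8 sequence.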
Answer:
I wound up using a home-grown UTF-8 / UTF-16 decoding function that converts characters to &#number; representations. I haven’t found any pattern to why UTF-8 isn’t detected; I suspect it’s because the “encoded-as” sequence isn’t always in exactly the same position in the returned string. You might do some additional checking on that.
Three-byte UTF-8 indicator (the byte order mark): $startutf8 = chr(0xEF).chr(0xBB).chr(0xBF); (if you see this ANYWHERE, not just in the first three bytes, the string is UTF-8 encoded)
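If that three-byte marker sits at the start of a string, it usually should be removed before the text is processed further. A minimal sketch, assuming PHP 7 or later; stripUtf8Bom is a made-up name, and it only handles a marker at the very beginning of the string:

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark, if present.
function stripUtf8Bom(string $s): string
{
    $bom = "\xEF\xBB\xBF";
    // strncmp() compares only the first three bytes of both strings.
    return strncmp($s, $bom, 3) === 0 ? substr($s, 3) : $s;
}

echo stripUtf8Bom("\xEF\xBB\xBFPraha 5"); // prints "Praha 5"
```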
Decode according to UTF-8 rules; this replaced an earlier version which chugged through byte by byte, using:
function charset_decode_utf_8($string) {
    // Only do the slow conversion if there are 8-bit characters.
    // (ereg() and the /e modifier no longer exist in PHP 7, so this
    // version uses preg_match() and preg_replace_callback().)
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }
    // Decode three-byte UTF-8 sequences into &#number; entities.
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );
    // Decode two-byte UTF-8 sequences into &#number; entities.
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );
    return $string;
}
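The function above covers two-byte and three-byte sequences only. As an illustration of the same idea extended to four-byte sequences, here is a minimal sketch, assuming PHP 7 or later and valid UTF-8 input; the names utf8CodePoint and utf8ToNumericEntities are made up for this example and do not come from any library.

```php
<?php
// Hypothetical helper: decode one UTF-8 encoded character to its code point.
function utf8CodePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0]; // plain ASCII
    }
    // The lead byte carries (7 - $count) payload bits; mask off its prefix.
    $codePoint = $bytes[0] & (0xFF >> ($count + 1));
    // Every continuation byte contributes its low six bits.
    for ($i = 1; $i < $count; $i++) {
        $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codePoint;
}

// Hypothetical helper: replace every non-ASCII character by &#number;.
function utf8ToNumericEntities(string $utf8): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u', // with /u this matches whole characters, not bytes
        function (array $m): string {
            return '&#' . utf8CodePoint($m[0]) . ';';
        },
        $utf8
    );
    // preg_replace_callback() returns null if the input is not valid UTF-8;
    // in that case the input is passed through unchanged.
    return $result ?? $utf8;
}

echo utf8ToNumericEntities("Praha 5, Star\u{E9} M\u{11B}sto,");
// prints: Praha 5, Star&#233; M&#283;sto,
```

Because the result is pure ASCII, it survives any later single-byte conversion, provided the consumer interprets HTML numeric entities.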
Tags: php