PHP DOMDocument is not rendering Unicode Characters Properly

Posted by: admin July 12, 2020

Questions:

I am using CKEditor to let users post comments, and users can include Unicode characters in the comment box.

When I submit the form and check $_POST["reply"], the Unicode characters display correctly. I also send header('Content-type:text/html; charset=utf-8'); at the top of the page.
But when I process the input with PHP DOMDocument, all the characters become unreadable.

$html_unicode = "xyz unicode data";
$html_data = '<body>'.$html_unicode . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_data );

$elements = $dom->getElementsByTagName('body');

When I echo

echo $dom->textContent;

the output becomes

§Ø³ÙبÙÙ ÙÙÚº غرÙب ک٠آÙÛ ÙÛÙ

How can I get the proper Unicode characters back using PHP DOMDocument?

Answers:

This worked for me:

$html_unicode = "xyz unicode data";
$html_data = '<body>'.$html_unicode . '</body>';

$dom = new DOMDocument();
$html_data = mb_convert_encoding($html_data, 'HTML-ENTITIES', 'UTF-8'); // requires the mbstring extension
$dom->loadHTML($html_data);

$elements = $dom->getElementsByTagName('body');
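
On PHP 8.2 and later, passing 'HTML-ENTITIES' to mb_convert_encoding() raises a deprecation notice. A minimal sketch of one alternative, assuming the input is valid UTF-8 and mbstring is available, is to turn every non-ASCII character into a numeric entity with mb_encode_numericentity() before loading. The helper name loadUtf8Fragment and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper (not part of the original answer): encode all
// non-ASCII characters as numeric entities so that loadHTML() receives
// pure ASCII and cannot misinterpret the bytes.
function loadUtf8Fragment(string $htmlUnicode): DOMDocument
{
    // Map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $ascii = mb_encode_numericentity($htmlUnicode, $map, 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML('<body>' . $ascii . '</body>');
    return $dom;
}

$dom = loadUtf8Fragment('<p>سلام دنیا</p>'); // sample text, an assumption
$body = $dom->getElementsByTagName('body')->item(0);
echo $body->textContent, "\n";
```

Markup characters such as < and > are below 0x80, so they pass through untouched and the fragment is still parsed as HTML.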

Answer:

Try this 🙂

<?php
    $html_unicode = "xyz unicode data";
    $html_data = '<body>'.$html_unicode . '</body>';
    $dom = new DOMDocument();
    $dom->loadHTML($html_data );

    $elements = $dom->getElementsByTagName('body');
    echo utf8_decode($dom->textContent);
?>
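
This appears to work because, when no charset is declared, the libxml2 versions commonly bundled with PHP fall back to reading the bytes as ISO-8859-1, so every UTF-8 byte becomes its own character, and utf8_decode() reverses that step. Note that utf8_decode() is deprecated as of PHP 8.2. A rough sketch of the round trip, simulating the misreading with mbstring rather than DOMDocument (the sample string and variable names are assumptions):

```php
<?php
$original = 'سلام'; // sample UTF-8 text, an assumption

// Simulate the misreading: treat each UTF-8 byte as one ISO-8859-1
// character and re-encode it as UTF-8 (this yields mojibake).
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Undo it: map each character back to a single byte. This is what
// utf8_decode() did, written with the non-deprecated mbstring call.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

var_dump($restored === $original); // bool(true)
```

Since the result depends on how the parser guessed the encoding, this is a repair after the fact rather than a guarantee.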

Answer:

Thank God, I got the solution by just replacing

$html_data = '<body>'.$html_unicode . '</body>';

with

$html_data = '<head><meta http-equiv="Content-Type" 
content="text/html; charset=utf-8">
</head><body>' . $html_unicode . '</body>';
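
Put together, a minimal sketch of this fix might look like the following. The function name and the sample text are hypothetical; the meta tag is the one shown above.

```php
<?php
// Hypothetical wrapper around the fix: declare the charset in a <meta>
// tag so the HTML parser reads the bytes as UTF-8.
function loadWithCharsetMeta(string $htmlUnicode): DOMDocument
{
    $htmlData = '<head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=utf-8"></head>'
        . '<body>' . $htmlUnicode . '</body>';

    $dom = new DOMDocument();
    $dom->loadHTML($htmlData);
    return $dom;
}

$dom = loadWithCharsetMeta('اردو متن'); // sample text, an assumption
$body = $dom->getElementsByTagName('body')->item(0);
echo $body->textContent, "\n";
```

Reading textContent from the body element rather than from the whole document keeps the output limited to the user's text.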

Answer:

This worked for the Arabic language:

<?php
echo "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=Windows-1256\"></head><body>";
$html = file_get_contents("    url    ");
$dom = new DOMDocument();
@$dom->loadHTML($html);
$ExTEXT = $dom->getElementById('tag id');
echo utf8_decode($ExTEXT->textContent);
echo "</body></html>";