Home » Php » character encoding – PHP generated XML shows invalid Char value 27 message

character encoding – PHP generated XML shows invalid Char value 27 message

Posted by: admin April 23, 2020 Leave a comment

Questions:

I am generating XML using PHP library as below:

$dom = new DOMDocument("1.0","utf-8");

Doing above results in a page which shows a message on top of the output.

This page contains the following errors:
error on line 16 at column 274505: PCDATA invalid Char value 27
Below is a rendering of the page up to the first error.

I have tried rectifying using Tidy library.. used iconv to get the chinese character in UTF-8.

How to&Answers:

A useful function to get rid of that error is suggested on this website.
http://www.phpwact.org/php/i18n/charsets#common_problem_areas_with_utf-8

When you put utf-8 encoded strings in a XML document you should remember that not all utf-8 valid chars are accepted in a XML document http://www.w3.org/TR/REC-xml/#charsets

So you should strip away the unwanted chars, else you’ll have an XML fatal parsing error such as above

function utf8_for_xml($string)
{
    return preg_replace ('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $string);
}

Hope that saves someone else some time..

Answer:

Prashant is absolutely right. You can also strip away invalid characters in Javascript by doing:

function utf8_for_xml(inputStr) {
  return inputStr.replace(/[^\x09\x0A\x0D\x20-\xFF\x85\xA0-\uD7FF\uE000-\uFDCF\uFDE0-\uFFFD]/gm, '');
}