
html – Prevent PHP Tidy from converting style tag data to CDATA

Posted by: admin, July 12, 2020

Question:

I am using PHP Tidy to clean a user-generated HTML page that contains a style tag:

<style type="text/css">
    body {
        padding-top: 60px;
        padding-bottom: 40px;
    }
</style>

My main purpose in using Tidy is to repair the file and indent it properly. But once I run Tidy, the contents of the style tag are wrapped in CDATA markers:

<style type="text/css">
/*<![CDATA[*/
    body {
            padding-top: 60px;
            padding-bottom: 40px;
    }
/*]]>*/
</style>

My Tidy config options are:

$options = array(
    'preserve-entities' => true,
    'hide-comments' => true,
    'tidy-mark' => false,
    'indent' => true,
    'indent-spaces' => 4,
    'new-blocklevel-tags' => 'article,header,footer,section,nav',
    'new-inline-tags' => 'video,audio,canvas,ruby,rt,rp',
    'doctype' => 'omit',
    'sort-attributes' => 'alpha',
    'vertical-space' => false,
    'output-xhtml' => true,
    'wrap' => 180,
    'wrap-attributes' => false,
    'break-before-br' => false,
);

$tidy = tidy_parse_string($buffer, $options, 'utf8');
tidy_clean_repair($tidy);
$buffer = tidy_get_output($tidy); // repaired markup as a plain string

I searched a lot, but the PHP Tidy library is not exactly well documented, so I ended up removing the CDATA markers manually after Tidy cleans and repairs the code:

$buffer = str_replace("/*<![CDATA[*/","",$buffer);
$buffer = str_replace("/*]]>*/","",$buffer);
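A slightly sturdier variant of this manual clean-up is sketched below. It assumes Tidy puts each marker on its own line, as in the output above, and deletes those marker lines together with their line breaks, so no empty lines are left inside the style block. It does not re-indent the CSS itself, and `stripCdataMarkers` is a hypothetical helper name, not part of Tidy.

```php
<?php
// Hypothetical helper: remove Tidy's "/*<![CDATA[*/" and "/*]]>*/" marker lines.
// Assumes each marker sits on its own line; markers sharing a line with CSS are left untouched.
function stripCdataMarkers(string $html): string
{
    // ^ with the m flag anchors at every line start; \R? also consumes the
    // line break, so removed markers do not leave blank lines behind.
    return preg_replace('~^[ \t]*/\*(?:<!\[CDATA\[|\]\]>)\*/[ \t]*\R?~m', '', $html);
}

$tidied = "<style type=\"text/css\">\n/*<![CDATA[*/\n    body {\n        padding-top: 60px;\n    }\n/*]]>*/\n</style>";
echo stripCdataMarkers($tidied);
```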

The problem with this approach is that the indentation of the style tag data is still off (not aligned with the rest of the page):

<style type="text/css">
    body {
        padding-top: 60px;
        padding-bottom: 40px;
    }
</style>

So again: how do I prevent Tidy from adding CDATA markers to the page?

Thanks a lot!

Answer:

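Turn off the output-xhtml option. The CDATA wrapping is required for XHTML, because an XML parser reads the contents of a style element as markup, so characters such as < and & in the CSS would otherwise be misinterpreted.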
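A minimal sketch of that fix, assuming the tidy extension is installed: it keeps a few of the question's options but sets `output-xhtml` to `false`, so Tidy writes HTML and leaves the style contents unwrapped. `tidyHtml` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: run Tidy with XHTML output switched off (requires ext-tidy).
function tidyHtml(string $html): string
{
    $options = [
        'tidy-mark'     => false,
        'indent'        => true,
        'indent-spaces' => 4,
        'doctype'       => 'omit',
        'wrap'          => 180,
        'output-xhtml'  => false, // XHTML output is what adds the CDATA wrapper
    ];

    $tidy = tidy_parse_string($html, $options, 'utf8');
    if ($tidy === false) {
        throw new RuntimeException('Tidy could not parse the input');
    }
    $tidy->cleanRepair();
    return tidy_get_output($tidy);
}

$html = '<html><head><style type="text/css">body { padding-top: 60px; }</style></head><body><p>Hi</p></body></html>';

if (extension_loaded('tidy')) {
    echo tidyHtml($html);
}
```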

Answer:

CDATA sections are added so that an XML parser treats characters like ‘<’ and ‘&’ as literal text instead of markup. Tidy does not appear to have any documented option that prevents generating them for inline CSS/JavaScript. The only option would be to move the CSS to a separate file, in which case it doesn't need the CDATA wrapper.
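The effect of the wrapper can be checked with PHP's DOM extension; this is a sketch, and `parsesAsXml` is a hypothetical helper. An XML parser rejects a style block whose CSS contains a raw `<` or `&`, but accepts the same CSS inside the `/*<![CDATA[*/ ... /*]]>*/` wrapper Tidy emits. The surrounding `/* */` make the markers plain CSS comments for browsers that parse the page as HTML.

```php
<?php
// Collect libxml parse errors internally instead of printing warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: does the string parse as well-formed XML?
function parsesAsXml(string $xml): bool
{
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) !== false;
    libxml_clear_errors(); // discard errors from this attempt
    return $ok;
}

$css = 'p::before { content: "a < b & c"; }';

$raw     = "<style type=\"text/css\">$css</style>";
$wrapped = "<style type=\"text/css\">/*<![CDATA[*/$css/*]]>*/</style>";

var_dump(parsesAsXml($raw));     // bool(false): '<' and '&' are read as markup
var_dump(parsesAsXml($wrapped)); // bool(true): CDATA content is literal text
```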

See http://tidy.sourceforge.net/docs/quickref.html and https://en.wikipedia.org/wiki/CDATA for more information.

Answer:

One way to handle it is to use a link to an external stylesheet.

<link rel="stylesheet" type="text/css" media="screen, print" href="site.css">