I’m having some problems using PHP’s strip_tags() function when the string contains ‘less than’ and ‘greater than’ signs. For example:

If I do:

strip_tags("<span>some text <5ml and then >10ml some text </span>");

I’ll get:

some text 10ml some text

But obviously I want to get:

some text <5ml and then >10ml some text

Yes, I know that I could use &lt; and &gt;, but I don’t have the chance to convert those characters into HTML entities, since the data is already stored as you can see in my example.

What I’m looking for is a clever way to parse HTML in order to remove only actual HTML tags.

Since TinyMCE was used to generate that data, I know which actual HTML tags could appear, so a strip_tags($string, $black_list) implementation would be more useful than strip_tags($string, $allowable_tags).
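PHP’s strip_tags() only accepts a list of allowed tags, not a blacklist, but the idea can be approximated with a regular expression that removes only the tag names you list. A minimal sketch, assuming the tag names TinyMCE may produce are known in advance; `strip_tags_blacklist()` is a hypothetical helper, not a built-in, and the `[^>]*` part assumes no attribute value contains a literal `>`:

```php
<?php
// Hypothetical helper (not part of PHP): removes only the tags whose names
// are listed in $blackList and leaves every other "<" and ">" untouched.
function strip_tags_blacklist(string $html, array $blackList): string
{
    if ($blackList === []) {
        return $html;
    }
    // Escape each tag name so it is safe inside the pattern.
    $names = implode('|', array_map(
        fn (string $tag): string => preg_quote($tag, '#'),
        $blackList
    ));
    // "<", optional "/", a listed tag name, a word boundary (so "p" does not
    // match "pre"), then everything up to the next ">".
    $pattern = '#</?(?:' . $names . ')\b[^>]*>#i';
    return preg_replace($pattern, '', $html);
}

$input = "<span>some text <5ml and then >10ml some text </span>";
echo strip_tags_blacklist($input, ['span', 'p', 'strong', 'em']), "\n";
// prints: some text <5ml and then >10ml some text
```

Because `<5ml` does not start with a listed tag name, it is left alone; the trade-off is that the list must cover every tag the editor can emit.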

Any thoughts?

As a wacky workaround, you could entity-encode the non-HTML brackets with:

$html = preg_replace_callback('# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi', function ($m) { return htmlentities($m[0]); }, $html);

Apply strip_tags() afterwards (and html_entity_decode() if you need the plain characters back instead of entities). Note that this only works for your specific example and similar cases. It’s a regular expression with some heuristics, not artificial intelligence that can tell HTML tags apart from unescaped angle brackets with another meaning.
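The workaround can be wrapped in one function. This is only a sketch: `strip_tags_keep_brackets()` is a hypothetical name, it uses preg_replace_callback() because preg_replace()’s /e modifier was removed in PHP 7, and it inherits the same heuristics (for example, a plain-text `<b` would still be treated as a tag):

```php
<?php
// Hypothetical helper (not part of PHP) around the regex heuristic.
function strip_tags_keep_brackets(string $html, bool $decode = true): string
{
    // Step 1: entity-encode a "<" not followed by "/" or a letter, and a ">"
    // that follows whitespace and is not followed by a letter.
    $escaped = preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        fn (array $m): string => htmlentities($m[0]),
        $html
    );

    // Step 2: only tag-like brackets remain for strip_tags() to remove.
    $text = strip_tags($escaped);

    // Step 3 (optional): turn &lt; / &gt; back into characters. This also
    // decodes any entities already present in the stored data (e.g. &amp;).
    return $decode ? html_entity_decode($text) : $text;
}

$input = "<span>some text <5ml and then >10ml some text </span>";
echo strip_tags_keep_brackets($input), "\n";
// prints: some text <5ml and then >10ml some text
```

Pass `false` as the second argument to keep the entities when the result is going back into HTML.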


If you want to have “greater than” and “less than” signs, you need to escape them:

&gt; is >

&lt; is <

See, for example: http://www.w3schools.com/html/html_entities.asp
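Escaping only helps if it happens before the text is wrapped in markup and stored. A small sketch, assuming you control that step (the variable names are illustrative):

```php
<?php
// Escape the plain-text part *before* it is wrapped in markup and stored.
$userText = 'some text <5ml and then >10ml some text ';
$stored   = '<span>' . htmlspecialchars($userText) . '</span>';
// $stored: <span>some text &lt;5ml and then &gt;10ml some text </span>

// strip_tags() now only sees the real tags; the entities pass through untouched.
$plain = strip_tags($stored);

// Decode if plain characters (rather than HTML) are needed.
echo html_entity_decode($plain), "\n";
```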


Instead of strip_tags(), just use htmlspecialchars().
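Applied to the question’s input, htmlspecialchars() escapes the tags as well as the stray brackets instead of removing them, so it suits displaying the stored string literally rather than extracting its text. A quick check:

```php
<?php
$input = "<span>some text <5ml and then >10ml some text </span>";
// htmlspecialchars() converts every "<", ">", "&" and quote to an entity;
// the <span> markup is escaped (shown as text), not stripped.
$escaped = htmlspecialchars($input);
echo $escaped, "\n";
// prints: &lt;span&gt;some text &lt;5ml and then &gt;10ml some text &lt;/span&gt;
```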