I’m having some problems using the PHP strip_tags() function when the string contains ‘less than’ and ‘greater than’ signs. For example:
If I do:
echo strip_tags("<span>some text <5ml and then >10ml some text </span>");
I get:
some text 10ml some text
But obviously, I want to get:
some text <5ml and then >10ml some text
Yes, I know that I could use &lt; and &gt;, but I don’t have the chance to convert those characters into HTML entities, since the data is already stored as you can see in my example.
What I’m looking for is a clever way to parse the HTML so that only the actual HTML tags are removed.
Since TinyMCE was used to generate that data, I know which actual HTML tags could appear, so a strip_tags($string, $black_list) implementation would be more useful than the whitelist of allowed tags that strip_tags() already accepts.
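A minimal sketch of that idea in PHP, assuming the set of tags TinyMCE can emit is known in advance. PHP has no built-in blacklist mode for strip_tags(), so the function name strip_only_tags() and the tag list below are hypothetical. It is still a regex heuristic: a stray bracket that happens to look like a listed tag (e.g. "<p and then >") would also be removed.

```php
<?php
// Hypothetical helper (not part of PHP): remove only the tags named in $tags,
// leaving every other '<' and '>' in the string untouched.
function strip_only_tags(string $html, array $tags): string
{
    // Build an alternation of escaped tag names, e.g. "span|p|strong|em".
    $names = implode('|', array_map(function ($t) {
        return preg_quote($t, '#');
    }, $tags));

    // Match <name ...>, </name> and <name/> for the listed names only.
    // \b stops "p" from matching the start of e.g. "pre"; [^>]* skips attributes.
    $pattern = '#</?(?:' . $names . ')\b[^>]*>#i';

    return preg_replace($pattern, '', $html);
}

$input = "<span>some text <5ml and then >10ml some text </span>";
echo strip_only_tags($input, ['span', 'p', 'strong', 'em']), "\n";
// some text <5ml and then >10ml some text
```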
As a wacky workaround, you could escape non-HTML brackets with:
$html = preg_replace_callback('# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
    function ($m) { return htmlentities($m[0]); }, $html);
Apply strip_tags() afterwards. Note that this only works for your specific example and similar cases. It’s a regular expression with some heuristics, not artificial intelligence that can tell HTML tags apart from unescaped angle brackets with another meaning.
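The same heuristic as a self-contained script, sketched with preg_replace_callback() because the /e modifier was removed in PHP 7.0. The function name escape_stray_brackets() is a hypothetical helper. The stray brackets come out of strip_tags() as entities; html_entity_decode() turns them back into plain characters if that is what you need.

```php
<?php
// Hypothetical helper. Heuristic only:
// - '<' not followed by '/' or a letter cannot start a tag, so escape it;
// - '>' preceded by whitespace and not followed by a letter is treated as a
//   stray "greater than" sign and escaped too.
function escape_stray_brackets(string $html): string
{
    return preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        function (array $m): string {
            return htmlentities($m[0]);
        },
        $html
    );
}

$html = "<span>some text <5ml and then >10ml some text </span>";
$text = strip_tags(escape_stray_brackets($html));
echo $text, "\n";                     // some text &lt;5ml and then &gt;10ml some text
echo html_entity_decode($text), "\n"; // some text <5ml and then >10ml some text
```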
If you want to have “greater than” and “less than” signs, you need to escape them:
> is &gt;
< is &lt;
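A quick illustration, assuming the data could be stored escaped: htmlspecialchars() produces exactly these entities, and strip_tags() then removes the real tags while leaving the escaped brackets alone.

```php
<?php
// htmlspecialchars() turns '<' into "&lt;" and '>' into "&gt;".
echo htmlspecialchars("<"), " ", htmlspecialchars(">"), "\n"; // &lt; &gt;

// Stored like this, the brackets survive strip_tags(); only <span> is removed.
$stored = "<span>some text &lt;5ml and then &gt;10ml some text </span>";
echo strip_tags($stored), "\n"; // some text &lt;5ml and then &gt;10ml some text
```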
See, for example: http://www.w3schools.com/html/html_entities.asp
Instead of strip_tags(), just use htmlspecialchars().
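A short sketch of what that does to the example string. Keep in mind that htmlspecialchars() escapes the markup rather than removing it, so the real <span> tags end up displayed as literal text in the browser.

```php
<?php
$html = "<span>some text <5ml and then >10ml some text </span>";

// Nothing is removed: every '<' and '>' (including those of the <span> tags)
// is converted to an entity.
echo htmlspecialchars($html), "\n";
// &lt;span&gt;some text &lt;5ml and then &gt;10ml some text &lt;/span&gt;
```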