I’m trying to remove all the empty <p> tags CKEditor is inserting into a description box, but they all seem to vary. The possibilities seem to be:
<p></p>
<p>(WHITESPACE)</p>
<p>&nbsp;</p>
<p><br /></p>
<p>(NEWLINE)&nbsp;</p>
<p>(NEWLINE)<br /><br />(NEWLINE)&nbsp;</p>
With these possibilities, there could be any amount of whitespace and <br /> tags in between the paragraph tags, and there could be some of each kind in one paragraph.
I’m also not sure about the <br /> tag; from what I’ve seen, it does not always appear in the same form.
I’ve searched SO for a similar answer, but all the answers I’ve seen cater for just one of these cases, not all of them at once. In simple terms, what I’m asking is: is there a regular expression I can use to remove all <p> tags from some HTML that don’t have any alphanumeric text or symbols/punctuation in them?
Well, in conflict with my suggestion not to parse HTML with regexes, I wrote up a regex to do just that:
This will match properly for:
<p></p>
<p> </p> <!-- ([space]) -->
<p>	</p> <!-- (that's a [tab] character in there) -->
<p>&nbsp;</p>
<p><br /></p>
<p>
&nbsp;</p>
<p>
<br /><br />
&nbsp;</p>
What it does:
# /                --> regex start
# <p>              --> match the opening <p> tag
# (                --> group open
# \s               --> match any whitespace character (newline, space, tab)
# |                --> or
# &nbsp;           --> match &nbsp;
# |                --> or
# </?\s?br\s?/?>   --> match the <br> tag
# )*               --> group close; match any number of any of the elements in the group
# </?p>            --> match the closing </p> tag ("/" optional)
# /                --> regex end
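Putting the pieces of that breakdown back together, here is a minimal sketch of the pattern in use. It is written in Python purely for illustration (the thread does not say which language the regex is meant for), and the names `EMPTY_P` and `strip_empty_paragraphs` are made up for this example. Python raw strings need no `/` delimiters, so those are dropped.

```python
import re

# Pattern assembled from the breakdown above:
#   <p>, then any run of whitespace / &nbsp; / <br> variants, then </p>.
# The "/" in the closing tag is optional, exactly as in the breakdown.
EMPTY_P = re.compile(r"<p>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>")


def strip_empty_paragraphs(html: str) -> str:
    """Remove <p> elements that hold only whitespace, &nbsp; or <br> tags."""
    return EMPTY_P.sub("", html)


sample = "<p>Keep me</p><p>\n<br /><br />\n&nbsp;</p><p></p>"
print(strip_empty_paragraphs(sample))  # <p>Keep me</p>
```

Note that this only handles a bare `<p>` opening tag, and that it is still a regex working on HTML, so it is best kept to simple editor output like the cases listed above.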
The selected answer is great, but it doesn’t work if the <p> tag has attributes defined, such as an inline style.
A regex to match this would be:
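One way such a pattern could look, as a sketch rather than a definitive version: the opening tag is loosened to `<p\b[^>]*>` so that it accepts any attributes, and the rest is the same as in the selected answer. The `[^>]*` part and the `\b` guard are assumptions of this example, as are the Python names `EMPTY_P_WITH_ATTRS` and `strip_empty_paragraphs_with_attrs`.

```python
import re

# Assumed variant of the selected answer's pattern:
#   <p\b      the tag name "p" followed by a word boundary (so <pre> is skipped)
#   [^>]*     any attributes, i.e. everything up to the closing ">"
# The body and closing tag are unchanged from the selected answer.
EMPTY_P_WITH_ATTRS = re.compile(r"<p\b[^>]*>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>")


def strip_empty_paragraphs_with_attrs(html: str) -> str:
    """Remove empty <p> elements even when the opening tag has attributes."""
    return EMPTY_P_WITH_ATTRS.sub("", html)


sample = (
    '<p style="text-align: center;">&nbsp;</p>'
    '<p class="lead">Hello</p>'
    "<pre> </pre>"
)
print(strip_empty_paragraphs_with_attrs(sample))
# <p class="lead">Hello</p><pre> </pre>
```

This will misfire if an attribute value itself contains a `>` character, which is one more reason to treat it as a quick cleanup for known editor output and not as a general HTML parser.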