
Can I remove script tags with BeautifulSoup?

Posted by: admin November 29, 2017

Questions:

Can script tags and all of their contents be removed from HTML with BeautifulSoup, or do I have to use Regular Expressions or something else?

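Answers:
>>> from bs4 import BeautifulSoup
>>> # html.parser leaves the fragment as-is; lxml would wrap it in <html><body>
>>> soup = BeautifulSoup('<script>a</script>baba<script>b</script>', 'html.parser')
>>> [s.extract() for s in soup('script')]
[<script>a</script>, <script>b</script>]
>>> soup
baba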

Answers:

As stated in the official documentation, you can use the extract method to remove every subtree that matches the search.

from bs4 import BeautifulSoup
a = BeautifulSoup("<html><body><script>aaa</script></body></html>", "html.parser")
# extract() detaches each <script> tag, together with its contents, from the tree
for x in a.find_all('script'):
    x.extract()
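
A small sketch of what extract() hands back, in case the removed scripts are still needed afterwards: each call returns the detached tag. The sample HTML and the variable names are made up for illustration, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Sample markup, made up for illustration
html = "<html><body><script>aaa</script><p>kept</p><script>bbb</script></body></html>"
soup = BeautifulSoup(html, "html.parser")

# extract() detaches the tag from the tree and returns it
removed = [tag.extract() for tag in soup.find_all("script")]

remaining = str(soup)                          # the document without the scripts
removed_sources = [str(tag) for tag in removed]  # the detached tags, still usable
print(remaining)
print(removed_sources)
```

If the removed tags are not needed at all, the returned list can simply be ignored.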

Answers:

Updated answer for future reference:
The method to use is decompose().
Other approaches also work, but decompose() removes the tag from the tree and destroys it together with its contents, whereas extract() returns the removed tag.

Example usage:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
soup.i.decompose()
print(str(soup))
# prints '<p>This is a slimy text and </p>' (the space before <i> is kept)

Pretty useful for getting rid of detritus such as 'script' and 'img' tags, and so forth.
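
To tie this back to the original question, here is a minimal sketch that strips several kinds of unwanted tags in one pass. It relies on find_all accepting a list of tag names; the sample HTML, the choice of tags, and the variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Sample markup, made up for illustration
html = (
    '<div><script>track()</script>'
    '<p>Hello <img src="a.png"/>world</p>'
    '<style>p {color: red}</style></div>'
)
soup = BeautifulSoup(html, "html.parser")

# find_all() with a list matches any of the names;
# decompose() destroys each matching tag and its contents
for tag in soup.find_all(["script", "style", "img"]):
    tag.decompose()

cleaned = str(soup)
print(cleaned)
```

Because find_all returns a list of the matches up front, removing tags inside the loop does not disturb the iteration.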