1👍
Beautiful Soup has unwrap(): it replaces a tag with whatever's inside that tag.
You will have to iterate manually over all the tags you want to replace.
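A minimal sketch of that loop, assuming the third-party bs4 package is installed and that the built-in "html.parser" backend is acceptable. The helper name unwrap_tags and the sample markup are made up for illustration:

```python
from bs4 import BeautifulSoup


def unwrap_tags(html, tag_names):
    """Remove the given tags from html but keep their contents."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() accepts a list of tag names and returns a pre-built list,
    # so modifying the tree inside the loop is safe.
    for tag in soup.find_all(tag_names):
        tag.unwrap()  # replace the tag with its children
    return str(soup)


print(unwrap_tags("<div><h1>Parse me!</h1><p>Keep me</p></div>", ["h1"]))
# prints: <div>Parse me!<p>Keep me</p></div>
```

Passing several names, e.g. ["h1", "h2"], unwraps all of them in one pass.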
1👍
You can extend Python's HTMLParser and create your own parser that skips specified tags.
Adapting the basic example from the html.parser documentation, the version below ignores <h1></h1>
tags but still reports their data:
from html.parser import HTMLParser

NOT_ALLOWED_TAGS = ['h1']

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag not in NOT_ALLOWED_TAGS:
            print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):
        if tag not in NOT_ALLOWED_TAGS:
            print("Encountered an end tag :", tag)

    def handle_data(self, data):
        print("Encountered some data :", data)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
That will print:
Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
# (the h1 start tag is skipped here)
Encountered some data : Parse me!
# (the h1 end tag is skipped here)
Encountered an end tag : body
Encountered an end tag : html
You can now maintain the NOT_ALLOWED_TAGS list to control which tags get stripped.
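The parser above only prints events. To get the stripped markup back as a string, one option is to collect the pieces instead of printing them. This is a sketch under assumptions, not part of the original answer: the names TagStripper and strip_tags are hypothetical, and the output is rebuilt from parser events, so unusual markup may not round-trip byte for byte.

```python
from html.parser import HTMLParser

NOT_ALLOWED_TAGS = frozenset({"h1"})


class TagStripper(HTMLParser):
    """Rebuild the input HTML, leaving out the tags in not_allowed."""

    def __init__(self, not_allowed=NOT_ALLOWED_TAGS):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.not_allowed = set(not_allowed)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.not_allowed:
            # get_starttag_text() returns the tag as written, attributes included
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>; emitted once, with no end tag
        if tag not in self.not_allowed:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.not_allowed:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # text is always kept, even inside a stripped tag
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_tags(html, not_allowed=NOT_ALLOWED_TAGS):
    parser = TagStripper(not_allowed)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_tags('<html><head><title>Test</title></head>'
                 '<body><h1>Parse me!</h1></body></html>'))
# prints: <html><head><title>Test</title></head><body>Parse me!</body></html>
```

Passing a different collection as not_allowed, for example {"h1", "span"}, strips several tags in one pass.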
Source:stackexchange.com