1đź‘Ť
Multipurpose solution:
image_re = re.compile(r"""
(?P<img_tag><img)\s+ #tag starts
[^>]*? #other attributes
src= #start of src attribute
(?P<quote>["''])? #optional open quote
(?P<image>[^"'>]+) #image file name
(?(quote)(?P=quote)) #close quote
[^>]*? #other attributes
> #end of tag
""", re.IGNORECASE|re.VERBOSE) #re.VERBOSE allows to define regex in readable format with comments
image_tags = []
for match in image_re.finditer(content):
image_tags.append(match.group("img_tag"))
#print found image_tags
for image_tag in image_tags:
print image_tag
As you can see in regex definition, it contains
(?P<group_name>regex)
It allows you to access found groups by group_name
, and not by number. It is for readability. So, if you want to show all src
attributes of img
tags, then just write:
for match in image_re.finditer(content):
image_tags.append(match.group("image"))
After this image_tags list will contain src of image tags.
Also, if you need to parse html, then there are instruments that were designed exactly for such purposes. For example it is lxml, that use xpath expressions.
👤stalk
0đź‘Ť
I don’t know Python but assuming it uses normal Perl compatible regular expressions…
You probably want to look for “<img[^>]+>” which is: “<img”, followed by anything that is not “>”, followed by “>”. Each match should give you a complete image tag.
👤Grynn
Source:stackexchange.com