[Django] Getting the first image from HTML using Python/Django

0👍

✅

If I do any more HTML parsing I will probably look into one of the suggested libraries. For now, I have solved it with:

    # Locate the first <img ...> tag; str.find returns -1 when nothing matches
    startImgPos = post.find('<img')
    if startImgPos > -1:
        endImgPos = post.find('>', startImgPos)
        imageTag = post[startImgPos:endImgPos]
        # Only handles double-quoted src attributes
        startSrcPos = imageTag.find('src="')
        if startSrcPos > -1:
            startSrcPos += len('src="')
            endSrcPos = imageTag.find('"', startSrcPos)
            linkTag = imageTag[startSrcPos:endSrcPos]
            r['linktag'] = linkTag

I'll improve this later, but for now it does the trick. Feel free to suggest improvements to the code above.
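One possible improvement that avoids both manual string slicing and a third-party dependency is the standard library's `html.parser`. This is only a minimal sketch: the names `FirstImageParser` and `first_image_src` are made up for illustration, and it returns the `src` of the first `<img>` that actually has a `src` value.

```python
from html.parser import HTMLParser


class FirstImageParser(HTMLParser):
    """Remembers the src of the first <img> tag that has a src value."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs with surrounding quotes already removed.
        if tag == 'img' and self.src is None:
            self.src = dict(attrs).get('src')


def first_image_src(html):
    """Return the first image src found in html, or None if there is none."""
    parser = FirstImageParser()
    parser.feed(html)
    return parser.src
```

Unlike the `find`-based version, this also copes with single-quoted attributes, uppercase tags, and self-closing `<img ... />` tags, and the result could be assigned to `r['linktag']` in the same way.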

👀Daniel Ryan

9👍

You can use BeautifulSoup to do this:

http://www.crummy.com/software/BeautifulSoup/

It's an XML/HTML parser: you pass in the raw HTML and can then search it for particular tags, attributes, etc.

Something like this should work:

from bs4 import BeautifulSoup

tree = BeautifulSoup(raw_html, 'html.parser')
img_link = tree.find('img')['src']  # find() returns the first match, or None if there is no <img>

3👍

This is exactly what I was looking for. The code that worked for me is:

from bs4 import BeautifulSoup

tree = BeautifulSoup(raw_html, 'html.parser')
img_link = tree.find_all('img')[0].get('src')

Works great! Thanks, timmy-omahony.
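One caveat: `find_all('img')[0]` raises an `IndexError` when the HTML contains no image at all. A slightly more defensive sketch is below, assuming the `bs4` package (BeautifulSoup 4) is installed; the helper name `first_img_link` is just illustrative.

```python
from bs4 import BeautifulSoup


def first_img_link(raw_html):
    """Return the src of the first <img> in raw_html, or None if unavailable."""
    tree = BeautifulSoup(raw_html, 'html.parser')
    img = tree.find('img')  # first matching tag, or None when there is no <img>
    # Tag.get returns None when the attribute is missing
    return img.get('src') if img is not None else None
```

This returns `None` for posts without images, or for an `<img>` without a `src`, so the caller can decide what to do in that case.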

👀toledano
