The HTML is broken: the opening section tag is never closed with ">" and neither section has a closing tag. I have had success using html5lib with bs4 to parse badly broken HTML like this:
In [21]: h = """<section class="videos"
....: <section class="box">
....: <a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu" class="video-box">
....: <img src="http://hw-static.exampl.net/.jpg" width="222" height="125" alt="">
....: </a>
....: <strong class="title"><a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu">Teen "Allegedly" </a></strong>
....: <div>
....: <span class="views">11,323</span>
....: <span class="comments"><a href="http://www.example.net/v" data-disqus-identifier="94137">44</a></span>
....: </div>"""
In [22]: from bs4 import BeautifulSoup
In [23]: soup = BeautifulSoup(h, 'html5lib')
In [24]: section = soup.select_one('section.videos')
In [25]: img = section.find('img').get('src')
In [26]: text = section.strong.a.text
In [27]: link = section.a.get('href')
In [28]: img
Out[28]: u'http://hw-static.exampl.net/.jpg'
In [29]: text
Out[29]: u'Teen "Allegedly" '
In [30]: link
Out[30]: u'/videos/video.php?v=wshhH0xVL2LP4hFb0liu'
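If you want this outside of an interactive session, here is a minimal sketch of the same steps as a Python 3 script. The function name extract_video, the constant BROKEN_HTML, and the None checks are my own additions for illustration, not anything from your code, and it assumes both beautifulsoup4 and html5lib are installed:

```python
# Requires: pip install beautifulsoup4 html5lib
from bs4 import BeautifulSoup

# Same broken markup as in the session above: the outer <section> tag is
# never closed with ">" and neither section has a closing tag.
BROKEN_HTML = """<section class="videos"
<section class="box">
<a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu" class="video-box">
<img src="http://hw-static.exampl.net/.jpg" width="222" height="125" alt="">
</a>
<strong class="title"><a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu">Teen "Allegedly" </a></strong>
<div>
<span class="views">11,323</span>
<span class="comments"><a href="http://www.example.net/v" data-disqus-identifier="94137">44</a></span>
</div>"""


def extract_video(html):
    """Return image URL, title and link of the first video box, or None.

    Hypothetical helper: it wraps the select_one/find calls from the
    session and guards against missing elements.
    """
    # html5lib repairs the markup roughly the way a browser would.
    soup = BeautifulSoup(html, "html5lib")
    section = soup.select_one("section.videos")
    if section is None:
        return None
    img = section.find("img")
    title_link = section.select_one("strong.title a")
    first_link = section.find("a")
    return {
        "img": img.get("src") if img else None,
        # strip=True drops the trailing space seen in the raw .text value.
        "title": title_link.get_text(strip=True) if title_link else None,
        "link": first_link.get("href") if first_link else None,
    }


if __name__ == "__main__":
    print(extract_video(BROKEN_HTML))
```

The guards matter on real pages, where a box may be missing its image or title and a chained call such as find('img').get('src') would raise AttributeError.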
The correct HTML should look something like:
<section class="videos">
<section class="box">
<a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu" class="video-box">
<img src="http://hw-static.exampl.net/.jpg" width="222" height="125" alt="">
</a>
<strong class="title"><a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu">Teen "Allegedly" </a></strong>
</section>
<div>
<span class="views">11,323</span>
<span class="comments"><a href="http://www.example.net/v" data-disqus-identifier="94137">44</a></span>
</div>
</section>
Source:stackexchange.com