1👍
I finally found the solution. The problem was the default “html.parser”, which was not able to handle this page.
Use “html5lib” for parsing instead to get the desired results.
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html5lib")  # html5lib must be installed separately
soup.find_all("span")
The html5lib parser parses the page the same way a web browser does.
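To check whether the parser choice matters for a given page, one option is to count the spans each installed parser finds. This is only a rough sketch: the snippet below is made up for illustration (it is not taken from the page in question), and any parser that is not installed is simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up snippet for illustration: the second <span> is never closed.
snippet = "<div><span>one</span><span>two</div>"

span_counts = {}
for parser in ("html.parser", "html5lib", "lxml"):
    try:
        soup = BeautifulSoup(snippet, parser)
    except FeatureNotFound:
        # This parser is not installed; skip it.
        continue
    span_counts[parser] = len(soup.find_all("span"))

# Print how many spans each available parser found.
for parser, count in span_counts.items():
    print(parser, count)
```

If the counts differ on your real page, the parser is the likely cause of the missing spans.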
0👍
Maybe you are trying to scrape a different page, but I didn’t have a problem scraping that site. Here is my code:
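import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.betfair.com/sport/football'
html = urllib.request.urlopen(url).read()
# No parser is named here, so Beautiful Soup picks the best one installed.
soup = BeautifulSoup(html)
test = soup.find_all('span')
for span in test:
    print(span)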
This produced a long list of spans, including the lines/scores, which I assume is what you are interested in:
<span class="ssc-lkh"></span>
<span>Join Now</span>
<span class="new flag-en"></span>
<span class="new flag-en"></span>
<span class="sportIcon-6423"></span>
<span class="sportName">American Football</span>
<span class="sportIcon-3988"></span>
<span class="sportName">Athletics</span>
<span class="sportIcon-61420"></span>
.....
Updated in response to the comment below
Here is some revised code to show that my code does indeed pull in the spans you need.
url = 'https://www.betfair.com/sport/football'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)
# Keep only the spans whose class is "away-team-name".
test = soup.find_all('span', attrs={"class": "away-team-name"})
for span in test:
    print("away team" + span.text)
Produces:
away team
Marseille
away team
Lazio
away team
Academica
away team
Canada (W)
away team
Arnett Gardens FC
away team
UWI FC
....
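The line break between "away team" and each name in the output above suggests that the span text carries surrounding whitespace. A minimal sketch of one way to tidy that up, assuming markup roughly of this shape (the sample HTML is hypothetical, not copied from the Betfair page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure may differ.
sample = """
<span class="away-team-name">
Marseille
</span>
<span class="away-team-name">
Lazio
</span>
"""

soup = BeautifulSoup(sample, "html.parser")

# get_text(strip=True) drops the leading and trailing whitespace.
names = [
    span.get_text(strip=True)
    for span in soup.find_all("span", class_="away-team-name")
]

for name in names:
    print("away team: " + name)
```

Each team then prints on a single line with its label.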
Source:stackexchange.com