Scrape span using BeautifulSoup

1👍

I finally found the solution. The problem was the default “html.parser”, which could not handle this page's markup.
Use “html5lib” for parsing instead to get the desired results.

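from bs4 import BeautifulSoup  # html5lib is a separate install: pip install html5lib

soup = BeautifulSoup(html, "html5lib")  # html holds the page source
soup.find_all("span")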

The html5lib parser parses the page the same way a web browser does.
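To check whether the parser really is the cause, one option is to count the spans each installed parser finds in the same markup. This is only a minimal sketch: the helper name `count_spans_by_parser` and the sample string are made up for illustration, and parsers that are not installed are simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_spans_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number_of_spans} for every parser that is installed.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser library is not installed; skip it.
            continue
        counts[name] = len(soup.find_all("span"))
    return counts


# Made-up, well-formed sample; on a real page the counts may differ per parser.
sample = "<div><span>Home</span><span>Away</span></div>"
print(count_spans_by_parser(sample))
```

If the counts differ for the page you are scraping, the parser choice is the likely culprit, and naming one explicitly keeps results consistent across machines.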

0👍

Maybe you are trying to scrape a different page, but I didn’t have a problem scraping that site. Here is my code:

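import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.betfair.com/sport/football'
html = urllib.request.urlopen(url).read()
# No parser is named, so Beautiful Soup picks the best one installed
soup = BeautifulSoup(html)

# Print every <span> element on the page
test = soup.find_all('span')
for span in test:
    print(span)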

This produced a long list of spans, including the lines/scores, which I assume is what you are interested in:

<span class="ssc-lkh"></span>
<span>Join Now</span>
<span class="new flag-en"></span>
<span class="new flag-en"></span>
<span class="sportIcon-6423"></span>
<span class="sportName">American Football</span>
<span class="sportIcon-3988"></span>
<span class="sportName">Athletics</span>
<span class="sportIcon-61420"></span>
.....

Updated in response to the comment below

Here is some revised code to show that my code does indeed pull in the spans you need.

import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.betfair.com/sport/football'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)

# Keep only the spans whose class is "away-team-name"
test = soup.find_all('span', attrs={"class": "away-team-name"})
for span in test:
    print("away team" + span.text)

Produces:

away team
Marseille

away team
Lazio

away team
Academica

away team
Canada (W)

away team
Arnett Gardens FC

away team
UWI FC
....
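The line breaks in the output above presumably come from whitespace inside each span, since span.text returns the text exactly as it appears in the markup. A minimal sketch of one way to tidy that up with get_text(strip=True) follows. The sample markup and the helper name `away_team_names` are made up for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; it mimics spans whose text
# is surrounded by newlines, as the output above suggests.
sample_html = """
<span class="away-team-name">
Marseille
</span>
<span class="sportName">Athletics</span>
<span class="away-team-name">
Lazio
</span>
"""


def away_team_names(html, parser="html.parser"):
    """Return the stripped text of every span with class "away-team-name"."""
    soup = BeautifulSoup(html, parser)
    spans = soup.find_all("span", class_="away-team-name")
    # strip=True removes leading and trailing whitespace from the text
    return [span.get_text(strip=True) for span in spans]


for name in away_team_names(sample_html):
    print("away team: " + name)
```

The sketch uses the built-in "html.parser" so it runs without extra packages; pass "html5lib" as the second argument if that parser works better for the live page.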
