6
You can use the CSS selector tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title:
data = '''<tr>
<td class="info"><div class="title">THIS YOU WANT ...</div></td>
</tr>
<tr class="ls">
<td colspan="3">Less similar results</td>
</tr>
<tr>
<td class="info"><div class="title">THIS YOU DON'T WANT ...</div></td>
</tr>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
print(soup.select('tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title'))
Prints:
[<div class="title">THIS YOU WANT ...</div>]
What does it mean?
tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title
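Select every <div> with class title that sits inside a <tr>, excluding any <tr> that is a later sibling of the <tr> containing a <td> with the text "Less similar results". In other words, only the rows up to that marker row are searched.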
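In newer versions of soupsieve (2.1 and later, the selector engine behind select()), :contains() is deprecated in favour of :-soup-contains(). A minimal sketch of the same selector with the newer name follows. It assumes a recent bs4/soupsieve install, and the sample markup and the KEEP/DROP titles are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one wanted row, the marker row, one unwanted row.
html = """<table>
<tr><td class="info"><div class="title">KEEP</div></td></tr>
<tr class="ls"><td colspan="3">Less similar results</td></tr>
<tr><td class="info"><div class="title">DROP</div></td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Same selector as above, with :-soup-contains() in place of :contains().
selector = 'tr:not(tr:has(td:-soup-contains("Less similar results")) ~ *) div.title'

# Collect the text of every title that is not in a row after the marker row.
titles = [div.get_text() for div in soup.select(selector)]
print(titles)
```

The marker row itself still matches the tr:not(...) part, but it has no div.title inside it, so it contributes nothing to the result.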
2
We can go the other way around and focus on the <tr class="ls"> first:
from bs4.element import Tag
ls = soup.find('tr', class_='ls')
elts = [td for tr in ls.previous_siblings
        if isinstance(tr, Tag)
        for td in tr.find_all('td', class_='info')]
This gives us:
>>> elts
[<td class="info"><div class="title">...</div></td>]
We thus first locate the <tr> with class="ls", and then iterate over its previous siblings and look for <td class="info"> elements.
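The sibling-based approach can be wrapped in a small helper. This is only a sketch: the function name titles_before_marker, the fallback when no marker row exists, and the sample markup are assumptions, not part of the original answer. Note that previous_siblings walks backwards from the marker, so the rows are reversed to restore document order:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def titles_before_marker(soup, marker_class="ls"):
    """Return title texts from rows that precede the marker <tr> (hypothetical helper)."""
    marker = soup.find("tr", class_=marker_class)
    if marker is None:
        # Assumption: with no marker row, every title counts as wanted.
        return [div.get_text() for div in soup.select("td.info div.title")]

    # previous_siblings yields the nearest sibling first; skip text nodes.
    rows = [sib for sib in marker.previous_siblings if isinstance(sib, Tag)]
    rows.reverse()  # back to document order
    return [div.get_text()
            for row in rows
            for div in row.select("td.info div.title")]


# Made-up sample markup for illustration.
html = """<table>
<tr><td class="info"><div class="title">FIRST</div></td></tr>
<tr><td class="info"><div class="title">SECOND</div></td></tr>
<tr class="ls"><td colspan="3">Less similar results</td></tr>
<tr><td class="info"><div class="title">THIRD</div></td></tr>
</table>"""

result = titles_before_marker(BeautifulSoup(html, "html.parser"))
print(result)
```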
1
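Try this:
o = []
# Walk every <td> in document order (find() would return only the first one)
for td in soup.find_all("td"):
    # Stop as soon as we reach the marker cell
    if td.get_text(strip=True) == 'Less similar results':
        break
    for div in td.find_all("div", class_='title'):
        o.append(div.get_text())
print(o)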
Source:stackexchange.com