[Django]-How do I use BeautifulSoup to search for elements that occur before another element?

6👍

✅

You can use CSS selector tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title:

data = '''<tr>
    <td class="info"><div class="title">THIS YOU WANT ...</div></td>
</tr>
<tr class="ls">
    <td colspan="3">Less similar results</td>
</tr>
<tr>
    <td class="info"><div class="title">THIS YOU DON'T WANT ...</div></td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

print(soup.select('tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title'))

Prints:

[<div class="title">THIS YOU WANT ...</div>]

What does it mean?

tr:not(tr:has(td:contains("Less similar results")) ~ *) div.title

Select every <div> with class title inside a <tr> that is not a later sibling of the <tr> containing a <td> with the text "Less similar results". In other words, rows that come after the marker row are excluded.
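Newer Soup Sieve releases (the selector engine behind select()) deprecate :contains in favour of :-soup-contains, so the same idea may be written as below. This is a minimal sketch, not the answer's original code: the <table> wrapper, the html.parser choice, and the names html, soup_demo, marker, and selector are assumptions made so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# Same rows as in the answer, wrapped in a <table> (an assumption, so the
# fragment is valid HTML and parses predictably with the built-in parser).
html = '''<table>
<tr><td class="info"><div class="title">THIS YOU WANT ...</div></td></tr>
<tr class="ls"><td colspan="3">Less similar results</td></tr>
<tr><td class="info"><div class="title">THIS YOU DON'T WANT ...</div></td></tr>
</table>'''

soup_demo = BeautifulSoup(html, "html.parser")

# The marker row: a <tr> holding a <td> whose text contains the phrase.
marker = 'tr:has(td:-soup-contains("Less similar results"))'

# Keep only rows that are NOT later siblings of the marker row,
# then pick the title <div> elements inside them.
selector = f"tr:not({marker} ~ *) div.title"

titles_before = [div.get_text() for div in soup_demo.select(selector)]
print(titles_before)
```

Building the selector from a separate marker string keeps the nested :not(...) readable and makes it easy to change the marker text.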

Further reading:

CSS Selector Reference

👤Andrej Kesely

2👍

We can go the other way around, and focus on the <tr class="ls"> first:

from bs4.element import Tag

ls = soup.find('tr', class_='ls')
elts = [td for tr in ls.previous_siblings
           if isinstance(tr, Tag)
           for td in tr.find_all('td', class_='info')]

This gives us:

>>> elts
[<td class="info"><div class="title">...</div></td>]

We first locate the <tr> with class="ls", then iterate over its previous siblings (skipping the text nodes between tags) and collect every <td class="info"> inside them.
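A self-contained variant of the same approach could use find_previous_siblings, which returns only tags, so no isinstance check is needed. This is a sketch under assumptions: the two sample rows ("FIRST", "SECOND") and the names html, demo, marker_row, rows_before, and titles are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data: two rows before the marker row, one after.
html = '''<table>
<tr><td class="info"><div class="title">FIRST</div></td></tr>
<tr><td class="info"><div class="title">SECOND</div></td></tr>
<tr class="ls"><td colspan="3">Less similar results</td></tr>
<tr><td class="info"><div class="title">AFTER</div></td></tr>
</table>'''

demo = BeautifulSoup(html, "html.parser")

# Locate the marker row; find() returns None if it is missing.
marker_row = demo.find("tr", class_="ls")

# find_previous_siblings() yields the nearest row first,
# so reverse the result to restore document order.
rows_before = list(reversed(marker_row.find_previous_siblings("tr")))

titles = [
    div.get_text()
    for row in rows_before
    for div in row.select("td.info div.title")
]
print(titles)
```

If the page might lack the marker row, check marker_row for None before calling methods on it.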

1👍

Try this:

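o = []
# Walk every <td> in document order; find() would return only the first match
for td in soup.find_all("td"):
    # Stop at the marker cell so later rows are ignored
    if td.get_text(strip=True) == 'Less similar results':
        break
    # Only cells with class "info" carry the titles we want
    if 'info' in td.get('class', []):
        for div in td.find_all("div", class_='title'):
            o.append(div.get_text())

print(o)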
👤Thomas
