Pandas read_html no tables found

The “read_html” function in the pandas library extracts tables from HTML content. When the HTML contains no table elements, however, the function does not return an empty list: it raises a ValueError, typically with the message “No tables found”. Let’s examine this behavior with examples.

Example 1:

Consider the following HTML content:

        
<html>
  <head>
    <title>Example HTML</title>
  </head>
  <body>
    <div id="content">
        <h1>Hello World</h1>
        <p>This is an example paragraph.</p>
    </div>
  </body>
</html>

If we call “read_html” on this HTML, the function raises a ValueError (“No tables found”) rather than returning an empty list, because the content contains no table elements.
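The behavior can be checked with a minimal sketch like the one below. It assumes pandas is installed together with the lxml parser; `flavor="lxml"` pins the parser so that no other backend is required. The markup is a shortened copy of the HTML above, and the variable names are illustrative only.

```python
from io import StringIO

import pandas as pd

# Shortened version of the Example 1 markup: there is no <table> element
html_without_table = """
<html>
  <body>
    <div id="content">
      <h1>Hello World</h1>
      <p>This is an example paragraph.</p>
    </div>
  </body>
</html>
"""

error_message = None
try:
    # Wrapping the string in StringIO avoids the literal-HTML deprecation warning
    pd.read_html(StringIO(html_without_table), flavor="lxml")
except ValueError as exc:
    # read_html signals the missing table with an exception, not an empty list
    error_message = str(exc)

print(error_message)
```

The exact wording of the message can vary with the parser in use, so it is safer to rely on the exception type than on the full text.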

Example 2:

Now consider an HTML document that includes a table:

        
<html>
  <head>
    <title>Example HTML</title>
  </head>
  <body>
    <table>
      <tr>
        <th>Name</th>
        <th>Age</th>
      </tr>
      <tr>
        <td>John</td>
        <td>25</td>
      </tr>
      <tr>
        <td>Jane</td>
        <td>30</td>
      </tr>
    </table>
  </body>
</html>

In this case, if we pass this HTML content to the “read_html” function, it will successfully extract the table and return a list containing a pandas DataFrame. We can then access the DataFrame and perform further operations on it.
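As a rough illustration, the sketch below parses a condensed copy of the table and reads values from the resulting DataFrame. It assumes pandas and lxml are installed; the variable names `html_with_table`, `tables`, and `df` are placeholders chosen for this example.

```python
from io import StringIO

import pandas as pd

# Condensed copy of the Example 2 table
html_with_table = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>John</td><td>25</td></tr>
  <tr><td>Jane</td><td>30</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per table it finds
tables = pd.read_html(StringIO(html_with_table), flavor="lxml")
df = tables[0]

print(len(tables))          # number of tables found
print(df.columns.tolist())  # header cells become the column labels
print(df["Age"].mean())     # numeric cells are parsed as numbers
```

Because the result is always a list, the first table is accessed with `tables[0]` even when the page contains only one.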

Handling No Tables Found:

Because “read_html” raises a ValueError instead of returning an empty list, checking the length of the result is not enough: the exception occurs before any list exists. To handle this case, wrap the call in a try/except block that catches ValueError, then display an appropriate message or take whatever action the application requires.

Example Code:

        
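from io import StringIO

import pandas as pd

# Function to extract tables from HTML and handle no tables found
def extract_tables(html_content):
    try:
        # Wrap the string in StringIO: passing literal HTML is deprecated since pandas 2.1
        tables = pd.read_html(StringIO(html_content))
    except ValueError:
        # read_html raises ValueError when the HTML contains no tables
        print("No tables found in the HTML content.")
        return
    for idx, table in enumerate(tables, start=1):
        print(f"Table {idx}:")
        print(table)
        print("----")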

# Example usage of the function
html = '''
    <html>
      <head>
        <title>Example HTML</title>
      </head>
      <body>
        <table>
          <tr>
            <th>Name</th>
            <th>Age</th>
          </tr>
          <tr>
            <td>John</td>
            <td>25</td>
          </tr>
          <tr>
            <td>Jane</td>
            <td>30</td>
          </tr>
        </table>
      </body>
    </html>
    '''

extract_tables(html)

In this example, the “extract_tables” function takes the HTML content as input and passes it to “read_html”. If no tables are found, it displays a message. Otherwise, it prints each table in turn.
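For callers that prefer to check the length of a list, one possible variant is a small wrapper that converts the exception into an empty list. The helper name `read_tables_or_empty` is hypothetical, the sketch assumes pandas with lxml installed, and matching on the start of the error message is an assumption about the message wording rather than a documented contract.

```python
from io import StringIO

import pandas as pd


def read_tables_or_empty(html_content):
    """Hypothetical helper: return the parsed tables, or [] when none are found."""
    try:
        return pd.read_html(StringIO(html_content), flavor="lxml")
    except ValueError as exc:
        # Assumption: the "no tables" error message starts with this text.
        # Any other ValueError is re-raised unchanged.
        if not str(exc).startswith("No tables found"):
            raise
        return []


tables = read_tables_or_empty("<html><body><p>No table here.</p></body></html>")
if len(tables) == 0:
    print("No tables found in the HTML content.")
```

With this wrapper, the rest of the program can treat "no tables" as an ordinary empty result while other parsing errors still surface.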

By using a similar approach, you can handle cases where no tables are found in HTML content when using the “read_html” function in Pandas.