Pandas read_csv multiple header rows

The pandas library provides the read_csv() function to read a CSV file into a DataFrame. By default, it treats the first row of the file as the column names. If your CSV file has more than one header row, you can tell pandas which rows to use with the header parameter.

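The header parameter accepts an integer or a list of integers giving the zero-based row numbers to use as headers, or None if the file has no header row at all. Passing a list such as [0, 1] produces a DataFrame whose columns form a MultiIndex with one level per header row. Let’s see how to use this parameter with some examples.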
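As a quick illustration of the accepted values, the sketch below parses a small in-memory CSV with header=0, header=None, and header=[0, 1]. The sample text and the use of io.StringIO instead of a file on disk are assumptions made so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical sample data, held in memory so the snippet runs without a file.
csv_text = "a,b\nx,y\n1,2\n3,4\n"

# header=0: row 0 supplies the column names (same result as the default here).
df_single = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: no row is treated as a header; columns are numbered 0, 1, ...
df_none = pd.read_csv(io.StringIO(csv_text), header=None)

# header=[0, 1]: rows 0 and 1 together form a two-level column MultiIndex.
df_multi = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(list(df_single.columns))  # ['a', 'b']
print(list(df_none.columns))    # [0, 1]
print(list(df_multi.columns))   # [('a', 'x'), ('b', 'y')]
```

Note how the number of data rows changes with the setting: every row consumed as a header is one row fewer in the resulting DataFrame.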

Example 1: CSV file with a single header row

Suppose we have a CSV file named “data.csv” with the following contents:

      Year,Month,Value
      2019,January,100
      2020,February,200
      2021,March,150
    

We can read this CSV file into a DataFrame with the first row as the header by using:

      import pandas as pd

      df = pd.read_csv('data.csv')
    

The resulting DataFrame will look like this:

         Year     Month  Value
      0  2019   January    100
      1  2020  February    200
      2  2021     March    150
    

Example 2: CSV file with multiple header rows

Now let’s say the CSV file named “data.csv” has two header rows. Both header rows need the same number of fields as the data rows, so that the two levels line up column by column:

      Parameter 1,Parameter 2,Parameter 3
      Year,Month,Value
      2019,January,100
      2020,February,200
      2021,March,150

To read this CSV file into a DataFrame with both header rows, we can use the header parameter as follows:

      df = pd.read_csv('data.csv', header=[0, 1])

The resulting DataFrame should look like this:

         Parameter 1 Parameter 2  Parameter 3
                Year       Month        Value
      0         2019     January          100
      1         2020    February          200
      2         2021       March          150
    

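As you can see, both header rows are stored in the columns’ MultiIndex: the first header row becomes the outer level (level 0) and the second header row becomes the inner level (level 1). Each column is therefore identified by a tuple of labels, one from each header row, rather than by a single name.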
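Once the columns are a MultiIndex, selecting a column or flattening the header back to a single row works a little differently. The following is a minimal sketch, assuming an in-memory sample in which both header rows have three fields (the "Parameter 3" label is an assumption added for illustration) and using io.StringIO in place of a file.

```python
import io
import pandas as pd

# Assumed sample: both header rows have three fields so the levels line up.
csv_text = (
    "Parameter 1,Parameter 2,Parameter 3\n"
    "Year,Month,Value\n"
    "2019,January,100\n"
    "2020,February,200\n"
    "2021,March,150\n"
)
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Each column label is a tuple: (level 0 label, level 1 label).
print(df.columns.nlevels)  # 2

# Select one column by its full tuple label.
values = df[("Parameter 3", "Value")]
print(values.tolist())  # [100, 200, 150]

# Keep only the second header row as ordinary, single-level column names.
flat = df.copy()
flat.columns = flat.columns.get_level_values(1)
print(list(flat.columns))  # ['Year', 'Month', 'Value']
```

Flattening with get_level_values(1) is convenient when the first header row is only descriptive; if both rows carry meaning, keeping the MultiIndex preserves that information.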
