Pandas read_csv with Multiple Header Rows
The pandas library in Python provides the read_csv() function to read data from a CSV file into a DataFrame object. By default, it assumes that the first row of the CSV file contains the column names. However, if your CSV file has multiple header rows, you can specify it using the header parameter.
The header parameter accepts an integer or a list of integers that indicate which rows should be considered as headers. Let’s see how to use this parameter with some examples.
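As a minimal sketch of how an integer value for header picks the header row, the snippet below reads the same text twice. The io.StringIO object only stands in for a file on disk, and the variable names are illustrative.

```python
import io

import pandas as pd

# Sample CSV text; io.StringIO stands in for a real file here.
csv_text = (
    "Year,Month,Value\n"
    "2019,January,100\n"
    "2020,February,200\n"
    "2021,March,150\n"
)

# header=0 (the default): the first row supplies the column names.
df_first = pd.read_csv(io.StringIO(csv_text), header=0)
print(df_first.columns.tolist())

# header=1: the second row supplies the column names,
# and the row before it is discarded.
df_second = pd.read_csv(io.StringIO(csv_text), header=1)
print(df_second.columns.tolist())
```

With header=1 the values of the first data row become the column labels, which is rarely what you want for this file, but it shows that the integer is simply a zero-based row position.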
Example 1: CSV file with a single header row
Suppose we have a CSV file named “data.csv” with the following contents:
    Year,Month,Value
    2019,January,100
    2020,February,200
    2021,March,150
We can read this CSV file into a DataFrame with the first row as the header by using:
    import pandas as pd

    df = pd.read_csv('data.csv')
The resulting DataFrame will look like this:
       Year     Month  Value
    0  2019   January    100
    1  2020  February    200
    2  2021     March    150
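To try this example without preparing a file by hand, the following sketch writes the sample contents to a temporary data.csv and reads it back. The temporary directory and the variable names are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd

csv_text = (
    "Year,Month,Value\n"
    "2019,January,100\n"
    "2020,February,200\n"
    "2021,March,150\n"
)

# Write the sample to a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data.csv"
    csv_path.write_text(csv_text)

    # No header argument: the first row is used as the column names.
    df = pd.read_csv(csv_path)

print(df.columns.tolist())
print(df.shape)
```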
Example 2: CSV file with multiple header rows
Let’s say we have a CSV file named “data.csv” with the following contents:
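    Parameter 1,Parameter 2,
    Year,Month,Value
    2019,January,100
    2020,February,200
    2021,March,150
pandas expects the header rows to have the same number of fields, so the first header row ends with a trailing comma that leaves its third field empty. To read this CSV file into a DataFrame with both header rows, we can use the header parameter as follows:
    df = pd.read_csv('data.csv', header=[0, 1])
The resulting DataFrame will look like this:
       Parameter 1 Parameter 2  Unnamed: 2_level_0
              Year       Month               Value
    0         2019     January                 100
    1         2020    February                 200
    2         2021       March                 150
As you can see, both header rows are stored as levels of a MultiIndex on the columns: the first header row becomes the top level and the second header row becomes the level beneath it. pandas fills the empty third field of the first header row with a generated "Unnamed" label.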
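Once the columns are a MultiIndex, a column is addressed by a tuple with one entry per header row. The sketch below assumes both header rows have the same number of fields (the first header row here ends with a trailing comma) and uses io.StringIO in place of the file; it also shows one way to drop the top level when only the second header row is wanted.

```python
import io

import pandas as pd

# Two header rows with three fields each; the trailing comma
# leaves the third field of the first header row empty.
csv_text = (
    "Parameter 1,Parameter 2,\n"
    "Year,Month,Value\n"
    "2019,January,100\n"
    "2020,February,200\n"
    "2021,March,150\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# The columns form a two-level MultiIndex.
print(df.columns.nlevels)

# Select a single column with a (first row, second row) tuple.
years = df[("Parameter 1", "Year")]
print(years.tolist())

# Keep only the second header row as flat column names.
flat = df.copy()
flat.columns = flat.columns.get_level_values(1)
print(flat.columns.tolist())
```

Flattening like this is convenient when the first header row carries only grouping labels that later code does not need.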