Query: pandas json_normalize only certain columns
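Pandas provides a function called json_normalize() that flattens semi-structured JSON data into a table. It has no parameter for choosing output columns: nested dictionaries are expanded into dot-separated column names (for example, address.city), and you then select the columns you want from the resulting DataFrame. The record_path parameter serves a different purpose: it points to a nested list of records that should be expanded into rows.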
Here’s an example to demonstrate how to use json_normalize() with specific columns:
import pandas as pd
import json
# Sample JSON data
data = [
    {
        "name": "John",
        "age": 30,
        "address": {
            "street": "123 Main St",
            "city": "New York",
            "state": "NY"
        }
    },
    {
        "name": "Jane",
        "age": 25,
        "address": {
            "street": "456 Elm St",
            "city": "San Francisco",
            "state": "CA"
        }
    },
    {
        "name": "Bob",
        "age": 35,
        "address": {
            "street": "789 Oak St",
            "city": "Chicago",
            "state": "IL"
        }
    }
]
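# Convert JSON data to a Pandas DataFrame; the nested "address" dict
# becomes the columns address.street, address.city and address.state
df = pd.json_normalize(data)
# Drop the "address." prefix so the columns are simply street, city, state
df.columns = df.columns.str.replace('address.', '', regex=False)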
# Filter specific columns
df = df[['name', 'street', 'city']]
# Print the DataFrame
print(df)
In this example, the sample JSON data describes people: each record holds a name, an age, and a nested address. We want to normalize this data and only include the ‘name’, ‘street’, and ‘city’ columns in the final DataFrame.
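Called with the data alone, pd.json_normalize() flattens each nested address dictionary into dot-separated columns: address.street, address.city, and address.state. The record_path parameter is meant for a different case: it specifies the path to a list of records nested inside each object, and current pandas versions raise a TypeError if that path points to a single dictionary such as ‘address’. The meta parameter is used together with record_path to carry parent-level fields (such as ‘name’) into the rows built from those nested records.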
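To make the role of record_path and meta concrete, here is a minimal sketch of the case they are designed for. The data is hypothetical and not part of the example above: each person carries a list under an assumed ‘addresses’ key, so every list entry becomes its own row and meta copies the parent’s name onto it.

```python
import pandas as pd

# Hypothetical data: "addresses" is a LIST of dicts, unlike the single
# "address" dict used in the main example
people = [
    {
        "name": "John",
        "addresses": [
            {"street": "123 Main St", "city": "New York"},
            {"street": "9 Side Rd", "city": "Boston"},
        ],
    },
    {
        "name": "Jane",
        "addresses": [
            {"street": "456 Elm St", "city": "San Francisco"},
        ],
    },
]

# One output row per entry in "addresses"; "name" is repeated from the parent
rows = pd.json_normalize(people, record_path=["addresses"], meta=["name"])

# Keep only the columns of interest, in the desired order
rows = rows[["name", "street", "city"]]
print(rows)
```

John appears twice in the result because he has two address entries, which is the one-row-per-record behavior that record_path provides.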
We then select only the desired columns with the indexing syntax df[['name', 'street', 'city']], which returns a DataFrame containing just those columns, in that order.
Finally, we print the resulting DataFrame, which will output:
   name       street           city
0  John  123 Main St       New York
1  Jane   456 Elm St  San Francisco
2   Bob   789 Oak St        Chicago
This DataFrame contains only the ‘name’, ‘street’, and ‘city’ columns for each record in the JSON data.
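As an alternative, the dotted column names that json_normalize() generates can be selected directly and renamed in one step. The sketch below is one possible way to do this, using a shortened copy of the sample data; the names flat, wanted, and subset are illustrative only.

```python
import pandas as pd

# Shortened copy of the sample data
data = [
    {"name": "John", "age": 30,
     "address": {"street": "123 Main St", "city": "New York", "state": "NY"}},
    {"name": "Jane", "age": 25,
     "address": {"street": "456 Elm St", "city": "San Francisco", "state": "CA"}},
]

# Flatten everything; nested keys become dot-separated column names
flat = pd.json_normalize(data)

# Map each flattened column we want to keep to its final name
wanted = {
    "name": "name",
    "address.street": "street",
    "address.city": "city",
}

# Select the wanted columns, then rename them to the short names
subset = flat[list(wanted)].rename(columns=wanted)
print(subset)
```

Keeping the mapping in a single dictionary makes it explicit which flattened columns are retained and what each one is called afterwards.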