Pandas json_normalize only certain columns

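Pandas provides a function called json_normalize() that flattens semi-structured JSON data into a table. It has no parameter for choosing which columns appear in the output: record_path points to a list of nested records to expand, and meta names parent fields to carry along with them. To keep only certain columns, normalize the data first and then select the columns you need from the resulting DataFrame.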

Here’s an example to demonstrate how to use json_normalize() with specific columns:

import pandas as pd

# Sample JSON data
data = [
    {
        "name": "John",
        "age": 30,
        "address": {
            "street": "123 Main St",
            "city": "New York",
            "state": "NY"
        }
    },
    {
        "name": "Jane",
        "age": 25,
        "address": {
            "street": "456 Elm St",
            "city": "San Francisco",
            "state": "CA"
        }
    },
    {
        "name": "Bob",
        "age": 35,
        "address": {
            "street": "789 Oak St",
            "city": "Chicago",
            "state": "IL"
        }
    }
]

# Flatten the nested JSON; keys under "address" become dotted column names
# such as "address.street" and "address.city"
df = pd.json_normalize(data)

# Keep only the columns we need, then drop the "address." prefix
df = df[['name', 'address.street', 'address.city']]
df = df.rename(columns={'address.street': 'street', 'address.city': 'city'})

# Print the DataFrame
print(df)

In this example, the sample JSON data describes three people, each with a name, an age, and a nested address. We want to normalize this data and keep only the ‘name’, ‘street’, and ‘city’ columns in the final DataFrame.

Called with just the data, pd.json_normalize() flattens each nested dictionary into columns whose names join the keys with a dot, so the result has the columns name, age, address.street, address.city, and address.state. The record_path parameter is not used here: it must point to a list of records nested inside each object, and because ‘address’ is a single dictionary rather than a list, passing record_path=['address'] raises a TypeError in current pandas versions. The meta parameter only applies together with record_path, where it names the parent fields to copy onto each nested record.

We then select the desired columns with the indexing syntax (df[['name', 'address.street', 'address.city']]) and rename the two address columns with rename() so that the DataFrame ends up with the columns name, street, and city.
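
For comparison, record_path and meta are meant for data where each top-level object holds a list of nested records. The following is a minimal sketch under that assumption; the ‘orders’ field and its contents are hypothetical and do not appear in the example above.

```python
import pandas as pd

# Hypothetical data: each person holds a *list* of orders
people = [
    {
        "name": "John",
        "orders": [
            {"item": "book", "price": 12.5},
            {"item": "pen", "price": 1.2},
        ],
    },
    {
        "name": "Jane",
        "orders": [{"item": "lamp", "price": 30.0}],
    },
]

# One output row per order; "name" is copied from the parent object
orders = pd.json_normalize(people, record_path="orders", meta=["name"])

# Column selection still happens afterwards, on the flattened result
orders = orders[["name", "item"]]
print(orders)
```

Here record_path expands the list into one row per order, and the column selection is still a separate step on the resulting DataFrame.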

Finally, we print the resulting DataFrame, which will output:

   name       street           city
0  John  123 Main St       New York
1  Jane   456 Elm St  San Francisco
2   Bob   789 Oak St        Chicago

This DataFrame contains only the ‘name’, ‘street’, and ‘city’ columns for each record in the JSON data.
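
If you need the same selection in several places, the normalize-then-select steps can be wrapped in a small helper. This is only a sketch: normalize_selected() is a hypothetical function name, not part of pandas, and it assumes every requested column exists in the flattened data.

```python
import pandas as pd


def normalize_selected(records, columns):
    """Flatten records and keep only the requested columns.

    `columns` maps flattened (dotted) column names to output names.
    Hypothetical helper for illustration; not a pandas API.
    """
    flat = pd.json_normalize(records)
    # Selecting with a list raises KeyError if a requested column is missing
    return flat[list(columns)].rename(columns=columns)


sample = [
    {
        "name": "John",
        "age": 30,
        "address": {"street": "123 Main St", "city": "New York", "state": "NY"},
    }
]

result = normalize_selected(
    sample,
    {"name": "name", "address.street": "street", "address.city": "city"},
)
print(result)
```

Passing the mapping in one place keeps the dotted source names and the final column names together, which makes the selection easier to change later.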
