Python JSON to Parquet

Converting JSON to Parquet using Python

To convert JSON data to Parquet format in Python, you can use pandas to load the data into a DataFrame and PyArrow to write that DataFrame to a Parquet file.

Step 1: Installing Dependencies

First, make sure you have the necessary libraries installed:

pip install pandas pyarrow

Step 2: Importing Libraries

Next, import the required libraries:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

Step 3: Loading JSON Data

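Read the JSON file into a pandas DataFrame. By default, pd.read_json expects a table-like layout, such as a JSON array of objects (one object per row); for newline-delimited JSON, pass lines=True: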

df = pd.read_json('your_json_file.json')
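The two most common input layouts can be sketched as follows. This is a minimal illustration, not part of the original walkthrough: the sample records and variable names are made up, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
records_json = '[{"name": "John Doe", "age": 25}, {"name": "Jane Roe", "age": 31}]'
lines_json = '{"name": "John Doe", "age": 25}\n{"name": "Jane Roe", "age": 31}\n'

# A JSON array of objects loads directly: one row per object.
df_records = pd.read_json(io.StringIO(records_json))

# JSON Lines (one object per line) needs lines=True.
df_lines = pd.read_json(io.StringIO(lines_json), lines=True)

print(df_records.shape)
print(list(df_lines.columns))
```

Both calls should produce a DataFrame with one row per JSON object and one column per key.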

Step 4: Converting to Parquet

Convert the DataFrame to Parquet format:

table = pa.Table.from_pandas(df)
pq.write_table(table, 'output.parquet')

The above code converts the DataFrame to a PyArrow Table object and then writes it to a Parquet file called “output.parquet”.

Example: Converting a JSON file to Parquet

Let’s assume you have a JSON file named “data.json” with the following contents:

{
  "name": "John Doe",
  "age": 25,
  "city": "New York"
}

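Because this file holds a single object of scalar values rather than an array of records, calling pd.read_json('data.json') on it raises a ValueError. Loading the object with the json module and wrapping it in a list gives pandas one row to work with:

import json

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Step 1: Load JSON data
# The file contains one object, so wrap it in a list to get a one-row DataFrame.
with open('data.json') as f:
    record = json.load(f)
df = pd.DataFrame([record])

# Step 2: Convert to Parquet
table = pa.Table.from_pandas(df)
pq.write_table(table, 'output.parquet')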

After running the above code, you will have a Parquet file named “output.parquet” containing one row with the columns name, age, and city.
