Converting JSON to Parquet using Python
To convert JSON data to Parquet format in Python, you can use Pandas to load the JSON and PyArrow to write the Parquet file.
Step 1: Installing Dependencies
First, make sure you have the necessary libraries installed:
pip install pandas pyarrow
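If the packages seem to be installed but imports still fail, they may have gone into a different environment than the interpreter you run. A quick check using only the standard library's importlib.metadata (Python 3.8+) can confirm what that interpreter actually sees. This is a minimal sketch; the helper name installed_versions is illustrative, not part of either library.

```python
import importlib.metadata


def installed_versions(packages=("pandas", "pyarrow")):
    """Return the installed version of each package, or None if it is missing.

    installed_versions is a hypothetical helper written for this check.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Not installed in the environment of the running interpreter
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the same python executable you will use for the conversion; a None entry means pip installed the package somewhere else.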
Step 2: Importing Libraries
Next, import the required libraries:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
Step 3: Loading JSON Data
Read the JSON file into a Pandas DataFrame:
df = pd.read_json('your_json_file.json')
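pd.read_json handles more than one JSON layout, and which options you need depends on how the file is structured. The sketch below shows two common cases: an array of objects (orient="records") and JSON Lines, where each line is a separate object (lines=True). The sample strings are made up for illustration, and they are wrapped in io.StringIO because recent pandas versions deprecate passing literal JSON strings to read_json.

```python
import io

import pandas as pd

# A JSON array of objects ("records" layout): one row per object
records_json = '[{"name": "John Doe", "age": 25}, {"name": "Jane Roe", "age": 31}]'
df_records = pd.read_json(io.StringIO(records_json), orient="records")

# JSON Lines: one JSON object per line, read with lines=True
jsonl = '{"name": "John Doe", "age": 25}\n{"name": "Jane Roe", "age": 31}\n'
df_lines = pd.read_json(io.StringIO(jsonl), lines=True)

print(df_records)
print(df_lines)
```

With a real file, pass the path instead of the StringIO object, e.g. pd.read_json('your_json_file.jsonl', lines=True).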
Step 4: Converting to Parquet
Convert the DataFrame to Parquet format:
table = pa.Table.from_pandas(df)
pq.write_table(table, 'output.parquet')
pa.Table.from_pandas() converts the DataFrame into a PyArrow Table object, and pq.write_table() then writes that table to a Parquet file named “output.parquet”.
Example: Converting a JSON file to Parquet
Let’s assume you have a JSON file named “data.json” with the following contents:
{
"name": "John Doe",
"age": 25,
"city": "New York"
}
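Because this file holds a single JSON object with only scalar values, passing it straight to pd.read_json raises a ValueError. Loading it with the json module and wrapping the object in a list produces a one-row DataFrame instead:
import json
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
# Step 1: Load JSON data (a single object, so wrap it in a list to get one row)
with open('data.json') as f:
    record = json.load(f)
df = pd.DataFrame([record])
# Step 2: Convert to Parquet
table = pa.Table.from_pandas(df)
pq.write_table(table, 'output.parquet')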
After running the code above, you will have a Parquet file named “output.parquet” with one row and the columns name, age, and city.