Saving a huge amount of data (nearly 20 billion entries) in Django with PostgreSQL

1👍

anonymous is right about dump files being the best way to load data from/to databases.

If you don’t have access to the database to create a dump file, that option is harder, so a Python way to make it work would be to use bulk_create in batches.

For example:

inserts = []
batch_size = 10000

for entry in entries:  # or your datasource
    # transform data into a Django model instance
    inserts.append(EntryObject(attribute='attributes...'))

    if len(inserts) >= batch_size:
        EntryObject.objects.bulk_create(inserts)  # insert the full batch
        inserts = []  # reset the batch

if inserts:
    EntryObject.objects.bulk_create(inserts)  # insert the final partial batch
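
Note that if building all the objects in memory at once is acceptable, bulk_create can also do the batching itself via its batch_size argument, so the manual loop above is mainly useful when the rows are streamed. A rough equivalent, reusing the placeholder EntryObject and entries names from the example:

# Builds every object up front, then lets Django split the single call
# into INSERT statements of 10000 rows each.
EntryObject.objects.bulk_create(
    [EntryObject(attribute='attributes...') for entry in entries],
    batch_size=10000,
)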

Then again, it depends on your datasource. You might also want to look into running the inserts as asynchronous tasks if this needs to be called from a Django view; a sketch follows below.
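
For example, a minimal sketch of pushing the insert into a Celery task (assuming Celery is already configured for the project; the task name, module and fields are placeholders):

# tasks.py -- hypothetical module; assumes a working Celery setup
from celery import shared_task

from .models import EntryObject


@shared_task
def insert_entries(rows):
    # rows is a list of plain dicts so it serializes cleanly as a task argument
    EntryObject.objects.bulk_create(
        [EntryObject(**row) for row in rows],
        batch_size=10000,
    )

The view would then call insert_entries.delay(rows) and return immediately instead of blocking on the insert.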

0👍

That is because your system would have run out of memory (RAM). The ideal way would be to create an SQL dump file (assuming you are getting the 20 billion records from another database) and then load that file into the new database. All the data present in the dump file will be loaded into PostgreSQL.

For creating the Dump file, do:

 $ pg_dump -U {user-name} {source_db} -f {dumpfilename.sql}

For Restoring from dump file, do:

 $ psql -U {user-name} -d {destination_db} -f {dumpfilename.sql}

See How To Backup and Restore PostgreSQL Database Using pg_dump and psql for more detailed information.
