[Django]-Storing large JSON data in Postgres is infeasible, so what are the alternatives?

5👍

It is hard to answer specifically without knowing the details of your situation, but here are some things you may try:

  1. Use Postgres 12+ stored generated columns to maintain the fields or smaller JSON blobs that are commonly needed. This adds storage overhead, but frees you from having to maintain the duplication yourself (see the SQL sketch after this list).
  2. Create indexes for any JSON fields you are querying (PostgreSQL allows you to create indexes on JSON expressions).
  3. Use a composite index, where the first field in the index is the field you are querying on, and the second field (or JSON expression) is the value you wish to retrieve. In that case PostgreSQL can serve the value straight from the index (an index-only scan).
  4. Similar to 1, create a materialised view which extracts the fields you need and allows you to query them quickly. You can add indexes to the materialised view too. The usual drawback is that materialised views can be slow to refresh, but since your data doesn’t update anyway, this may be a good fit.
  5. Investigate why access to the TOAST tables is slow. I’m not sure what performance you are seeing, but if you really do need to pull back a lot of data, you are going to need fast data access whatever database you choose.

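A minimal SQL sketch of suggestions 1–4, assuming a hypothetical `events` table with a `jsonb` column called `data` and made-up field names (`customer_id`, `status`, `total`):

```sql
-- 1. Stored generated column: Postgres maintains the extracted value for you.
ALTER TABLE events
    ADD COLUMN customer_id text
    GENERATED ALWAYS AS (data ->> 'customer_id') STORED;

-- 2. Expression index on a JSON field you filter on.
CREATE INDEX events_status_idx ON events ((data ->> 'status'));

-- 3. Composite index: the first column is what you filter on, the second is the
--    JSON expression you want back, so Postgres can answer with an index-only scan.
CREATE INDEX events_customer_total_idx
    ON events (customer_id, (data ->> 'total'));

-- 4. Materialised view extracting the commonly needed fields, with its own index.
CREATE MATERIALIZED VIEW event_summary AS
SELECT id,
       data ->> 'customer_id'      AS customer_id,
       (data ->> 'total')::numeric AS total
FROM events;

CREATE INDEX event_summary_customer_idx ON event_summary (customer_id);

-- Refresh whenever the underlying data changes (cheap here, since it rarely does).
REFRESH MATERIALIZED VIEW event_summary;
```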
Your mileage may vary with all of the above suggestions, especially as each will depend on your particular use case. (see the questions in my comment)

However, the overall idea is to use the tools that PostgreSQL provides to make your data quickly accessible. Yes, this may involve pulling the data out of its original JSON blob, but it doesn’t need to be done manually; PostgreSQL provides some great tools for this.

1👍

If you just need to store and read this JSON object in full, without using the JSON structure in your WHERE clauses, what about simply storing the data as binary in a bytea column? https://www.postgresql.org/docs/current/datatype-binary.html
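For example, a sketch with a hypothetical `event_payloads` table; the application serialises the JSON to bytes itself (e.g. `json.dumps(obj).encode()` on the Django side, where the column maps to a `BinaryField`):

```sql
-- Hypothetical table that stores the raw JSON document as bytes instead of jsonb.
CREATE TABLE event_payloads (
    id      bigserial PRIMARY KEY,
    payload bytea NOT NULL  -- the serialised JSON document
);

-- Read the document back as text when needed.
SELECT convert_from(payload, 'UTF8') AS json_text
FROM event_payloads
WHERE id = 1;
```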
