[Django]-PostgreSQL ETL process on Heroku


One option is to do the whole ETL process inside Postgres: use the dblink extension to pull data from the source database directly into the target database. This may or may not be sufficient for your case, but it's worth investigating.
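A minimal sketch of the dblink approach, assuming Python with a psycopg2-style driver (which uses `%s` placeholders) and that `CREATE EXTENSION dblink;` has already been run in the target database. The helper name `build_dblink_insert` and the table and column names are hypothetical. Because `dblink()` returns generic records, the query has to spell out a column definition list after the function call:

```python
from typing import Sequence, Tuple


def build_dblink_insert(target_table: str, columns: Sequence[Tuple[str, str]]) -> str:
    """Build an INSERT ... SELECT that pulls rows from a remote database via dblink().

    The two %s placeholders are the source connection string and the remote
    query; both are passed as parameters so the driver handles quoting.
    Table and column names are interpolated and must come from trusted code.
    """
    names = ", ".join(name for name, _ in columns)
    # dblink() returns SETOF record, so the caller must declare the column types.
    col_defs = ", ".join(f"{name} {pg_type}" for name, pg_type in columns)
    return (
        f"INSERT INTO {target_table} ({names}) "
        f"SELECT {names} FROM dblink(%s, %s) AS src({col_defs})"
    )
```

On a connection to the target database you would then run something like `cursor.execute(sql, (source_conninfo, "SELECT id, name FROM customers"))`, where `source_conninfo` is a libpq connection string for the source database. The data never leaves Postgres, so no intermediate files are needed.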

You are free to use the filesystem on a Heroku dyno, but I don't think it's a bulletproof solution. You can write to the filesystem just fine, but as soon as the process exits, the data written there is gone. The size of that filesystem is not guaranteed either, although it is quite large; it should be enough unless you need several hundred GB of storage.
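Given that the dyno filesystem is ephemeral, one way to use it safely is as scratch space that is written and consumed within a single process. The sketch below assumes that pattern; `stage_and_load` and its `load` callback are hypothetical names. It writes extracted rows to a CSV file in a temporary directory and hands the path to a loader before the directory is cleaned up:

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, Sequence


def stage_and_load(rows: Iterable[Sequence[object]], load: Callable[[str], int]) -> int:
    """Write extracted rows to a scratch CSV file and load it in the same process.

    The temporary directory is deleted when the with-block exits; on a dyno
    the file would not outlive the process anyway, so nothing depends on it later.
    """
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "extract.csv")
        with open(path, "w", newline="") as fh:
            csv.writer(fh).writerows(rows)
        # The loader must finish reading the file before the directory is removed.
        return load(path)
```

In a real job, `load` might open the file and stream it into Postgres with psycopg2's `cursor.copy_expert("COPY target_table FROM STDIN WITH CSV", fh)`, where `target_table` is a placeholder. Anything that must outlive the process belongs in the database, not on the dyno.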

Finally, you can speed up parts of the process by tuning some session-level Postgres settings. Rather than listing them here, I'd point you to the excellent Postgres documentation.
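Purely as an illustration, here is a sketch that applies a few settings commonly adjusted for bulk loads at session level. The values are assumptions, not recommendations: `synchronous_commit = off` speeds up commits at the risk of losing the most recent transactions after a crash (without making the database inconsistent), `work_mem` affects sorts and hashes, and `maintenance_work_mem` affects operations such as `CREATE INDEX`. `set_config(name, value, false)` is the function form of `SET`, so the values can be passed as ordinary query parameters; the helper name is hypothetical:

```python
from typing import Dict, List, Tuple

# Example values only (assumptions); tune them for your own plan and data volume.
ETL_SESSION_SETTINGS: Dict[str, str] = {
    "synchronous_commit": "off",
    "work_mem": "64MB",
    "maintenance_work_mem": "512MB",
}


def session_setting_calls(settings: Dict[str, str]) -> List[Tuple[str, Tuple[str, str]]]:
    """Return (sql, params) pairs that apply each setting for the session.

    The third argument of set_config() is is_local; false makes the change
    last for the rest of the session rather than only the current transaction.
    """
    return [
        ("SELECT set_config(%s, %s, false)", (name, value))
        for name, value in settings.items()
    ]
```

Run each pair with `cursor.execute(sql, params)` on the same connection that performs the ETL, since session settings do not carry over to other connections.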

EDIT: We now support postgres_fdw, the Postgres foreign data wrapper, which is a better alternative to dblink: http://www.postgresql.org/docs/current/static/postgres-fdw.html
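With postgres_fdw, the source tables can be exposed in the target database as foreign tables, and the ETL becomes plain `INSERT ... SELECT` statements. Below is a minimal sketch that generates the one-time setup statements, assuming PostgreSQL 9.5 or later (needed for `IMPORT FOREIGN SCHEMA`); the server name, schema names, and helper names are hypothetical. FDW option values must be SQL string literals, so a small quoting helper is included; with psycopg2 you could use `psycopg2.sql.Literal` instead:

```python
from typing import List


def _lit(value: str) -> str:
    # Minimal SQL string-literal quoting (assumes standard_conforming_strings = on):
    # wrap in single quotes and double any embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def fdw_setup_statements(
    server: str,
    host: str,
    port: int,
    dbname: str,
    user: str,
    password: str,
    remote_schema: str = "public",
    local_schema: str = "staging",
) -> List[str]:
    """Return the SQL needed to expose a remote schema through postgres_fdw.

    Identifiers (server and schema names) are interpolated as-is and are
    assumed to be simple, trusted names.
    """
    return [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw",
        f"CREATE SERVER {server} FOREIGN DATA WRAPPER postgres_fdw "
        f"OPTIONS (host {_lit(host)}, port {_lit(str(port))}, dbname {_lit(dbname)})",
        f"CREATE USER MAPPING FOR CURRENT_USER SERVER {server} "
        f"OPTIONS (user {_lit(user)}, password {_lit(password)})",
        f"CREATE SCHEMA IF NOT EXISTS {local_schema}",
        f"IMPORT FOREIGN SCHEMA {remote_schema} FROM SERVER {server} INTO {local_schema}",
    ]
```

After running these against the target database, a load step could be as simple as `INSERT INTO customers SELECT * FROM staging.customers`, where `customers` is a placeholder table name.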

👤hgmnz
