13π
To answer my own question, as I import data using Pandas into Django quite often nowadays, the mistake I was making was trying to use Pandas built-in Sql Alchemy DB ORM which was modifying the underlying database table definition. In the context above, you can simply use the Django ORM to connect and insert the data:
from myapp.models import Agency
class Command(BaseCommand):
def handle(self, *args, **options):
# Process data with Pandas
agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
# iterate over DataFrame and create your objects
for agency in agencies.itertuples():
agency = Agency.objects.create(name=agency.name)
However, you may often want to import data using an external script rather than using a management command, as above, or using Djangoβs shell. In this case you must first connect to the Django ORM by calling the setup
method:
import os, sys
import django
import pandas as pd
sys.path.append('../..') # add path to project root dir
os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"
# for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
#os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"
# Connect to Django ORM
django.setup()
# process data
from myapp.models import Agency
Agency.objects.create(name='MyAgency')
-
Here I have exported my settings module
myproject.settings
to theDJANGO_SETTINGS_MODULE
so thatdjango.setup()
can pick up the project settings. -
Depending on where you run the script from, you may need to path to the system path so Django can find the settings module. In this case, I run my script two directories below my project root.
-
You can modify any settings before calling
setup
. If your script needs to connect to the DB differently than whats configured insettings
. For example, when running a script locally against Django/postgres Docker containers.
Note, the above example was using the django-environ to specify DB settings.
5π
For those looking for a more performant and up-to-date solution, I would suggest using manager.bulk_create
and instantiating the django model instances, but not creating them.
model_instances = [Agency(name=agency.name) for agency in agencies.itertuples()]
Agency.objects.bulk_create(model_instances)
Note that bulk_create
does not run signals or custom saves, so if you have custom saving logic or signal hooks for Agency
model, that will not be triggered. Full list of caveats below.
Documentation: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create
- How can I change the URL to the Django admin interface?
- Confusion in Django admin, static and media files
- Registering Django system checks in AppConfig's ready() method
- Order queryset by alternating value
- Django 2.0 url parameters in get_queryset
0π
For the people who are still looking for the answer, Now pandas.to_sql() method has an additional parameter called index_label. If you set index=True, then you can also set a label for that index column. I set
df.to_sql('YOUR_TABLE',index=True,index_label='id' ,con=engine, if_exists='replace')
I tested this with one of my Django projects and it worked for me.
Check out the docs for more info.
- Using django_filters on graphene queries while not using Relay
- How to make static files works using django docker nginx and postgresql since its not serving them
- 'TemplateDoesNotExist' error with creating Sitemap for Django app