152
import pandas as pd
import datetime
from myapp.models import BlogPost
# all rows, all fields
df = pd.DataFrame(list(BlogPost.objects.all().values()))
# only rows dated on or after 2012-05-01
df = pd.DataFrame(
    list(
        BlogPost.objects.filter(
            date__gte=datetime.datetime(2012, 5, 1)
        ).values()
    )
)
# limit which fields
df = pd.DataFrame(
    list(
        BlogPost.objects.all().values(
            "author", "date", "slug"
        )
    )
)
The above is how I do the same thing. The most useful addition is specifying which fields you are interested in. If you only need a subset of the available fields, I imagine this would give a performance boost, since fewer columns are fetched and converted.
40
Converting the queryset via values_list() is more memory efficient than converting it via values() directly. values() returns a queryset of dicts (key: value pairs), whereas values_list() returns only tuples (pure data). This saves about 50% memory; you just need to set the column names when you call pd.DataFrame().
Method 1:
queryset = models.xxx.objects.values("A", "B", "C", "D")
# consumes a lot of memory
df = pd.DataFrame(list(queryset))
# works, but with little change in memory usage
df = pd.DataFrame.from_records(queryset)
Method 2:
queryset = models.xxx.objects.values_list(
    "A", "B", "C", "D"
)
# this saves about 50% memory
df = pd.DataFrame(
    list(queryset), columns=["A", "B", "C", "D"]
)
# This does not work: it crashed for me because the data is a queryset, not a list.
df = pd.DataFrame.from_records(
    queryset, columns=["A", "B", "C", "D"]
)
I tested this on my project with more than 1 million rows of data; the peak memory dropped from 2 GB to 1 GB.
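The dict-versus-tuple difference described above can be illustrated without Django. The sketch below is only an approximation: it builds plain Python rows shaped like the output of values() and values_list(), then sums the shallow size of each row container with sys.getsizeof. The helper names (make_dict_rows, make_tuple_rows, container_bytes) are made up for this illustration, the cell values themselves are not counted, and the exact byte counts depend on the Python version. It is not a measurement of Django or pandas peak memory.

```python
import sys

COLUMNS = ("A", "B", "C", "D")  # same column names as in the answer above


def make_dict_rows(n):
    """Rows shaped like the output of values(): one dict per row."""
    return [dict(zip(COLUMNS, (i, i + 1, i + 2, i + 3))) for i in range(n)]


def make_tuple_rows(n):
    """Rows shaped like the output of values_list(): one tuple per row."""
    return [(i, i + 1, i + 2, i + 3) for i in range(n)]


def container_bytes(rows):
    """Sum the shallow size of each row container (cell values excluded)."""
    return sum(sys.getsizeof(row) for row in rows)


dict_bytes = container_bytes(make_dict_rows(10_000))
tuple_bytes = container_bytes(make_tuple_rows(10_000))
print(f"dict rows:  {dict_bytes:,} bytes")
print(f"tuple rows: {tuple_bytes:,} bytes")
```

Each dict carries its own hash table for the keys, while a tuple stores only references to its items, which is why the per-row overhead is smaller for tuples.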
32
Django Pandas solves this rather neatly: https://github.com/chrisdev/django-pandas/
From the README:
class MyModel(models.Model):
    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

from django_pandas.io import read_frame
qs = MyModel.objects.all()
df = read_frame(qs)
2
From the Django perspective (I’m not familiar with pandas) this is fine. My only concern is that if you have a very large number of records, you may run into memory problems. If this were the case, something along the lines of this memory efficient queryset iterator would be necessary. (The snippet as written might require some rewriting to allow for your smart use of .values().)
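The snippet referred to above is not reproduced here. As a rough stand-in for the general idea, the sketch below shows one way to consume rows in fixed-size batches so that only one batch is materialized at a time. The chunked helper is hypothetical, written for this illustration, and the row source is a plain generator standing in for something like queryset.values_list("A", "B").iterator(), so the code runs without Django.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in for a lazily evaluated row source such as
# queryset.values_list("A", "B").iterator(); plain tuples are used here
# so the sketch runs without Django.
fake_rows = ((i, i * i) for i in range(10))

chunks = list(chunked(fake_rows, 4))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

Each chunk could be passed to pd.DataFrame(chunk, columns=[...]) and processed on its own. Concatenating every chunk back into one DataFrame would still put the whole dataset in memory, so batching only helps when the per-chunk results can be reduced or written out as you go.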
2
You could use model_to_dict:
import pandas as pd
from django.forms import model_to_dict
# PalletsManag is a model defined elsewhere in the project
pallobjs = [
    model_to_dict(pallobj)
    for pallobj in PalletsManag.objects.filter(estado='APTO_PARA_VENTA')
]
df = pd.DataFrame(pallobjs)
df.head()