[Django]-Fast moving average computation with Django ORM

3👍

Assuming you have a single entry per date you can use Django 2.0’s new window expressions to calculate a 90-period moving average in a single query:

from django.db.models import Avg, F, RowRange, Window

items = Model.objects.annotate(
    avg=Window(
        expression=Avg('value'), 
        order_by=F('created_at').asc(), 
        frame=RowRange(start=-90,end=0)
    )
)

See also ValueRange if you would like to frame by specific field values instead, which might come in handy if you have multiple rows for each given day for example.

👤minkwe

1👍

Instead of aggregation, you could use annotation. Consider this as I start for testing, I’m not completely sure about the code below. See also the docs about F() objects

    Model.objects.annotate(
        value_avg=Avg(
            'value',
            filter=Q(
                created_at__date__range=(
                    F('created_at__date')-timezone.timedelta(days=90),
                    F('created_at__date')
                )
            )
        )
    )

your_date_field depends what you

1👍

Another try. This is more performant in that it only uses one query, but fetches all required Model-instances from the DB to do the logic in python instead of the DB level. Still not optimal, but hopefully it does the right thing this time 😉 You have to compare if it really gives a performance improvement in your case.

import numpy as np
instances =  Model.objects.filter(
        created_at__gte=min(date_range)-timezone.timedelta(days=90),
        created_at__lte=max(date_range)
    ).values('created_at', 'value')

instances = list(instances)  # evaluate QuerySet and hit DB only once

output = []
for i in range(len(date_range)):    
    output.append(
        np.mean(np.array([inst.value for inst in instances if \
            inst.created_at >= date_range[i]-timezone.timedelta(days=90) and \
            inst.created_at <  date_range[i]
        ]))
    )

Leave a comment