[Answer]-Aggregation and filtering in Django

1👍

You have to define your models first, then you can use Count, Min, Sum, Avg from django.db.models that will let you do:

Table.objects.aggregate(average_val1=Avg('table__val1'))

The result dictionary will have a key called ‘average_val1’. If no such alias were specified, it would be the rather long ‘table__val1__avg’.

More documentation can be found here: Django documentation – Aggregation

NB: You can filter then use the aggregation to execute the operations on a specific set.
Egg:

Table.objects.filter( pk__in=id_list ).aggregate(average_val1=Avg('table__val1'))

or you can determine each time the limit ids [id1, id2] for each n then do this:

Table.objects.filter( pk__lte=n1, pk__gte=n2).aggregate(average_val1=Avg('table__val1'))

The __in, __lte, __gte will make sure to filter only the id set you want, then you can aggregate on that set.

__in: in a list, __lte: Less than or equal and __gte: Greater than or equal.

0👍

ind = 0
v1ofn = 0
v2ofn = 0
for row in tname.objects.all():
    if ind >= n:
        result_list.append([v1ofn/3, v2ofn/3])
        v1ofn = row.val1
        v2ofn = row.val2
        ind = 0
    else:
        v1ofn = v1ofn + row.val1
        v2ofn = v2ofn + row.val2
    ind = ind + 1

assuming the table has multiple of 3 items, if not, do some extra logic after loop to handle the extras.

0👍

I guess Aggregation is the Django way to do this, but suggested example would produce shitload of queries as Behesti said. And my guess is that that django ORM is really not built for number crunching (but i might be wrong here!)

I would maybe go with numpy (if you are really having huge array, i guess you need to do some partitioning):

The good sides using numpy is that its usually quite much faster than ‘standard’ python operations, but bad side is that its extra dependency.

import numpy
raw_array = [ # This is for testing the code, use .values_list( 'val1', 'val2' ) with db
[1   ,   5      ,   6],
[2   ,   6      ,   4],
[3   ,   3      ,   1],
[4   ,   8      ,   4],
[5   ,   2      ,   6],
[6   ,   8      ,   2],
[7   ,   1      ,   1],
]

arr = numpy.array( raw_array )

def sum_up_every_n_items( arr, n ):
   res = numpy.zeros( (numpy.floor( arr.shape[0]/float(n) ), arr.shape[1]) )
   arr = arr[ 0:res.shape[0]*n, : ] # Truncate, take only full N items
   for loop in range(0,n): # Note: this is loop 0,1,2 if n=3 ! We do addition with numpy vectors!
      res = res + arr[ loop::n, : ] # Get every n'th row from starting with offset
   res = res / float(n)
   return res 

res = sum_up_every_n_items( arr, n=3 )
print res

outputs

[[ 2.          4.66666667  3.66666667]
 [ 5.          6.          4.        ]]

0👍

Avoid doing many queries. Pull the data down one time and do the rest in Python;

n = 3 #yourstep
results = Model.objects.filter(query).only('id', 'val1', 'val2')
avgs = {}
for i in xrange(0, len(results), n): # unsure, but count may end up in 2 queries, and you're going to evaluate the queryset anyway
    avg1, avg2 = 0, 0
    for j in xrange(n):
        avg1 += results[i+j].val1/float(n)
        avg2 += results[i+j].val2/float(n)
    avgs[results[i].id] = (avg1, avg2) # Why is id relevant at all here?

Leave a comment