[Answer]-Django queryset – filter/exclude by sum over column

1👍

Each object in the following queryset will have a ‘filesize_sum’ attribute holding the summary of filesizes of all cache resources created since that object’s creation time.

qs = CachedResource.objects.order_by('-created').extra(select={
  'filesize_sum': """
   SELECT
     SUM(filesize)
   FROM
     CachedResource_table_name as cr
   WHERE
     cr.created >= CachedResource_table_name.created
     """})

Then you can make a loop to do what you want. For example, you could make a loop that breaks on the first object with filesize_sum > 200MB and run a delete query on the queryset for all objects with a smaller or equal creation date to that object:

for obj in qs:
   if obj.filesize_sum > 200:
       qs.filter(created__lte=obj.created).delete()
       break

Keep in mind though that you probably want to also take some action before inserting a new cache resource, so that the filesize of the new resource does not exceed your limit. For example, you could run the above procedure with:

limit = configured_limit - filesize_of_cache_resource_to_insert

0👍

Probably there is a better way to do that:

cachedResources = CachedResource.objects.order_by('-created')
list_of_items = []
size_of_files = 0
for item in cachedResources:
    if size_of_files < 200:
        list_of_items.append(item.id)
    else
        break
cached_resources_by_size = CachedResource.objects.filter(id__in=list_of_items).order_by('-created')

0👍

totals = CachedResource.objects.values('id').aggregate(sum=Sum('filesize'), count=Count('id'))
num_to_keep = totals['count'] * min(MAX_FILESIZE / totals['sum'], 1)
while num_to_keep < totals['count']:
    new_sum = CachedResource.objects.filter(id__in=CachedResource.objects.order_by('-created')[:num_to_keep]).aggregate(sum=Sum('filesize'))
    # if <not acceptable approximation>:
    #    adjust approximation
    #    continue

    CachedResource.objects.order_by('-created')[num_to_keep:].delete()
    break

The aggregation in line 1 can get you the total filesize and the number of entries in a single query. Based on these results, it’s easy to calculate an approximated amount of entries to keep. You can do some extra checking to assert that this approximation falls within certain limits (+/- 20% as you said). Then a simple order_by and a slice will result in a queryset of all entries to delete.

👤knbk

Leave a comment