4👍
Celery isn’t really the right tool here because its workers are designed to be long-running, but the goal should be reasonably easy to achieve.
Architecturally, you probably want to run a script on a Fargate task. The script chews through the queue and then dies. You’d trigger that task somehow:
- An API call from your data receiver (e.g. Django)
- A Lambda function (triggered by what?)
There are still some open questions… Do you limit yourself to one task at a time, or do you need to manage concurrent requests to the queue? Do you retry? But it’s a plausible place to start.
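To make the "chews through the queue and then dies" idea concrete, here is a minimal sketch of such a script. It uses the standard library's `queue.Queue` as a stand-in for whatever real queue you pick (SQS, Redis, a database table); the names `drain`, `handle` and `max_attempts` are made up for illustration, and the retry policy is just one possible answer to the "do you retry?" question.

```python
import queue


def drain(jobs, handle, max_attempts=3):
    """Process (payload, attempt) items until the queue is empty, then return.

    `jobs` is a queue.Queue standing in for the real queue (an assumption).
    `handle` is a hypothetical callable that processes one payload.
    Returns a (done, failed) tuple of counts.
    """
    done = 0
    failed = 0
    while True:
        try:
            payload, attempt = jobs.get_nowait()
        except queue.Empty:
            # Nothing left to do: fall out so the process (and the task) can exit.
            break
        try:
            handle(payload)
            done += 1
        except Exception:
            if attempt < max_attempts:
                # Put the job back with an incremented attempt counter.
                jobs.put((payload, attempt + 1))
            else:
                failed += 1
    return done, failed
```

In a container you would call `drain(...)` from the entrypoint and let the process exit afterwards, which is what stops the Fargate task and the billing with it.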
A not-recommended but perhaps easier way to do it would be to run a Celery worker in your Django container (e.g. using supervisor) and use Fargate’s autoscaling features. You’d always have the one Django container running to receive data. If the Celery worker on that container used up all of the available resources, Fargate would scale the service by adding tasks, and once the jobs were done it would remove the excess containers. You’d be paying the “overhead” for Django in each container, but it could cost you less than an always-on Celery container and would certainly be simpler: you’d leverage your Celery experience and avoid the extra layer of event handling.
EDIT: Another disadvantage of this version is that you need to run Redis somewhere and I’ve found the minimum cost for this to be relatively high.
Based on my growing AWS experience, here’s what you probably should do…
- Use AWS API Gateway as an always-on receiver of events/requests. You pay only for requests: the free tier includes a million per month, and the next 300M cost roughly $1 per million (see the AWS pricing page), so this is likely to be free.
- While you have many options for responding to the request, an AWS Lambda function (which can be written in Python) should have the least overhead.
- If your queue will run longer than a Lambda function allows (15 minutes), you’ll need to have that Lambda function delegate the processing to e.g. a Fargate task.
- (Optional) If you want to use a Docker Hub image for your Fargate task, be aware that we experienced a bunch of issues with Tasks and Services failing to start due to rate limits at Docker Hub. We ended up wrapping our Fargate task in a Step Function that checked for this error specifically and retried.
- (Optional) If you need to limit concurrency, this SO answer suggests having your Lambda function check for an existing execution (of a Step Function or Fargate task). I was hoping there was something native on Fargate Tasks or Step Functions but I don’t see anything.
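As a rough sketch of the Lambda-delegates-to-Fargate step with the "check for an existing execution" guard, something like the following could work. It relies on the boto3 ECS client's `list_tasks` and `run_task` calls; the environment variable names (`CLUSTER`, `TASK_DEFINITION`, `SUBNETS`), the helper name `start_worker_if_idle`, and the public-IP setting are all assumptions for illustration. Note that check-then-start is not atomic, so two simultaneous requests could still start two tasks.

```python
import os


def start_worker_if_idle(ecs, cluster, task_definition, subnets):
    """Start one Fargate task unless one of the same family is already active.

    `ecs` is a boto3 ECS client (or anything with the same two methods).
    Assumes `task_definition` is "family" or "family:revision", not a full ARN.
    Returns the new task ARN, or None if a worker was already running.
    """
    family = task_definition.split(":")[0]
    active = ecs.list_tasks(cluster=cluster, family=family, desiredStatus="RUNNING")
    if active["taskArns"]:
        return None  # a worker is already draining the queue
    response = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "ENABLED",  # assumption: task needs outbound internet
            }
        },
    )
    if response.get("failures"):
        raise RuntimeError(f"run_task failed: {response['failures']}")
    return response["tasks"][0]["taskArn"]


def handler(event, context):
    """Hypothetical Lambda entry point invoked via API Gateway."""
    import boto3  # imported here so the helper above can be exercised without AWS

    arn = start_worker_if_idle(
        boto3.client("ecs"),
        os.environ["CLUSTER"],
        os.environ["TASK_DEFINITION"],
        os.environ["SUBNETS"].split(","),
    )
    return {"statusCode": 202, "body": arn or "worker already running"}
```

Passing the client in as an argument keeps the concurrency check easy to test with a fake object before you wire it up to real infrastructure.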
I imagine this would represent a huge operating cost savings over the always-on Fargate task and ElastiCache Redis queue, but the up-front cost/hassle could exceed the savings.
0👍
Have you thought of using AWS Lambda instead of the Celery worker? You would then pay per task execution, where cost is driven by execution time and memory usage. If your application is mostly idle, then paying per request and skipping the idle cost would make the most sense.
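To get a feel for how execution time and memory drive the bill, here is a back-of-the-envelope estimator. It assumes the usual model of a per-request charge plus a charge per GB-second of compute; the function name and parameters are made up for illustration, and the rates are deliberately left as arguments so you can plug in the current figures from the Lambda pricing page.

```python
def lambda_monthly_cost(invocations, avg_seconds, memory_mb,
                        price_per_gb_second, price_per_million_requests):
    """Estimate a monthly Lambda bill (ignores any free tier).

    All prices are inputs, not facts: look them up for your region.
    """
    # Compute is billed on duration multiplied by configured memory.
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost
```

For example, `lambda_monthly_cost(50_000, 2.0, 512, price_per_gb_second=0.00002, price_per_million_requests=0.25)` uses placeholder rates (not current AWS prices) to estimate 50,000 two-second jobs at 512 MB. Compare the result against the monthly cost of an always-on worker to see where the break-even sits.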