30👍
“What are its potential weaknesses, and things that need to be focused on in order to make it as fast as possible?”
The one thing you might be worried about further down the road is that depending on how you create your models and connect them to one another, you may run into an issue where a single page generates many, many, many queries.
This is especially true if you’re using a model that involves a generic relation.
Let’s say you’re using django-activity-stream to create a list of recent events (similar to Facebook’s News Feed). django-activity-stream basically creates a list of generic relations. For each of these generic relations you’re going to have to run a query to get information about that object. And, since it’s generic (i.e. you’re not writing a custom query for each kind of object), if that object has its own relations that you want to output, you might be looking at something like 40-100 queries for an activity feed with just 20-30 items.
Running 40-100 queries for a single request is not optimal behavior.
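One common way to cut this down is to group the feed entries by object type and fetch each type in a single query, rather than one query per entry. Below is a minimal sketch of that idea using plain sqlite3 instead of the Django ORM; the tables, the feed list and the load_feed_batched helper are all made up for illustration. In Django itself, prefetch_related on a generic foreign key performs a similar per-type batching.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO post  VALUES (1, 'Hello'), (2, 'World');
    INSERT INTO photo VALUES (1, 'Sunset');
""")

# Feed entries as (table_name, object_id) pairs, roughly the shape a
# generic relation stores. Hypothetical data for this sketch.
feed = [("post", 1), ("photo", 1), ("post", 2)]


def load_feed_batched(conn, feed):
    """Resolve every feed entry with one query per object type."""
    ids_by_table = defaultdict(list)
    for table, obj_id in feed:
        ids_by_table[table].append(obj_id)

    objects = {}
    queries = 0
    for table, ids in ids_by_table.items():
        placeholders = ",".join("?" * len(ids))
        # Table names come from our own fixed list above, never from user input.
        rows = conn.execute(
            f"SELECT id, title FROM {table} WHERE id IN ({placeholders})", ids
        )
        queries += 1
        for obj_id, title in rows:
            objects[(table, obj_id)] = title

    # Reassemble the results in the original feed order.
    return [objects[key] for key in feed], queries


titles, query_count = load_feed_batched(conn, feed)
```

With three feed entries of two types, this runs two queries instead of three; the saving grows with the length of the feed, since the query count depends on the number of types rather than the number of items.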
The good news is that Django is really just a bunch of classes and functions written in Python. Almost anything you write in Python can be added into Django, so you can always write your own functions or code to optimize a given request.
Choosing another framework is not going to avoid the problem of scalability; it’s just going to present different difficulties in different areas.
Also, you can look into things like caching in order to speed up responses and prevent server load.
24👍
This question was asked in 2011 and Django has come a long way since then. I’ve previously built a social network with 2 million users on Django and found the process to be quite smooth. Part of getstream.io’s infrastructure also runs on Django and we’ve been quite happy with it. Here are some tips for getting the most out of your Django installation. It wasn’t quite clear from the question, but I’ll assume you’re starting from a completely unoptimized Django installation.
Static files & CDN
Start by hosting your static files on S3 and stick the Cloudfront CDN in front of it. Hosting static files from your Django instance is a terrible idea, please don’t do it.
Database & ORM: Select related
The second most common mistake is not optimizing your usage of the ORM. Have a look at the documentation for select_related and apply it as needed. Most pages on your site should take only 2-3 queries, not the N queries you’ll typically see if you don’t use select_related correctly:
https://docs.djangoproject.com/en/1.11/ref/models/querysets/
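To make the "N queries" problem concrete, here is a small sketch in plain sqlite3 rather than Django. It compares fetching each post’s author separately with fetching everything in one JOIN, which is the kind of query select_related produces for a foreign key. The post and author tables and both helper functions are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1, 'first', 1), (2, 'second', 2), (3, 'third', 1);
""")


def posts_n_plus_one(conn):
    """One query for the posts, then one more per post for its author."""
    posts = conn.execute(
        "SELECT title, author_id FROM post ORDER BY id"
    ).fetchall()
    queries = 1
    result = []
    for title, author_id in posts:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def posts_joined(conn):
    """Same data in a single query by joining the related table."""
    rows = conn.execute(
        "SELECT post.title, author.name FROM post "
        "JOIN author ON author.id = post.author_id ORDER BY post.id"
    ).fetchall()
    return rows, 1


slow, slow_queries = posts_n_plus_one(conn)
fast, fast_queries = posts_joined(conn)
```

Both functions return the same rows, but the first one needs a query per post on top of the initial one. With a hypothetical Post model that has an author foreign key, the joined form corresponds to Post.objects.select_related("author").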
Database: PGBouncer
Creating a new connection to your Postgres database is a rather heavy operation. You’ll want to run PGBouncer on localhost to ensure you don’t have any unneeded overhead when creating database connections. This was more urgent with older versions of Django, but it is still a good idea in general.
Basic Monitoring & Debugging
Next you’ll want to get some basic monitoring and debugging up and running. The django debug toolbar is your first friend:
https://github.com/jazzband/django-debug-toolbar
After that you’ll want to have a look at tools such as NewRelic, Datadog, Sentry and StatsD/Graphite to get you more insights.
Separate concerns
Another first step is separating out concerns. You’ll want to run your database on its own server, your search server on its own server, your web processes on their own servers, and so on. If you run everything on one machine, it’s hard to see what’s causing your app to break. Servers are cheap; split stuff up.
Load balancer
If you’ve never used a load balancer before, start here:
https://aws.amazon.com/elasticloadbalancing/
Use the right tools
If you’re doing tag clouds, tag search or search, use a dedicated tool such as Elastic for this.
If you have a counter that changes frequently or a list that changes rapidly, use Redis instead of your database to cache the latest version.
Celery and RabbitMQ
Use a task queue to do anything that doesn’t need to be done right now in the background. The most widely used task queue is Celery:
http://www.celeryproject.org/
Denormalize everything
You don’t want to compute counts such as likes and comments on reads. Simply update the like and comment count every time someone adds a new like or comment. This makes the write operation heavier, but the read lighter. Since you’ll probably have a lot of reads and comparatively few writes, that’s exactly what you want.
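A minimal sketch of a denormalized like count, again in plain sqlite3 with a made-up schema: the like row and the stored counter are written in the same transaction, so reads never have to run COUNT(*). In Django the same increment is usually written with an F() expression so that it happens in the database rather than in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY,
                       like_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE post_like (post_id INTEGER, user_id INTEGER,
                            PRIMARY KEY (post_id, user_id));
    INSERT INTO post (id) VALUES (1);
""")


def add_like(conn, post_id, user_id):
    """Store the like and bump the denormalized counter together."""
    with conn:  # commits both statements, or rolls both back on error
        conn.execute(
            "INSERT INTO post_like VALUES (?, ?)", (post_id, user_id)
        )
        conn.execute(
            "UPDATE post SET like_count = like_count + 1 WHERE id = ?",
            (post_id,),
        )


def like_count(conn, post_id):
    """Cheap read: a single-row lookup instead of counting like rows."""
    return conn.execute(
        "SELECT like_count FROM post WHERE id = ?", (post_id,)
    ).fetchone()[0]


add_like(conn, 1, 10)
add_like(conn, 1, 11)
```

The write now costs two statements instead of one, which is the trade-off described above.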
News feeds and activity streams
If you’re building feeds have a look at this service for building news feeds & activity streams or the open source Stream-Framework
In 2011 you had to build your own feed technology; nowadays this is no longer the case.
Now that we’ve gone over the basics, let’s review some more advanced tips.
CDN and 2 stage loading
You are already using Cloudfront for your static files. As a next step you’ll want to stick Cloudfront in front of your web traffic as well. This allows you to cache certain pages on the CDN and reduce the load on your servers.
You can even cache pages for logged in users on the CDN. Simply use Javascript to load in all the page customizations and user specific details after the page is served from the CDN.
Database: PGBadger
Tools such as PGBadger give you great insights into what your database is actually doing. You’ll want to run daily reports on part of your log data.
Database: Indexes
You’ll want to start reading up on database indexes. Most early scaling problems can be fixed by applying the right index and optimizing your database a little bit. If you get your indexes right, you’ll be doing better than most people. There is a lot more room for database optimization, and these books by the 2ndQuadrant folks are awesome: https://www.2ndquadrant.com/en/books/
Database: Tuning
If you’re not using RDS, you’ll want to run a quick PGTune check on your database. The default Postgres configuration is pretty sluggish, and PGTune suggests better settings to use:
https://github.com/gregs1104/pgtune
Cache everything
Scaling your database is a pain. Eventually you’ll get around to having multiple slave databases, handling sharding and partitioning, etc. That work is time consuming, and your best way to avoid spending tons of time on it is caching. Redis is your go-to cache nowadays, but memcached is also a decent option. Basically, you’ll want to cache everything. A page shows a list of posts? Read from Redis. Looking up user profiles? Read from Redis. You want to use your database as little as possible and put most of the load on your cache layer, since it’s much simpler to scale your cache layer.
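The "read from the cache first, fall back to the database" pattern can be sketched as follows. This is only an illustration: TTLCache is a tiny in-process stand-in for Redis or memcached, and load_profile_from_db is a fake database call that counts how often it is hit. Both are hypothetical helpers, not part of any library.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Redis or memcached (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


db_reads = 0


def load_profile_from_db(user_id):
    """Pretend database lookup; counts how often it is called."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user{user_id}"}


def get_profile(cache, user_id, ttl=60):
    """Serve from the cache when possible, otherwise load and store."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:  # cache miss: fall back to the database
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, ttl)
    return profile


cache = TTLCache()
first = get_profile(cache, 7)
second = get_profile(cache, 7)
```

The second call is served from the cache, so the fake database is only read once. The hard part in practice is invalidation: when a profile changes, the cached copy has to be deleted or overwritten, or readers will see stale data until the TTL expires.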
Offsets
Postgres doesn’t like large offsets. Use ID filtering when you’re paginating through large result sets.
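ID filtering (often called keyset pagination) means remembering the last ID of the previous page and asking for rows after it, instead of telling the database to skip N rows. A minimal sketch with sqlite3 and a made-up post table; the same WHERE id > ? ORDER BY id LIMIT ? shape applies in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 11)],
)


def page_after(conn, last_seen_id, page_size):
    """Return the next page of posts with an id greater than last_seen_id."""
    return conn.execute(
        "SELECT id, title FROM post WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


first_page = page_after(conn, 0, 4)
# The last id of one page is the starting point for the next one.
second_page = page_after(conn, first_page[-1][0], 4)
```

Because the filter uses the primary key index, the database can jump straight to the start of the page, whereas a large OFFSET forces it to walk over and discard all the skipped rows.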
Deadlocks
With a lot of traffic you’ll eventually get deadlocks. This happens when multiple transactions on Postgres try to lock a piece of information and A waits for B while B waits for C and C waits for A. The obvious solution is to use smaller transactions, which reduces the chance of deadlocks occurring. Next, you’ll want to batch updates to your most popular data, i.e. instead of updating counts whenever someone likes a post, you’ll want to store a list of like changes and sync that to the count every 5 minutes or so.
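The batching idea can be sketched like this: likes are recorded in a buffer as deltas, and a periodic job applies them all at once. LikeBuffer is a hypothetical helper, and the counts dictionary stands in for the counter column in the database; in a real deployment the buffer would more likely live in Redis and the flush would run from a scheduled task.

```python
from collections import Counter


class LikeBuffer:
    """Collects like/unlike deltas so hot rows are updated in batches."""

    def __init__(self):
        self._pending = Counter()

    def record(self, post_id, delta=1):
        """Remember a change without touching the stored count."""
        self._pending[post_id] += delta

    def flush(self, counts):
        """Apply all pending deltas to `counts`; return rows touched."""
        pending, self._pending = self._pending, Counter()
        for post_id, delta in pending.items():
            counts[post_id] = counts.get(post_id, 0) + delta
        return len(pending)


# Stand-in for the like_count column, keyed by post id.
counts = {1: 5}

buffer = LikeBuffer()
for _ in range(3):
    buffer.record(1)        # three likes on a popular post
buffer.record(1, delta=-1)  # one unlike
buffer.record(2)            # first like on another post

rows_touched = buffer.flush(counts)
```

Five events turn into two row updates here. The cost is that the displayed count lags behind by up to one flush interval.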
Those are some of the basic tips, have fun dealing with rapidly growing social networks 🙂
6👍
Pinterest and Instagram use Django, so I’m sure it’s scalable. For the most heavily loaded parts, such as activity feeds, you can use in-memory storage like Redis.
High-load sites on Django:
Disqus
http://www.slideshare.net/zeeg/djangocon-2010-scaling-disqus
Pinterest
http://www.slideshare.net/eonarts/mysql-meetup-july2012scalingpinterest
Instagram
http://instagram-engineering.tumblr.com/
5👍
Off the top of my head…
Pinax has a profile for a social networking site.
Convore and Disqus use Django for some parts of their websites.
About Django scalability – Does Django Scale?
Edit: Found this while I was googling for something else.
PyCon 2011: Django: Pitfalls I Encountered and How to Avoid Them
Presented by Luke Sneeringer
Are you starting a moderate to large sized Django project? Do you need to plan ahead and build an application that will react to unanticipated needs? This talk covers some techniques and pitfalls I encountered in writing my first reasonably large Django site, and what I did differently the second time I started a project.
2👍
Django can certainly be used to build a social network. It offers great features for performance enhancements, like caching. See this post on scaling.
The main bottleneck will come from how you design your models. In my experience, deeply nested foreign keys and many joins (many-to-many relations) slow things down when you are running complex queries. You should try list fields for such cases. You can also investigate the key/value storage Google uses in Bigtable on App Engine; it scales better than relational databases.
You should also paginate items conveniently. You may want to use Ajax to preserve the user experience and spare users from loading whole pages just to see more posts.
0👍
This question talks about scaling with Django. That may boost your confidence in trying to create a potentially large site.
0👍
This isn’t an issue specific to Django or Python; it’s a matter of cloud infrastructure and software engineering. One server alone may be fine for 10,000 users, provided they are not all concurrent. Location matters too: are these users in the same city? The same country?
I believe Django is very good and I will use it myself in a similar project. My concern is not Django but the IaaS, the infrastructure I will run it on.
If you are still unsure whether Python is the answer, you can research Ruby on Rails, ASP.NET, or even Perl and PHP. To me, Python is definitely the answer.