[Django]-Django for social networking

30👍

“What are its potential weaknesses, and things that need to be focused on in order to make it as fast as possible?”

The one thing you might be worried about further down the road is that depending on how you create your models and connect them to one another, you may run into an issue where a single page generates many, many, many queries.

This is especially true if you’re using a model that involves a generic relation.

Let’s say you’re using django-activity-stream to create a list of recent events (similar to Facebook’s News Feed). django-activity-stream basically creates a list of generic relations. For each of these generic relations you’re going to have to run a query to get information about that object. And, since it’s generic (i.e. you’re not writing a custom query for each kind of object), if that object has its own relations that you want to output, you might be looking at something like 40-100 queries for an activity feed with just 20-30 items.

Running 40-100 queries for a single request is not optimal behavior.
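To make the query explosion concrete, here is a minimal sketch using plain sqlite3 rather than Django, so it runs anywhere. The `post`, `photo` and `activity` tables and the `run` helper are made up for illustration; they are not django-activity-stream's schema. The batched version groups target ids by type and issues one `IN` query per type, which is roughly the idea behind Django's `prefetch_related` when it is applied to generic foreign keys.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, title TEXT);
    -- A generic relation: the target table name is stored as data.
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY, target_type TEXT, target_id INTEGER
    );
""")
conn.executemany("INSERT INTO post VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO photo VALUES (?, ?)",
                 [(i, f"photo {i}") for i in range(1, 11)])
conn.executemany(
    "INSERT INTO activity (target_type, target_id) VALUES (?, ?)",
    [("post" if i % 2 else "photo", (i + 1) // 2) for i in range(1, 21)],
)

ALLOWED_TABLES = {"post", "photo"}  # table names are never taken from user input
query_log = []                      # every SQL statement we run, for counting


def run(sql, params=()):
    """Hypothetical helper: execute a query and record it in query_log."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def feed_naive():
    """1 query for the feed + 1 query per feed item."""
    titles = []
    for _id, ttype, tid in run(
        "SELECT id, target_type, target_id FROM activity ORDER BY id"
    ):
        assert ttype in ALLOWED_TABLES
        titles.append(run(f"SELECT title FROM {ttype} WHERE id = ?", (tid,))[0][0])
    return titles


def feed_batched():
    """1 query for the feed + 1 query per distinct target type."""
    rows = run("SELECT id, target_type, target_id FROM activity ORDER BY id")
    ids_by_type = defaultdict(set)
    for _id, ttype, tid in rows:
        ids_by_type[ttype].add(tid)

    titles = {}
    for ttype, ids in ids_by_type.items():
        assert ttype in ALLOWED_TABLES
        marks = ",".join("?" * len(ids))
        for tid, title in run(
            f"SELECT id, title FROM {ttype} WHERE id IN ({marks})", tuple(ids)
        ):
            titles[(ttype, tid)] = title
    return [titles[(ttype, tid)] for _id, ttype, tid in rows]
```

With 20 feed items over two target types, the naive version logs 21 queries and the batched one logs 3. If the targets have relations of their own, the naive count multiplies again, which is how you end up in the 40-100 range.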

The good news is that Django is really just a bunch of classes and functions written in Python. Almost anything you write in Python can be used from Django, so you can always write your own functions or code to optimize a given request.

Choosing another framework is not going to avoid the problem of scalability; it’s just going to present different difficulties in different areas.

Also, you can look into things like caching in order to speed up responses and prevent server load.

24👍

This question was asked in 2011 and Django has come a long way since then. I’ve previously built a social network with 2 million users on Django and found the process to be quite smooth. Part of getstream.io’s infrastructure also runs on Django and we’ve been quite happy with it. Here are some tips for getting the most out of your Django installation. It wasn’t quite clear from the question, but I’ll assume you’re starting from a completely unoptimized Django installation.

Static files & CDN

Start by hosting your static files on S3 and put the CloudFront CDN in front of them. Serving static files from your Django instance is a terrible idea; please don’t do it.

Database & ORM: Select related

The second most common mistake is not optimizing your usage of the ORM. Have a look at the documentation for select_related (and prefetch_related for many-to-many and reverse relations) and apply them as needed. Most pages on your site should take only 2-3 queries, not the N queries you’ll typically see if you don’t use select_related correctly:
https://docs.djangoproject.com/en/1.11/ref/models/querysets/
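As a rough illustration of what select_related buys you, here is the same lookup done both ways at the SQL level with sqlite3. The `post` and `author` tables are hypothetical. In Django terms, assuming a `Post` model with an `author` foreign key, the joined version corresponds to something like `Post.objects.select_related("author")`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1, 'first', 1), (2, 'second', 2), (3, 'third', 1);
""")


def posts_n_plus_one():
    """One query for the posts, then one more per post for its author."""
    queries = 1
    result = []
    posts = conn.execute(
        "SELECT title, author_id FROM post ORDER BY id"
    ).fetchall()
    for title, author_id in posts:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def posts_joined():
    """A single JOIN returns posts and authors together."""
    rows = conn.execute(
        "SELECT post.title, author.name FROM post "
        "JOIN author ON author.id = post.author_id ORDER BY post.id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows; the first needs 4 queries for 3 posts, the second needs 1 regardless of how many posts there are.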

Database: PGBouncer

Creating a new connection to your postgres database is a rather heavy operation. You’ll want to run PGBouncer on localhost to ensure you don’t have any unneeded overhead when creating database connections. This was more urgent with older versions of Django, but in general is still a good idea.

Basic Monitoring & Debugging

Next you’ll want to get some basic monitoring and debugging up and running. The django debug toolbar is your first friend:
https://github.com/jazzband/django-debug-toolbar

After that you’ll want to have a look at tools such as NewRelic, Datadog, Sentry and StatsD/Graphite to get you more insights.

Separate concerns

Another first step is separating out concerns. You’ll want to run your database on its own server, your search server on its own server, your web processes on their own servers, etc. If you run everything on one machine it’s hard to see what’s causing your app to break. Servers are cheap, so split stuff up.

Load balancer

If you’ve never used a load balancer before, start here:
https://aws.amazon.com/elasticloadbalancing/

Use the right tools

If you’re doing tag clouds, tag search, or search in general, use a dedicated tool such as Elasticsearch for this.

If you have a counter or a list that changes frequently, use Redis instead of your database to cache the latest version.

Celery and RabbitMQ

Use a task queue to run anything that doesn’t need to happen right now in the background. The most widely used task queue is Celery:
http://www.celeryproject.org/

Denormalize everything

You don’t want to compute counts such as likes and comments on reads. Simply update the like and comment count every time someone adds a new like or comment. This makes the write operation heavier, but the read lighter. Since you’ll probably have a lot of reads and relatively few writes, that’s exactly what you want.
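A minimal sketch of a denormalized like counter, again with sqlite3 so it is self-contained. The `post` and `post_like` tables and the function names are assumptions for illustration. The counter is bumped in the same transaction as the insert, and only when the like is actually new. In Django you would typically express the increment with an `F()` expression so that it happens inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        like_count INTEGER NOT NULL DEFAULT 0   -- denormalized counter
    );
    CREATE TABLE post_like (
        post_id INTEGER,
        user_id INTEGER,
        PRIMARY KEY (post_id, user_id)          -- one like per user per post
    );
    INSERT INTO post (id) VALUES (1);
""")


def add_like(post_id, user_id):
    """Write path: store the like and bump the counter in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO post_like VALUES (?, ?)", (post_id, user_id)
        )
        if cur.rowcount == 1:  # skip the bump for duplicate likes
            conn.execute(
                "UPDATE post SET like_count = like_count + 1 WHERE id = ?",
                (post_id,),
            )


def like_count(post_id):
    """Read path: a primary-key lookup, no COUNT(*) over post_like."""
    return conn.execute(
        "SELECT like_count FROM post WHERE id = ?", (post_id,)
    ).fetchone()[0]
```

The read no longer scans the likes table at all, which is the whole point: the cost moves to the write.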

News feeds and activity streams

If you’re building feeds, have a look at this service for building news feeds & activity streams, or the open source Stream-Framework.

In 2011 you had to build your own feed technology; nowadays this is no longer the case.

Now that we’ve gone over the basics, let’s review some more advanced tips.

CDN and 2 stage loading

You are already using Cloudfront for your static files. As a next step you’ll want to stick Cloudfront in front of your web traffic as well. This allows you to cache certain pages on the CDN and reduce the load on your servers.

You can even cache pages for logged-in users on the CDN. Simply use JavaScript to load in all the page customizations and user-specific details after the page is served from the CDN.

Database: PGBadger

Tools such as PGBadger give you great insights into what your database is actually doing. You’ll want to run daily reports on part of your log data.

Database: Indexes

You’ll want to start reading up on database indexes. Most early scaling problems can be fixed by applying the right index and optimizing your database a little bit. If you get your indexes right you’ll be doing better than most people. There is a lot more room for database optimization and these books by the 2nd quadrant folks are awesome. https://www.2ndquadrant.com/en/books/

Database: Tuning

If you’re not using RDS you’ll want to run a quick PGTune check on your database. By default Postgres’ configuration is pretty sluggish; PGTune tells you the right settings to use:
https://github.com/gregs1104/pgtune

Cache everything

Scaling your database is a pain. Eventually you’ll get around to having multiple replica databases, handling sharding and partitioning, etc. Scaling your database is time consuming, and your best way to avoid spending tons of time on that is caching. Redis is your go-to cache nowadays, but memcached is also a decent option. Basically you’ll want to cache everything. A page shows a list of posts? Read from Redis. Looking up user profiles? Read from Redis. You want to use your database as little as possible and put most of the load on your cache layer, since it’s extremely simple to scale your cache layer.
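The read-through pattern described above can be sketched like this. The `TTLCache` class is a tiny in-process stand-in for Redis or memcached, and `load_profile_from_db` is a placeholder for a real query; both are assumptions so the example runs without any services. In a Django project you would more likely go through the cache framework (`django.core.cache`) with a Redis or memcached backend.

```python
import time


class TTLCache:
    """In-process stand-in for Redis/memcached (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # expired: treat as a miss
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


cache = TTLCache()
db_reads = []  # records every time we fall through to the "database"


def load_profile_from_db(user_id):
    """Placeholder for a real database query."""
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}


def get_profile(user_id):
    """Read path: try the cache first, hit the database only on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, ttl=300)
    return profile


def on_profile_updated(user_id):
    """Write path: after saving to the database, drop the stale cache entry."""
    cache.delete(f"profile:{user_id}")
```

The part people tend to forget is the invalidation on write. Without it you serve stale profiles until the TTL runs out.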

Offsets

Postgres doesn’t like large offsets. Use ID filtering when you’re paginating through large result sets.
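Here is a small sketch of ID filtering (often called keyset pagination) next to offset pagination. It uses sqlite3 and a made-up `post` table so it runs as-is, but the SQL shape is the same on Postgres. With an offset the database still has to walk past all the skipped rows; with `WHERE id < ?` it can jump straight to the right place in the primary key index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 101)],
)


def page_by_offset(page, size=10):
    """Offset pagination: cost grows with the page number."""
    return conn.execute(
        "SELECT id, title FROM post ORDER BY id DESC LIMIT ? OFFSET ?",
        (size, page * size),
    ).fetchall()


def page_after(last_seen_id=None, size=10):
    """ID filtering: the client passes the last id it saw on the previous page."""
    if last_seen_id is None:
        return conn.execute(
            "SELECT id, title FROM post ORDER BY id DESC LIMIT ?", (size,)
        ).fetchall()
    return conn.execute(
        "SELECT id, title FROM post WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_seen_id, size),
    ).fetchall()
```

Usage: call `page_after()` for the first page, then `page_after(first_page[-1][0])` for the next one. The trade-off is that you can't jump to an arbitrary page number, which rarely matters for an endlessly scrolling feed.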

Deadlocks

With a lot of traffic you’ll eventually get deadlocks. This happens when multiple transactions in Postgres try to lock the same pieces of information, and A waits for B while B waits for C and C waits for A. The obvious solution is to use smaller transactions, which reduces the chance of deadlocks occurring. Next, you’ll want to batch updates to your most popular data. I.e., instead of updating counts whenever someone likes a post, store a list of like changes and sync that to the count every 5 minutes or so.
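A minimal sketch of that batching idea. The pending deltas live in an in-memory `Counter` here purely for illustration; in production they would more plausibly sit in Redis, and `flush_likes` would be called from a periodic task. The table and function names are assumptions. Applying the deltas in sorted id order is a common extra precaution: if every writer touches rows in the same order, two flushes cannot wait on each other in a cycle.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        like_count INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO post (id) VALUES (1), (2);
""")

pending = Counter()  # post_id -> likes not yet written to the database


def record_like(post_id):
    """Hot path: no database write, just remember the change."""
    pending[post_id] += 1


def flush_likes():
    """Periodic job: apply all pending deltas in one short transaction."""
    if not pending:
        return 0
    deltas = sorted(pending.items())  # consistent row order across flushes
    pending.clear()
    with conn:
        conn.executemany(
            "UPDATE post SET like_count = like_count + ? WHERE id = ?",
            [(count, post_id) for post_id, count in deltas],
        )
    return len(deltas)
```

A popular post that gets 1,000 likes in five minutes now costs one row update instead of 1,000 competing ones. The price is that the displayed count lags by up to one flush interval.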

Those are some of the basic tips, have fun dealing with rapidly growing social networks 🙂

6👍

Pinterest & Instagram use Django, so I’m sure it’s scalable. For the most heavily loaded parts, such as activity feeds, you can use in-memory storage like Redis.

high-load sites on django

Disqus
http://www.slideshare.net/zeeg/djangocon-2010-scaling-disqus

Pinterest
http://www.slideshare.net/eonarts/mysql-meetup-july2012scalingpinterest

Instagram
http://instagram-engineering.tumblr.com/

5👍

Off the top of my head ...

Pinax has a profile for a social networking site.

Convore and Disqus use Django for some parts of their websites.

About Django scalability – Does Django Scale ?

Edit: Found this while I was googling for something else.

PyCon 2011: Django: Pitfalls I Encountered and How to Avoid Them

Presented by Luke Sneeringer

Are you starting a moderate to large sized Django project? Do you need to plan ahead and build an application that will react to unanticipated needs? This talk covers some techniques and pitfalls I encountered in writing my first reasonably large Django site, and what I did differently the second time I started a project.

👤Renyi

2👍

Django can certainly be used to build a social network. It offers great features for performance enhancements, like caching. See this post on scaling.

The main bottleneck will come from how you design your models. In my experience, deeply nested foreign keys and many joins (many-to-many relations) slow things down when you are running complex queries. You should try list fields for such cases. You can also look into the key/value storage model Google uses with Bigtable on App Engine; it scales further than relational databases.

You should also paginate items conveniently. You may want to use AJAX to keep the user experience smooth, so users don’t have to load a whole new page just to see more posts.

👤kkd

0👍

This question talks about scaling with Django. That may boost your confidence in trying to create a potentially large site.

👤j_syk

0👍

This isn’t an issue specific to Django or Python; it’s a matter of cloud and software engineering. One server alone may be fine for 10,000 users, provided they aren’t all concurrent. Location matters too: are these users in the same city? The same country?

I believe Django is very good and I will use it myself in a similar project. My issue is not Django but the IaaS, the infrastructure I will run this on.

If you are still unsure whether Python is the answer, you can research Ruby on Rails, ASP.NET, or even Perl, PHP, and the like. To me, Python is definitely the answer.
