2đź‘Ť
If you need consecutive numbering without holes, you should not use Django's autogenerated id field as your order number.
In order to guarantee uniqueness even under concurrency, Django creates a database sequence: an object in the database that produces a new value each time it is consulted. Note that the sequence consumes each value it produces even if that value is never saved to the database anywhere.
The consequence is that whenever you try to create an instance and the operation fails at the database level, a number is consumed from the sequence anyway. Say you create your first Order successfully; it gets ID 1. Then you try to create a second Order, but the INSERT fails in the database (because of some integrity check, or whatever). Afterwards you successfully create a third Order. You would expect it to have ID 2, but it will actually have ID 3, because the number 2 was consumed from the sequence even though it was never saved.
So no, you cannot use the id field if you need to ensure there are no holes in your order numbers.
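To make that concrete, here is a minimal sketch of the behavior. It assumes a hypothetical Order model with a unique code field, purely for the sake of producing a failing INSERT; the point is that the failed INSERT still consumes a sequence value:

from django.db import IntegrityError, transaction
from orders.models import Order  # hypothetical model with code = CharField(unique=True)

Order.objects.create(code='A')          # gets id 1
try:
    with transaction.atomic():
        Order.objects.create(code='A')  # unique violation: nextval() already returned 2
except IntegrityError:
    pass
Order.objects.create(code='B')          # gets id 3, not 2 -- the hole is permanent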
Now, in order to have consecutive numbering you could simply add a column:
order_number = models.PositiveIntegerField(unique=True, null=True)
The question is how to properly set its value. In an ideal world with no concurrency (no two processes running queries against the same database), you could simply get the maximum order number so far, add 1, and save that value into order_number. The thing is, if you do this naively you will end up with duplicates (actually integrity errors, because unique=True will prevent the duplicates).
One way to solve this would be to lock your table (see this SO question) while you compute and update your order number.
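For instance, a sketch of the table-locking variant, assuming PostgreSQL and the default table name orders_order (the lock mode chosen here and the order variable are illustrative):

from django.db import connection, transaction

with transaction.atomic():
    with connection.cursor() as cursor:
        # EXCLUSIVE blocks concurrent writes (but not plain reads) until commit
        cursor.execute('LOCK TABLE orders_order IN EXCLUSIVE MODE')
        cursor.execute(
            'UPDATE orders_order '
            'SET order_number = (SELECT COALESCE(MAX(order_number), 0) FROM orders_order) + 1 '
            'WHERE id = %s AND order_number IS NULL',
            [order.id],
        )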
As I assume you don't care that the order number faithfully reflects the order in which the orders were created, but only that it is sequential and without holes, what you can do is run a query like the following inside a transaction (assuming your Order model lives inside an orders Django app):
UPDATE orders_order SET order_number = (SELECT COALESCE(MAX(order_number), 0) FROM orders_order) + 1 WHERE id = [yourid] AND order_number IS NULL
Now even with this query you could have concurrency issues, since Django uses Postgres' default isolation level (READ COMMITTED) by default. So in order to make this query safe you will need to change the isolation level. Refer to this SO question for a way of having two separate connections with two different isolation levels. What you need to make this query safe is to set the isolation level to SERIALIZABLE.
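One way to get such an isolated connection, sketched here under the assumption that you are on psycopg2 (the alias name and credentials are illustrative), is a second DATABASES entry pointing at the same database:

# settings.py (sketch)
import psycopg2.extensions

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
    },
    'your_isolated_db_alias': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',  # same database, different isolation level
        'OPTIONS': {
            'isolation_level': psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
        },
    },
}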
Assuming you were able to solve the isolation level issue, the remaining question is how to run this query:
from django.db import connections, transaction

with transaction.atomic(using='your_isolated_db_alias'):
    with connections['your_isolated_db_alias'].cursor() as cursor:
        # Note: query parameters must be passed as a sequence
        cursor.execute('UPDATE orders_order SET order_number = (SELECT COALESCE(MAX(order_number), 0) FROM orders_order) + 1 WHERE id = %s AND order_number IS NULL', [order.id])
The snippet above assumes the order whose number you want to set is in a variable called order. If your isolation level is right, you should be safe.
Now there is a third alternative, which is to use select_for_update() as a table-locking mechanism (although it is intended for row-level locking, not for that). The idea is simple: in the same way as before, you first create your order and then update it to set the order number. To guarantee that you won't end up with duplicate (aka IntegrityError) order numbers, you issue a query that selects all the Orders in your DB and uses select_for_update(), in the following way:
from django.db import transaction
from django.db.models import Max

with transaction.atomic():
    # This locks every row in orders_order until the end of the transaction.
    # list() forces evaluation; the queryset alone is lazy and would not lock anything.
    list(Order.objects.all().select_for_update())  # pointless query just to lock the table
    max_on = Order.objects.aggregate(max_on=Max('order_number'))['max_on']
    Order.objects.filter(id=order.id).update(order_number=max_on + 1)
As long as you are sure that you have at least 1 order before entering the code block above AND that you always do the full select_for_update() first, you should also be safe.
And those are the ways I can think of to solve the consecutive numbering. I'd love to see an out-of-the-box solution for this, but unfortunately I don't know of any.
2đź‘Ť
This will not answer your question directly, but still might be useful for you or somebody with a similar problem.
From the data integrity point of view, deleting potentially useful data such as customer orders in production can be a really bad idea. Even if you don't need this data at the moment, you may come to a point in the future when you want to analyze all of your orders, even failed / cancelled ones.
What I would suggest here is to ensure that deleting less important related models doesn't cascade into deleting orders. You can easily achieve this by passing on_delete=models.PROTECT to your ForeignKey field, which raises ProtectedError when you try to delete the related model. Other useful options are SET_NULL and SET_DEFAULT, whose names speak for themselves.
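A minimal sketch of that setup (model names here are purely illustrative):

from django.db import models

class Customer(models.Model):
    name = models.CharField(max_length=100)

class Order(models.Model):
    # Deleting a Customer that still has orders raises ProtectedError
    # instead of cascading and deleting the orders with it.
    customer = models.ForeignKey(Customer, on_delete=models.PROTECT)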
By following this approach, you will never need to worry about holes left behind by deleted orders.
2đź‘Ť
Let's leave Django and Python aside for a moment: this is a database topic. Say you start a transaction that inserts a new row into a particular table. That means a new ID. If you commit that unit of work, the new ID becomes visible; if a rollback happens, the ID is lost. From the DB's perspective there is no way to reuse that number.
Be aware that select max(id) + 1 is bad practice: what if two transactions do that at the same time?
The other option is a lock. I can see 3 solutions here:
- Lock all rows in the table. That means your insert time depends on the table size 🙂
  As a side note: if you lock rows one by one, be sure to sort them first so there is no deadlock. Say you use Postgres: an edit can move a row to the end of the table, so the physical order depends on what is happening to the data. Two transactions can then lock rows in different orders, and a deadlock is only a matter of time. During tests, under low load, everything goes just fine…
- Lock the whole table. Better, since it does not depend on the number of rows, but you block edits as well.
- Keep a separate table for generators, where each generator has a row in that table. You lock that row, take the next value, and the row is released at the end of the transaction.
This applies to all three options: you need short transactions, which is a general rule in web apps anyway. Just be sure that creating an order is light, and that the heavy work is performed in a separate transaction. Why? Because the lock is only released at the end of the transaction.
Hope this explains the case.
Back in Django, let's create a model:
class Custom_seq(models.Model):
    name = models.CharField(max_length=100, blank=False, null=False)
    last_number = models.IntegerField(default=0)
And the query for the next id (note the keyword argument is nowait, not no_wait, and that select_for_update() only takes effect inside a transaction):
from django.db import transaction

with transaction.atomic():
    seq = Custom_seq.objects.filter(name='order sequence').select_for_update(nowait=False).first()
    new_order_id = seq.last_number + 1
    seq.last_number = new_order_id
    seq.save()
Why does this work? Note that at any one time only one transaction holds the lock, so only one order is being numbered. It can be committed, so the number is used, or rolled back, so the increment of last_number is reverted along with the cancelled order; both cases are supported.
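An illustrative end-to-end usage, assuming an Order model with the order_number field discussed earlier in the thread (names are illustrative):

from django.db import transaction

with transaction.atomic():
    # The row lock serializes number allocation; a rollback reverts
    # last_number too, so no number is ever lost.
    seq = Custom_seq.objects.select_for_update().get(name='order sequence')
    seq.last_number += 1
    seq.save()
    order = Order.objects.create(order_number=seq.last_number)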
0đź‘Ť
This is database-internal behavior: https://www.postgresql.org/docs/current/functions-sequence.html
Important
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. For example an INSERT with an ON CONFLICT clause will compute the to-be-inserted tuple, including doing any required nextval calls, before detecting any conflict that would cause it to follow the ON CONFLICT rule instead. Such cases will leave unused “holes” in the sequence of assigned values. Thus, PostgreSQL sequence objects cannot be used to obtain “gapless” sequences.