Each item in the connections object returns a thread-local connection to that database. By default, these connections cannot be shared between threads; attempting to do so will result in a DatabaseError.
Always use connections[alias] within the thread that executes your queries. Never access connections[alias] in the parent thread and pass the object to the child thread. This will ensure that every connection object you use is local to the current thread, avoiding any threading issues.
To fix your code and make it thread-safe, you would change it like this:
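from django.db import connections
import threading


class Transform(object):
    def transform_data(self, listing):
        # Access the database connection on the global `connections` object
        # from within the child thread.
        cursor = connections['legacy'].cursor()
        # Query parameters must be passed as a sequence, not a bare value.
        cursor.execute('SELECT ... WHERE id = %s', [listing.id])
        data = cursor.fetchall()
        ...

    def run(self):
        # `listings` is defined elsewhere, as in the original code.
        for listing in listings:
            # Pass only the listing to the thread, never a connection object,
            # and call start() so the thread actually runs.
            threading.Thread(target=self.transform_data, args=[listing]).start()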
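The thread-local lookup described above can be imitated outside Django with the standard library. The following is only a rough sketch, assuming sqlite3 as a stand-in database; `get_connection`, `worker`, and `run` are hypothetical names, and this is not how Django implements `connections` internally. It shows the same rule in miniature: each thread asks for its connection from inside the thread, and nothing connection-like is passed in from the parent.

```python
import sqlite3
import threading

# Hypothetical per-thread storage; each thread sees its own attributes here.
_local = threading.local()


def get_connection(path=":memory:"):
    """Return the calling thread's connection, opening one on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(path)
        _local.conn = conn
    return conn


def worker(results, index):
    # Runs inside the child thread, so the connection belongs to that thread.
    conn = get_connection()
    try:
        results[index] = conn.execute("SELECT ?", (index,)).fetchone()[0]
    finally:
        # Close and forget the connection in the thread that opened it.
        conn.close()
        del _local.conn


def run(n=3):
    results = [None] * n
    threads = [threading.Thread(target=worker, args=(results, i)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run(3)` starts three threads, each of which opens, uses, and closes its own connection. Only plain values (the result list and an index) cross the thread boundary.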
Ideally, each thread should use its own connection. If it does, the select query executed inside transform_data essentially gets a snapshot of the data at that point in time. You can retrieve the rows without having to worry about their being updated or deleted by other threads, provided that the other threads have their own connections.
If all threads share the same connection, what exactly happens depends heavily on the database you are using and its transaction isolation level.
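As one concrete data point on how much this varies, Python's built-in sqlite3 driver refuses by default to let a connection be used outside the thread that created it, and allows it only when you opt in with `check_same_thread=False`. The sketch below is an illustration with sqlite3 only; `use_in_child` is a hypothetical helper, and the result says nothing about how other databases or drivers behave with a shared connection.

```python
import sqlite3
import threading


def use_in_child(conn):
    """Use `conn` from a new thread; return 'ok' or the name of the error raised."""
    outcome = []

    def target():
        try:
            conn.execute("SELECT 1").fetchone()
            outcome.append("ok")
        except sqlite3.ProgrammingError as exc:
            outcome.append(type(exc).__name__)

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    return outcome[0]


# Connection created in the parent thread with the default settings.
shared = sqlite3.connect(":memory:")
default_result = use_in_child(shared)
shared.close()

# Same experiment, but explicitly opting out of the same-thread check.
opted_in = sqlite3.connect(":memory:", check_same_thread=False)
opted_in_result = use_in_child(opted_in)
opted_in.close()
```

With the default settings the child thread gets an error instead of a result. With the check disabled the query runs, and it becomes your responsibility to serialize access to the connection.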