[Django]-Django ORM Cross Product

2đź‘Ť

âś…

If you can afford to make one database trip for each issue type, try something like this untested snippet:

def lacking_configs():
    for issue_type in IssueType.objects.all():
        for customer in Customer.objects.filter(
                issuetypeconfigpercustomer__issue_type=None
            ):
            yield customer, issue_type

missing = list(lacking_configs())

This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).

[update]

I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.

In Django, every Queryset is either a list of Model instances or a dict (values querysets), so it is impossible to return the format you want (a list of tuples of Model) without some Python (and possibly multiple trips to the database).

The closest thing to a cross product would be using the “extra” method without a where parameter, but it involves raw SQL and knowing the underlying table name for the other model:

missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype']
)

As a result, each Customer object will have an extra attribute, “issue_type_id”, containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issuetype_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can have something close to what you want – this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType just include them in the select argument.

In order to assemble a list of (Customer, IssueType) you need something like:

cross_product = [
    (customer, IssueType.objects.get(pk=customer.issue_type_id))
    for customer in 
    Customer.objects.extra(
        select={"issue_type_id": 'appname_issuetype.id'},
        tables=['appname_issuetype'],
        where=["""
           NOT EXISTS (
               SELECT 1 
               FROM appname_issuetypeconfigpercustomer 
               WHERE issuetype_id=appname_issuetype.id 
                AND customer_id=appname_customer.id
           )
        """]
    )
]

Not only this requires the same number of trips to the database as the “generator” based version but IMHO it is also less portable, less readable and violates DRY. I guess you can lower the number of database queries to a couple using something like this:

missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype'],
    where=["""
       NOT EXISTS (
           SELECT 1 
           FROM appname_issuetypeconfigpercustomer 
           WHERE issuetype_id=appname_issuetype.id 
             AND customer_id=appname_customer.id
       )
    """]
)
issue_list = dict(
    (issue.id, issue)
    for issue in 
    IssueType.objects.filter(
        pk__in=set(m.issue_type_id for m in missing)
    )
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]

Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.

This is all untested, sorry if some homework was left for you.

Leave a comment