[Fixed]-Why is my query suboptimal?

The problem with your query are your inequalities. Alas, these limit the use of indexes — at most one inequality per index lookup.

The data structure that you need to solve this is a multi-dimensions index. In SQL databases, this is usually supplied using GIS extensions, which are documented here.

Without these extensions, you can try arcane cleverness. I can think of one way around this problem, but it makes both the table and the query a bit more complicated. Add a new column for east and north, which is an integer: easti and northi. Then, build an index on easti, northi. And, write the query as:

select *
from ((select sr.*
       from soil_soilregion sr
       where easti = -86 and northi in (40, 41)
      ) union all
      (select sr.*
       from soil_soilregion sr
       where easti = -85 and northi in (40, 41)
      ) 
     ) sr
where east > -86.8379775155 AND north > 40.3782334957 AND
      south < 40.3817576747 AND west < -86.8240119179;

The subquery will get everything in a relatively small box. This is then filtered by the outer query. The subquery should use the indexes, so it should be quite fast.

Given the size of what you are looking for, using a fraction of a degree would work even better than a whole degree for the integer conversion.

Gordon Linoff

Profile pages design based on login in Django

A short-term, but partial fix is to have a “covering index”. That is, make an index that has the bounding box, plus the id (and maybe the soil type?). then do this:

SELECT b.*
    FROM (
        SELECT id FROM soilregion
            WHERE east... AND west ... AND ...
         ) AS a
    JOIN soilregion AS b ON b.id = a.id;

This is likely to speed up the query because of:

The index is all that is needed in the subquery
The index is smaller than the data
When the subquery is finished, it has a short list of ids, which are easily and quickly looked up in the real table (via the JOIN).

Some of your ‘why’ questions:

The individual indexes merely eliminate some fraction of the 7M rows (as in “everything east of here”). That does not help much. Furthermore, when an index is that ‘useless’, it is not used — it is faster to simply scan the table.
The compound index (north-south…) does not do any better. This is because it starts with a range test on north and can’t get past that.
The second attempt ‘seemed’ to be faster — this could be because of caching, not because of it being any better.

Solutions?…

Plan A: Spatial index as Gordon mentioned.

Plan B: Restructure the data to work with a pseudo-2D indexing method described in my “find the nearest pizza parlors” blog. A problem: I have not thought through how to adapt for “overlapping” instead of “nearest”.

Rick James

Using nested Django transaction.atomic() to handle exceptions?

Source:stackexchange.com

Leave a comment Cancel reply