Is there any way to make this Postgres query faster?


Example table foo has a text column x which contains numbers.

explain analyze select * from foo where x =any('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99}'::TEXT[]);

 Gather  (cost=1000.00..62144.07 rows=99 width=10) (actual time=0.682..364.252 rows=99 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on foo  (cost=0.00..61134.17 rows=41 width=10) (actual time=237.705..358.203 rows=33 loops=3)
         Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99}'::text[]))
         Rows Removed by Filter: 333300
 Planning Time: 0.219 ms
 Execution Time: 364.292 ms

Note that the =ANY operator does a linear search through the array, which is slow when the array is long.

explain analyze select * from (select * from unnest('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99}'::TEXT[]) x 
where x not in ('7','8')) a join foo using (x);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1002.45..12146.72 rows=97 width=36) (actual time=0.904..44.894 rows=97 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=2.45..11137.02 rows=40 width=36) (actual time=24.904..38.844 rows=32 loops=3)
         Hash Cond: (foo.x = x.x)
         ->  Parallel Seq Scan on foo  (cost=0.00..9571.67 rows=416667 width=10) (actual time=0.007..16.592 rows=333333 loops=3)
         ->  Hash  (cost=1.24..1.24 rows=97 width=32) (actual time=0.057..0.057 rows=97 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Function Scan on unnest x  (cost=0.00..1.24 rows=97 width=32) (actual time=0.020..0.040 rows=97 loops=3)
                     Filter: (x <> ALL ('{7,8}'::text[]))
                     Rows Removed by Filter: 2

Using unnest produces a much faster hash join and avoids the linear search through the array. In addition, I've put the NOT IN clause (the "negative list to exclude blacklisted categories for a specific user") inside the subquery that generates the list of values to fetch, so the blacklisted values are removed from the list once, up front.

So the query generates the list of values you want, removes the blacklisted ones, hashes the result, and scans the table, looking up each row's value for this column in the hash.
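For illustration, here is a minimal sketch of the same pattern applied to your real query. The table name products and the parameters $1 and $2 are placeholders I'm assuming, not part of your original query; only cat_level_5 comes from your comment below:

-- Sketch only: "products", $1 and $2 are hypothetical placeholders.
-- $1 = categories to fetch, $2 = blacklisted categories for this user.
select p.*
from (
    select x
    from unnest($1::text[]) as u(x)   -- expand the wanted-category array into rows
    where x <> all ($2::text[])       -- drop blacklisted categories once, up front
) a
join products p on p.cat_level_5 = a.x;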

This should make the query a bit faster, but it won't solve the main problem, which is that it is still reading 3 GB of data.

By the way, when I add a btree index on (cat_level_1, cat_level_2, cat_level_3, cat_level_4, cat_level_5), it doesn't make much of a difference.

That's expected: the query doesn't filter on cat_level_1 through cat_level_4, so the planner can't use cat_level_5, which sits at the end of the index's column list. An index on cat_level_5 alone might help, but I doubt it, since this query selects a large fraction of the table.
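If you want to test that anyway, the single-column index would look like this (using the same placeholder table name as in the sketch above; adjust to your schema):

create index products_cat_level_5_idx on products (cat_level_5);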
