sql - Adding second join condition increases query time exponentially -


so, i'm using redshift (which based on postgres). unfortunately, can't share data (for reasons obvious), it's more of conceptual question anyway. will, of course, share code.

this query returns virtually instantly:

select     count(*)     table_one c inner join     table_two z on     regexp_replace(c.telephone_number, '[^0-9]', '') = regexp_replace(z.affected_phone_number, '[^0-9]', ''); 

but 1 run hours:

select     count(*)     table_one c inner join     table_two z on     regexp_replace(c.telephone_number, '[^0-9]', '') = regexp_replace(z.affected_phone_number, '[^0-9]', '')     or c.email = z.requester_email; 

why adding second join condition or cause problem?

(i can around issue using union, i'm interested in learning here...)

ran explain if helps...

query plan problem query:

query plan xn aggregate  (cost=159728183882.77..159728183882.77 rows=1 width=0)   ->  xn nested loop ds_bcast_inner  (cost=0.00..159726036322.85 rows=859023969 width=0)         join filter: ((regexp_replace(("inner".telephone_number)::text, '[^0-9]'::text, ''::text, 1) = regexp_replace(("outer".affected_phone_number)::text, '[^0-9]'::text, ''::text, 1)) or (("inner".email)::text = ("outer".requester_email)::text))         ->  xn seq scan on table_two z  (cost=0.00..4447.40 rows=444740 width=36)         ->  xn seq scan on table_one c  (cost=0.00..3853.89 rows=385389 width=32) ----- nested loop join in query plan - review join predicates avoid cartesian products ----- 

query plan non-problem query:

query plan xn aggregate  (cost=62358556140.01..62358556140.01 rows=1 width=0)   ->  xn hash join ds_bcast_inner  (cost=4817.36..62356413666.21 rows=856989520 width=0)         hash cond: (regexp_replace(("outer".affected_phone_number)::text, '[^0-9]'::text, ''::text, 1) = regexp_replace(("inner".telephone_number)::text, '[^0-9]'::text, ''::text, 1))         ->  xn seq scan on table_two z  (cost=0.00..4447.40 rows=444740 width=12)         ->  xn hash  (cost=3853.89..3853.89 rows=385389 width=8)               ->  xn seq scan on table_one c  (cost=0.00..3853.89 rows=385389 width=8) 

we can guess why slow without access database.

guessing not appropriate tool performance optimization.

use explain statement see how postgres processes 2 statements.


Comments