so, i'm using redshift (which based on postgres). unfortunately, can't share data (for reasons obvious), it's more of conceptual question anyway. will, of course, share code.
this query returns virtually instantly:
select count(*) table_one c inner join table_two z on regexp_replace(c.telephone_number, '[^0-9]', '') = regexp_replace(z.affected_phone_number, '[^0-9]', ''); but 1 run hours:
select count(*) table_one c inner join table_two z on regexp_replace(c.telephone_number, '[^0-9]', '') = regexp_replace(z.affected_phone_number, '[^0-9]', '') or c.email = z.requester_email; why adding second join condition or cause problem?
(i can around issue using union, i'm interested in learning here...)
ran explain if helps...
query plan problem query:
query plan xn aggregate (cost=159728183882.77..159728183882.77 rows=1 width=0) -> xn nested loop ds_bcast_inner (cost=0.00..159726036322.85 rows=859023969 width=0) join filter: ((regexp_replace(("inner".telephone_number)::text, '[^0-9]'::text, ''::text, 1) = regexp_replace(("outer".affected_phone_number)::text, '[^0-9]'::text, ''::text, 1)) or (("inner".email)::text = ("outer".requester_email)::text)) -> xn seq scan on table_two z (cost=0.00..4447.40 rows=444740 width=36) -> xn seq scan on table_one c (cost=0.00..3853.89 rows=385389 width=32) ----- nested loop join in query plan - review join predicates avoid cartesian products ----- query plan non-problem query:
query plan xn aggregate (cost=62358556140.01..62358556140.01 rows=1 width=0) -> xn hash join ds_bcast_inner (cost=4817.36..62356413666.21 rows=856989520 width=0) hash cond: (regexp_replace(("outer".affected_phone_number)::text, '[^0-9]'::text, ''::text, 1) = regexp_replace(("inner".telephone_number)::text, '[^0-9]'::text, ''::text, 1)) -> xn seq scan on table_two z (cost=0.00..4447.40 rows=444740 width=12) -> xn hash (cost=3853.89..3853.89 rows=385389 width=8) -> xn seq scan on table_one c (cost=0.00..3853.89 rows=385389 width=8)
we can guess why slow without access database.
guessing not appropriate tool performance optimization.
use explain statement see how postgres processes 2 statements.
Comments
Post a Comment