i have sql table 2 text columns , i'd find number of words in each of , number of common words.
basically word should regex [a-za-z]{3,} (at least 3 chars) , row values left="abc def ghi" , right="ghi abc qwe rtz" result should wordsleft=3, wordsright=4, wordscommon=2 (since share abc , ghi).
can sql only?
so, though sql might not best option (it's lot easier using scripting language) can using common table expressions connect by queries split strings rows , count number of distinct items in each column (and common items).
given table this:
| id | l | r | |----|--------------|-----------------| | 1 | abc def ghi | ghi abc qwe rtz | | 2 | bc de gh | gh ab qw rt xx | | 3 | xbc de gh yy | gh xbc | you can result:
| id | common_count | left_count | right_count | |----|--------------|------------|-------------| | 1 | 2 | 3 | 4 | | 2 | 1 | 3 | 5 | | 3 | 2 | 4 | 2 | using query (which i'm sure can optimized , improved - oracle isn't thing really):
with cte_left ( select distinct id, regexp_substr(l, '[^ ]+', 1, level) l t connect regexp_substr(l, '[^ ]+', 1, level) not null order id ), cte_right ( select distinct id, regexp_substr(r, '[^ ]+', 1, level) r t connect regexp_substr(r, '[^ ]+', 1, level) not null order id ), cte_all ( select cte_left.id, cte_left.l, cte_right.r cte_left join cte_right on cte_left.id = cte_right.id ) select id, count(distinct l) common_count, (select count(distinct l) cte_all id = t.id) left_count, (select count(distinct r) cte_all id = t.id) right_count cte_all t l in (select r cte_all) group t.id;
Comments
Post a Comment