oracle - Find number of common words in SQL table columns? -


i have sql table 2 text columns , i'd find number of words in each of , number of common words.

basically word should regex [a-za-z]{3,} (at least 3 chars) , row values left="abc def ghi" , right="ghi abc qwe rtz" result should wordsleft=3, wordsright=4, wordscommon=2 (since share abc , ghi).

can sql only?

so, though sql might not best option (it's lot easier using scripting language) can using common table expressions connect by queries split strings rows , count number of distinct items in each column (and common items).

given table this:

| id |            l |               r | |----|--------------|-----------------| |  1 |  abc def ghi | ghi abc qwe rtz | |  2 |     bc de gh |  gh ab qw rt xx | |  3 | xbc de gh yy |          gh xbc | 

you can result:

| id | common_count | left_count | right_count | |----|--------------|------------|-------------| |  1 |            2 |          3 |           4 | |  2 |            1 |          3 |           5 | |  3 |            2 |          4 |           2 | 

using query (which i'm sure can optimized , improved - oracle isn't thing really):

with  cte_left (   select distinct id, regexp_substr(l, '[^ ]+', 1, level) l   t   connect regexp_substr(l, '[^ ]+', 1, level) not null   order id ), cte_right (   select distinct id, regexp_substr(r, '[^ ]+', 1, level) r   t   connect regexp_substr(r, '[^ ]+', 1, level) not null   order id ), cte_all (   select cte_left.id, cte_left.l, cte_right.r    cte_left   join cte_right on cte_left.id = cte_right.id )  select id, count(distinct l) common_count,     (select count(distinct l) cte_all id = t.id) left_count,     (select count(distinct r) cte_all id = t.id) right_count cte_all t l in (select r cte_all) group t.id; 

sample sql fiddle


Comments