r - Data.table - left outer join on multiple tables -


suppose have data like

fruits <- data.table(fruitid=c(1,2,3), fruit=c("apple", "banana", "strawberry")) colors <- data.table(colorid=c(1,2,3,4,5), fruitid=c(1,1,1,2,3), color=c("red","yellow","green","yellow","red")) tastes <- data.table(tasteid=c(1,2,3), fruitid=c(1,1,3), taste=c("sweeet", "sour", "sweet"))  setkey(fruits, "fruitid") setkey(colors, "colorid") setkey(tastes, "tasteid")  fruits    fruitid      fruit 1:       1      apple 2:       2     banana 3:       3 strawberry  colors    colorid fruitid  color 1:       1       1    red 2:       2       1 yellow 3:       3       1  green 4:       4       2 yellow 5:       5       3    red  tastes    tasteid fruitid  taste 1:       1       1 sweeet 2:       2       1   sour 3:       3       3  sweet 

i typically need perform left-outer joins on data this. instance, "give me fruits , colors" requires me write (and maybe there's better way?)

setkey(colors, "fruitid") result <- colors[fruits, allow.cartesian=true] setkey(colors, "colorid") 

three lines of code such simple , frequent task seemed excessive, wrote method myleftjoin

myleftjoin <- function(tbl1, tbl2){   # performs left join using key in tbl1 (i.e. keeps rows tbl1 , matching rows tbl2)    oldkey <- key(tbl2)   setkeyv(tbl2, key(tbl1))   result <- tbl2[tbl1, allow.cartesian=true]   setkeyv(tbl2, oldkey)   return(result) } 

which can use

myleftjoin(fruits, colors)    colorid fruitid  color      fruit 1:       1       1    red      apple 2:       2       1 yellow      apple 3:       3       1  green      apple 4:       4       2 yellow     banana 5:       5       3    red strawberry 

how can extend method can pass number of tables , chained left outer join of of them? myleftjoin(tbl1, ...)

for instance, i'd result of myleftjoin(fruits, colors, tastes) equivalent

setkey(colors, "fruitid") setkey(tastes, "fruitid") result <- tastes[colors[fruits, allow.cartesian=true], allow.cartesian=true] setkey(tastes, "tasteid") setkey(colors, "colorid")  result    tasteid fruitid  taste colorid  color      fruit 1:       1       1 sweeet       1    red      apple 2:       2       1   sour       1    red      apple 3:       1       1 sweeet       2 yellow      apple 4:       2       1   sour       2 yellow      apple 5:       1       1 sweeet       3  green      apple 6:       2       1   sour       3  green      apple 7:      na       2     na       4 yellow     banana 8:       3       3  sweet       5    red strawberry 

perhaps there's elegant solution using methods in data.table package missed? thanks

(edit: fixed mistake in data)

i committed new feature in data.table, v1.9.5, can join without setting keys (that is, specify columns join directly, without having use setkey() first):

with that, simply:

require(data.table) # v1.9.5+ fruits[tastes, on="fruitid"][colors, on="fruitid"] # no setkey required #    fruitid      fruit tasteid  taste colorid  color # 1:       1      apple       1 sweeet       1    red # 2:       1      apple       2   sour       1    red # 3:       1      apple       1 sweeet       2 yellow # 4:       1      apple       2   sour       2 yellow # 5:       1      apple       1 sweeet       3  green # 6:       1      apple       2   sour       3  green # 7:       2         na      na     na       4 yellow # 8:       3 strawberry       3  sweet       5    red 

Comments