suppose have data like
fruits <- data.table(fruitid=c(1,2,3), fruit=c("apple", "banana", "strawberry")) colors <- data.table(colorid=c(1,2,3,4,5), fruitid=c(1,1,1,2,3), color=c("red","yellow","green","yellow","red")) tastes <- data.table(tasteid=c(1,2,3), fruitid=c(1,1,3), taste=c("sweeet", "sour", "sweet")) setkey(fruits, "fruitid") setkey(colors, "colorid") setkey(tastes, "tasteid") fruits fruitid fruit 1: 1 apple 2: 2 banana 3: 3 strawberry colors colorid fruitid color 1: 1 1 red 2: 2 1 yellow 3: 3 1 green 4: 4 2 yellow 5: 5 3 red tastes tasteid fruitid taste 1: 1 1 sweeet 2: 2 1 sour 3: 3 3 sweet i typically need perform left-outer joins on data this. instance, "give me fruits , colors" requires me write (and maybe there's better way?)
setkey(colors, "fruitid") result <- colors[fruits, allow.cartesian=true] setkey(colors, "colorid") three lines of code such simple , frequent task seemed excessive, wrote method myleftjoin
myleftjoin <- function(tbl1, tbl2){ # performs left join using key in tbl1 (i.e. keeps rows tbl1 , matching rows tbl2) oldkey <- key(tbl2) setkeyv(tbl2, key(tbl1)) result <- tbl2[tbl1, allow.cartesian=true] setkeyv(tbl2, oldkey) return(result) } which can use
myleftjoin(fruits, colors) colorid fruitid color fruit 1: 1 1 red apple 2: 2 1 yellow apple 3: 3 1 green apple 4: 4 2 yellow banana 5: 5 3 red strawberry how can extend method can pass number of tables , chained left outer join of of them? myleftjoin(tbl1, ...)
for instance, i'd result of myleftjoin(fruits, colors, tastes) equivalent
setkey(colors, "fruitid") setkey(tastes, "fruitid") result <- tastes[colors[fruits, allow.cartesian=true], allow.cartesian=true] setkey(tastes, "tasteid") setkey(colors, "colorid") result tasteid fruitid taste colorid color fruit 1: 1 1 sweeet 1 red apple 2: 2 1 sour 1 red apple 3: 1 1 sweeet 2 yellow apple 4: 2 1 sour 2 yellow apple 5: 1 1 sweeet 3 green apple 6: 2 1 sour 3 green apple 7: na 2 na 4 yellow banana 8: 3 3 sweet 5 red strawberry perhaps there's elegant solution using methods in data.table package missed? thanks
(edit: fixed mistake in data)
i committed new feature in data.table, v1.9.5, can join without setting keys (that is, specify columns join directly, without having use setkey() first):
with that, simply:
require(data.table) # v1.9.5+ fruits[tastes, on="fruitid"][colors, on="fruitid"] # no setkey required # fruitid fruit tasteid taste colorid color # 1: 1 apple 1 sweeet 1 red # 2: 1 apple 2 sour 1 red # 3: 1 apple 1 sweeet 2 yellow # 4: 1 apple 2 sour 2 yellow # 5: 1 apple 1 sweeet 3 green # 6: 1 apple 2 sour 3 green # 7: 2 na na na 4 yellow # 8: 3 strawberry 3 sweet 5 red
Comments
Post a Comment