R is written for vector/matrix operations. It allows, but is not happy with, for() loops. Nested for() loops take forever.
I've read that pretty much all for() loops can be turned into proper vector operations but, for the life of me, I can't figure out how in this simple case:
I have 2 data tables, dt_a and dt_b, of different lengths (dt_a: 1408 rows & dt_b: 2689 rows), with columns dt_a$x, dt_b$y, and dt_b$z. I want to search for a match of each value of column dt_a$x in each value of dt_b$y and, if there is a match, set dt_b$z <- dt_a$x. If there's no match, set it to "nomatch".
This is a programming 101 operation with loops:
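    for (i in 1:2689) {
      for (j in 1:1408) {
        # Treat dt_a$x[j] as a case-insensitive PCRE pattern and test it against dt_b$y[i].
        if (grepl(dt_a$x[j], dt_b$y[i], ignore.case = TRUE, perl = TRUE)) {
          dt_b$z[i] <- dt_a$x[j]
          break
        }
        # No match yet for row i; overwritten if a later pattern matches.
        dt_b$z[i] <- "nomatch"
      }
    }

However, this operation takes more than 6 minutes to run, iterating through both loops. I'll need to adapt it to a larger data set, so order-of-magnitude time increases are not viable.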
What's the correct way to do this nested for() loop operation using proper R vector operations?
Thanks!
Update
The answer by @nickk vectorizes 1 of the loops, making the nesting unnecessary and reducing execution time by an order of magnitude. I've credited it as the most useful answer because I was able to make it work in my code. The answers provided by @deanmacgregor were also useful in helping me understand more of what is going on. I couldn't get them to run in my code, but that's my fault for not understanding something. The cross-join approach, in particular, is probably the best solution. I need more practice in order to make it work with my data, but I don't want to wait too long before resolving this question.
Additional thanks to @romantsegelskyi for teaching me proper question formatting, and to @pierrelafortune and @brodieg for teaching me the importance and content of reproducible questions. ^_^
I've credited you all in the source code, which will (someday) be released as open source.
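    # Start with a character NA so that z can hold the matched patterns.
    dt_b[, z := NA_character_]
    for (x in dt_a$x) {
      # One vectorised call tests pattern x against every row of dt_b$y.
      found <- grepl(x, dt_b$y, ignore.case = TRUE, perl = TRUE)
      # Fill only rows that have no match yet, so the first matching pattern wins.
      dt_b[found & is.na(z), z := x]
    }
    dt_b[is.na(z), z := "nomatch"]

This is closer to the functionality of the original than the other answers so far. dt_a$x can hold any valid PCRE pattern rather than only exact matches. Using @deanmacgregor's data, it takes a few seconds to run on my machine.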
Note that this takes advantage of the fact that grepl is vectorised over the strings being searched. Working through dt_a$x in order and replacing only the NA values replicates the effect of the break seen before.
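To make the point about vectorisation concrete, here is a minimal base-R sketch on made-up data (the strings below are invented for illustration and are not the asker's tables; plain data frames stand in for the data tables). It shows one grepl call covering all of dt_b$y, and the is.na() guard letting the first matching pattern win.

```r
# Toy data, invented for illustration only.
dt_a <- data.frame(x = c("apple", "ban.na", "app"), stringsAsFactors = FALSE)
dt_b <- data.frame(
  y = c("Green APPLE pie", "banana split", "cherry tart", "pineapple"),
  stringsAsFactors = FALSE
)

# grepl() tests one pattern against every element of dt_b$y in a single call
# and returns a logical vector of the same length as dt_b$y.
grepl("apple", dt_b$y, ignore.case = TRUE, perl = TRUE)

# One loop over the patterns. Rows that already have a match are skipped,
# so the first matching pattern wins, as with break in the nested loop.
dt_b$z <- NA_character_
for (x in dt_a$x) {
  found <- grepl(x, dt_b$y, ignore.case = TRUE, perl = TRUE)
  dt_b$z[found & is.na(dt_b$z)] <- x
}
dt_b$z[is.na(dt_b$z)] <- "nomatch"

print(dt_b)
```

Here "app" also matches the apple rows, but those rows were filled by "apple" first, so it never overwrites them.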
For faster results, use stringi in place of the grepl line:

    found <- stringi::stri_detect_regex(dt_b$y, x, opts_regex = stringi::stri_opts_regex(case_insensitive = TRUE))
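As a rough sketch of how that swap fits into the whole loop, the example below wraps the detection step in a helper. The helper name detect and the toy vectors are hypothetical, and the grepl fallback is only there so the snippet still runs where stringi is not installed. Note that stringi uses the ICU regex engine rather than PCRE, so unusual patterns may behave slightly differently.

```r
# Hypothetical helper: use stringi when it is installed, otherwise grepl().
detect <- if (requireNamespace("stringi", quietly = TRUE)) {
  function(pattern, subject) {
    stringi::stri_detect_regex(
      subject, pattern,
      opts_regex = stringi::stri_opts_regex(case_insensitive = TRUE)
    )
  }
} else {
  function(pattern, subject) {
    grepl(pattern, subject, ignore.case = TRUE, perl = TRUE)
  }
}

# Toy stand-ins for dt_a$x and dt_b$y, invented for illustration only.
patterns <- c("apple", "ban.na")
subject  <- c("Green APPLE pie", "banana split", "cherry tart")

# Same first-match-wins loop as before, with the detection step swapped out.
z <- rep(NA_character_, length(subject))
for (x in patterns) {
  found <- detect(x, subject)
  z[found & is.na(z)] <- x
}
z[is.na(z)] <- "nomatch"

print(z)
```

Whether stringi is actually faster on a given data set is worth timing, for example with system.time(), before committing to it.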