R - Simplify replacing vector elements according to criteria using an apply function


I've got a dataset that has a variable called region, which represents different areas within Australia. Here are the first 25 rows of data:

    > head(sample.2013$region, n = 25)
     [1] qld major urban - capital city   vic rural                        nsw regional - low urbanisation
     [4] sa regional - low urbanisation   nsw regional - low urbanisation  tas rural
     [7] act major urban - capital city   qld rural                        act major urban - capital city
    [10] nt regional - low urbanisation   nsw other                        qld rural
    [13] act major urban - capital city   vic regional - high urbanisation tas rural
    [16] qld major urban - capital city   tas rural                        vic regional - high urbanisation
    [19] qld rural                        tas rural                        vic rural
    [22] qld other urban                  tas rural                        vic rural
    [25] act major urban - capital city
    36 levels: act major urban - capital city nsw major urban - capital city nsw other urban ... ?

Naive solution

I need to make a variable called state based on the values within that column. I'm using a brute-force method to create the new vector, like this:

    add_states <- function(sample.2013) {
        # add states based on the region variable
        sample.2013$state[grepl('nsw', sample.2013$region) == TRUE] <- 'nsw'
        sample.2013$state[grepl('vic', sample.2013$region) == TRUE] <- 'vic'
        sample.2013$state[grepl('qld', sample.2013$region) == TRUE] <- 'qld'
        sample.2013$state[grepl('wa', sample.2013$region) == TRUE] <- 'wa'
        sample.2013$state[grepl('sa', sample.2013$region) == TRUE] <- 'sa'
        sample.2013$state[grepl('tas', sample.2013$region) == TRUE] <- 'tas'
        sample.2013$state[grepl('tas', sample.2013$region) == TRUE] <- 'tas'
        sample.2013$state[grepl('act', sample.2013$region) == TRUE] <- 'act'
        sample.2013$state[grepl('nt', sample.2013$region) == TRUE] <- 'nt'
        return(sample.2013)
    }

This works fine, but it's difficult to test and brittle. For example, I know I can pass ignore.case to grepl, which would remove the need for the 2 Tasmanian cases.

For loops

I've been able to replace the above 'naive' approach with a loop and a function, like this:

    add_state <- function(input, output, state) {
        # change variable y in place, prevents duplication
        output <- replace(output, grepl(state, input, ignore.case = TRUE), state)
        output
    }

    state_codes <- c('nsw', 'vic', 'qld', 'wa', 'sa', 'tas', 'act', 'nt')
    test_vector <- head(sample.2013$region, n = 500)

    y = vector('character', length = length(test_vector))

    for (i in 1:length(state_codes)) {
        y <- add_state(test_vector, y, state_codes[i])
    }

    > table(y)
    y
        act nsw  nt qld  sa tas vic  wa
     14  99  50  42  49  98  92  45  11

But this is quite verbose, and loops are not idiomatic R. I haven't been able to replace this code with an apply function that replaces the values in a vector, rather than creating a bunch of other vectors.

lapply

This is the best I've managed using lapply:

    add_state3 <- function(x, state) {
        x <- replace(x, grepl(state, x, ignore.case = TRUE), state)
        x
    }

    test_vector_short <- c("nsw 1", "nsw 2", "vic", "goo")

    > output <- lapply(state_codes, add_state3, x = test_vector_short)
    > output
    [[1]]
    [1] "nsw" "nsw" "vic" "goo"

    [[2]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[3]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[4]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[5]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[6]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[7]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[8]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

The function works: lapply takes each state code and passes it to the add_state3 function. But that creates a list of 8 elements, rather than replacing the elements in place.

Question

Sorry for the long preamble. My question is: how do I use an apply function to change the elements of a vector in place according to some criteria?

You could use gsub to combine the search and the replace, e.g. gsub('^.*\\bnt\\b.*$', 'nt', x) will replace every string in x that contains nt with just nt (the \\b marks a word boundary, to avoid things like "pint" matching "nt").
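To see what the word boundaries buy you, here is a small sketch. The vector x is made up for illustration and is not taken from the question's data.

```r
# Hypothetical input, made up for illustration
x <- c("nt regional - low urbanisation", "pint of milk", "vic rural")

# Without word boundaries, "nt" also matches inside "pint"
loose <- grepl("nt", x)
loose
# [1]  TRUE  TRUE FALSE

# With \\b on both sides, only the standalone word "nt" matches
strict <- grepl("\\bnt\\b", x)
strict
# [1]  TRUE FALSE FALSE

# Search and replace in one step: any string containing the word "nt"
# is replaced by "nt"; everything else is left alone
replaced <- gsub("^.*\\bnt\\b.*$", "nt", x)
replaced
# [1] "nt"           "pint of milk" "vic rural"
```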

If you make the regex '^.*\\b(nsw|nt|qld|...)\\b.*' and replace with \\1 (the captured match), you can do:

    state.regex <- sprintf('^.*\\b(%s)\\b.*$', paste(state_codes, collapse='|'))
    # "^.*\\b(nsw|vic|qld|wa|sa|tas|act|nt)\\b.*$"
    gsub(state.regex, '\\1', test_vector_short, ignore.case=TRUE)
    # [1] "nsw" "nsw" "vic" "goo"

This hinges on the fact that whenever you find a match you want to replace the entire string with the match, and that all the matches (state codes) can be condensed into 1 regex.
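One consequence of the single-regex approach is that strings with no state code pass through unchanged (like "goo" in the output above). If you would rather get NA for those, something along these lines might work. It is only a sketch: extract_state is a hypothetical helper name, and both the NA handling and the tolower() normalisation are my assumptions about what you want, not something stated in the question.

```r
state_codes <- c('nsw', 'vic', 'qld', 'wa', 'sa', 'tas', 'act', 'nt')

# Hypothetical helper: returns the state code found in each string,
# or NA when no code is present
extract_state <- function(x, codes) {
    pattern <- sprintf('^.*\\b(%s)\\b.*$', paste(codes, collapse = '|'))
    x <- as.character(x)  # a factor such as region is converted first
    hit <- grepl(pattern, x, ignore.case = TRUE)
    out <- rep(NA_character_, length(x))
    # With ignore.case the capture keeps the input's capitalisation,
    # so tolower() normalises it to match the codes
    out[hit] <- tolower(sub(pattern, '\\1', x[hit], ignore.case = TRUE))
    out
}

states <- extract_state(c("nsw 1", "NSW 2", "vic", "goo"), state_codes)
states
# [1] "nsw" "nsw" "vic" NA
```

On the question's data this would presumably be called as sample.2013$state <- extract_state(sample.2013$region, state_codes).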

Otherwise, I believe you have to loop as you have done (since each replacement needs to operate on the vector as updated by the previous replacements).
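If you do need sequential replacements but want to avoid writing the for loop by hand, Reduce is one option, since it feeds the result of each step into the next. This is only a sketch: add_state is the function from the question (shortened slightly), and Reduce is still a loop under the hood, so this is a matter of style rather than speed.

```r
# add_state as defined in the question
add_state <- function(input, output, state) {
    replace(output, grepl(state, input, ignore.case = TRUE), state)
}

state_codes <- c('nsw', 'vic', 'qld', 'wa', 'sa', 'tas', 'act', 'nt')
test_vector_short <- c("nsw 1", "nsw 2", "vic", "goo")

# Start from an empty character vector and fold over the state codes;
# the accumulated vector is passed on to the next call as `acc`
y <- Reduce(
    function(acc, state) add_state(test_vector_short, acc, state),
    state_codes,
    character(length(test_vector_short))
)
y
# [1] "nsw" "nsw" "vic" ""
```

Unlike lapply, which hands the original vector to every call and so returns 8 separate results, Reduce returns a single vector with all the replacements applied.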

