I've got a dataset that has a variable called region, which represents different areas within Australia. Here are the first 25 rows of the data:

    > head(sample.2013$region, n = 25)
     [1] qld major urban - capital city    vic rural                         nsw regional - low urbanisation
     [4] sa regional - low urbanisation    nsw regional - low urbanisation   tas rural
     [7] act major urban - capital city    qld rural                         act major urban - capital city
    [10] nt regional - low urbanisation    nsw other                         qld rural
    [13] act major urban - capital city    vic regional - high urbanisation  tas rural
    [16] qld major urban - capital city    tas rural                         vic regional - high urbanisation
    [19] qld rural                         tas rural                         vic rural
    [22] qld other urban                   tas rural                         vic rural
    [25] act major urban - capital city
    36 levels: act major urban - capital city nsw major urban - capital city nsw other urban ...

Naive solution
I need to make a variable called state based on the values within this column. I'm using a brute-force method to create the new vector, like this:

    add_states <- function(sample.2013) {
      # add states based on the region variable
      sample.2013$state[grepl('nsw', sample.2013$region) == TRUE] <- 'nsw'
      sample.2013$state[grepl('vic', sample.2013$region) == TRUE] <- 'vic'
      sample.2013$state[grepl('qld', sample.2013$region) == TRUE] <- 'qld'
      sample.2013$state[grepl('wa', sample.2013$region) == TRUE] <- 'wa'
      sample.2013$state[grepl('sa', sample.2013$region) == TRUE] <- 'sa'
      sample.2013$state[grepl('tas', sample.2013$region) == TRUE] <- 'tas'
      sample.2013$state[grepl('tas', sample.2013$region) == TRUE] <- 'tas'
      sample.2013$state[grepl('act', sample.2013$region) == TRUE] <- 'act'
      sample.2013$state[grepl('nt', sample.2013$region) == TRUE] <- 'nt'
      return(sample.2013)
    }

This works fine, but it's difficult to test and brittle. For example, I know I can pass ignore.case to grepl, which would remove the need for the two Tasmanian cases.
For loops

I've been able to replace the above 'naive' approach with a loop and a function, like this:

    add_state <- function(input, output, state) {
      # change variable y in place, prevents duplication
      output <- replace(output, grepl(state, input, ignore.case = TRUE), state)
      output
    }

    state_codes <- c('nsw', 'vic', 'qld', 'wa', 'sa', 'tas', 'act', 'nt')
    test_vector <- head(sample.2013$region, n = 500)
    y <- vector('character', length = length(test_vector))

    for (i in seq_along(state_codes)) {
      y <- add_state(test_vector, y, state_codes[i])
    }

    table(y)
    y
        act nsw  nt qld  sa tas vic  wa
     14  99  50  42  49  98  92  45  11

But this is quite verbose, and loops are not idiomatic R. I haven't been able to replace this code with an apply function that replaces the values in the vector, rather than creating a bunch of other vectors.
lapply

This is the best I've managed using lapply:

    add_state3 <- function(x, state) {
      x <- replace(x, grepl(state, x, ignore.case = TRUE), state)
      x
    }

    test_vector_short <- c("nsw 1", "nsw 2", "vic", "goo")

    > output <- lapply(state_codes, add_state3, x = test_vector_short)
    > output
    [[1]]
    [1] "nsw" "nsw" "vic" "goo"

    [[2]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[3]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[4]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[5]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[6]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[7]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

    [[8]]
    [1] "nsw 1" "nsw 2" "vic"   "goo"

The function works: lapply takes each state code and passes it to add_state3. But this creates a list of 8 elements, one per state code, rather than replacing the elements in place.
Question

Sorry for the long preamble. The question is: how do I use an apply function to change the elements of a vector in place according to some criteria?
You could use gsub to combine the search and the replace, e.g. gsub('^.*\\bnt\\b.*$', 'nt', x) to replace every string in x that contains nt with just nt (the \\b stops things like "pint" from matching "nt").
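To see why the \\b matters, here is a small sketch comparing a plain substring match with a word-bounded one. The regions vector is made up for illustration (it is not taken from sample.2013); the first entry shows that "sa" also occurs inside "urbanisation" once matching is case-insensitive.

```r
# Hypothetical example values, chosen to show accidental substring matches.
regions <- c("nsw regional - low urbanisation", "sa rural", "pint", "nt rural")

# Plain substring match: "sa" is also found inside "urbanisation".
plain_sa <- grepl("sa", regions, ignore.case = TRUE)
# Word-bounded match: only a standalone "sa" counts.
bounded_sa <- grepl("\\bsa\\b", regions, ignore.case = TRUE)

# Same comparison for "nt": the plain pattern also matches "pint".
plain_nt <- grepl("nt", regions, ignore.case = TRUE)
bounded_nt <- grepl("\\bnt\\b", regions, ignore.case = TRUE)

# The single-code gsub from the text, applied to the example vector:
# only the string containing a standalone "nt" is collapsed to "nt".
collapsed <- gsub("^.*\\bnt\\b.*$", "nt", regions, ignore.case = TRUE)
print(collapsed)
```

With the plain pattern, the first region would be labelled sa even though it belongs to nsw, so the word boundary is worth keeping whenever ignore.case is switched on.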
If you make the regex '^.*\\b(nsw|nt|qld|...)\\b.*' and replace with \\1 (the captured match), you can do:
    state.regex <- sprintf('^.*\\b(%s)\\b.*$', paste(state_codes, collapse='|'))
    # "^.*\\b(nsw|vic|qld|wa|sa|tas|act|nt)\\b.*$"
    gsub(state.regex, '\\1', test_vector_short, ignore.case=TRUE)
    # [1] "nsw" "nsw" "vic" "goo"

This hinges on the fact that whenever you find a match you want to replace the entire string with it, and that all the matches (state codes) can be condensed into one regex.
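Note that the gsub call leaves non-matching strings such as "goo" untouched. If you would rather get NA for those, one possible variant (a sketch, not part of the approach above) is to extract the code with regexpr and regmatches instead. The helper name extract_state is made up for this example, and it returns the first code found in each string.

```r
state_codes <- c('nsw', 'vic', 'qld', 'wa', 'sa', 'tas', 'act', 'nt')

# One word-bounded alternation over all the codes.
code.regex <- sprintf('\\b(%s)\\b', paste(state_codes, collapse = '|'))

# Hypothetical helper: returns the first state code found in each element,
# or NA when no code is present.
extract_state <- function(region) {
  region <- as.character(region)            # also handles factor input
  hits <- regexpr(code.regex, region, ignore.case = TRUE)
  matched <- !is.na(hits) & hits > -1       # regexpr gives -1 for "no match"
  out <- rep(NA_character_, length(region))
  out[matched] <- regmatches(region, hits)  # matched substrings, in order
  out
}

result <- extract_state(c("nsw 1", "nsw 2", "vic", "goo"))
print(result)

# Assumed usage on the data frame from the question:
# sample.2013$state <- extract_state(sample.2013$region)
```

The extracted text keeps whatever case it has in the data, so wrap the result in tolower() or toupper() if the codes need normalising.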
Otherwise, I believe you have to loop as you have done (since each replacement needs to operate on the vector as updated by the previous one).
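If you want that loop without writing for yourself, base R's Reduce expresses the same "feed the updated vector into the next step" pattern. This is only a sketch: fold_states is a made-up name, the \\b boundaries are added here on top of your loop, and Reduce still iterates sequentially, so it is a tidier spelling of the loop rather than something different.

```r
state_codes <- c('nsw', 'vic', 'qld', 'wa', 'sa', 'tas', 'act', 'nt')
test_vector_short <- c("nsw 1", "nsw 2", "vic", "goo")

# Hypothetical helper: folds over the state codes, carrying the output
# vector from one code to the next.
fold_states <- function(input, codes) {
  Reduce(
    function(output, code) {
      # Word-bounded, case-insensitive match against the original input.
      is_match <- grepl(sprintf('\\b%s\\b', code), input, ignore.case = TRUE)
      replace(output, is_match, code)
    },
    codes,
    character(length(input))  # initial accumulator: all empty strings
  )
}

y <- fold_states(test_vector_short, state_codes)
print(y)
```

As in your loop, elements that match no code stay as empty strings.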