dplyr - read_csv from R readr treats data differently than generated data


I had an issue using ifelse() in a function, which was solved in a Stack Overflow thread. After implementing the suggestions, the code performed as desired. The code is below:

    country_panel <- function(x, y) {
      ifelse(cnames$time < y,
             cnames[match(x, cnames$country), ]$panel,
             cnames[match(x, cnames$country), ]$standardize)
    }

I generated fake data like this:

    countryname <- c("viet nam", "viet nam", "viet nam", "viet nam", "viet nam")
    year <- c(1974, 1975, 1976, 1977, 1978)
    df <- data.frame(countryname, year, stringsAsFactors = FALSE)

    country <- c("vietnam, north", "vietnam, n.", "vietnam north", "viet nam",
                 "democratic republic of vietnam")
    standardize <- c("vietnam, democratic republic of",
                     "vietnam, democratic republic of",
                     "vietnam, democratic republic of",
                     "vietnam, democratic republic of",
                     "vietnam, democratic republic of")
    panel <- c("vietnam", "vietnam", "vietnam", "vietnam", "vietnam")
    time <- c(1976, 1976, 1976, 1976, 1976)

    cnames <- data.frame(country, standardize, panel, time, stringsAsFactors = FALSE)

and evaluated the function using:

    library(dplyr)  # provides %>% and mutate()

    d1 <- df %>%
      mutate(new_name = country_panel(countryname, year))

However, when I went to implement the suggestions on my real data, the problem returned: the function does not evaluate the condition in the ifelse statement, and returns the $panel value.

Because using stringsAsFactors = FALSE in data.frame() worked on the fake data, I thought using read.csv(path, stringsAsFactors = FALSE) would work instead of using read_csv, but both perform equally.

I should note that I checked the attributes of each vector in the data frame using str(), and forced them to match those found in the fake data.

The real data and the scripts to replicate this can be found on GitHub here.

Here is dput(head(cnames)):

    structure(list(country = c("afghanistan", "afghanistan", "albania",
        "albania", "albania", "algeria"), standardize = c("afghanistan",
        "afghanistan", "albania", "albania", "albania", "algeria"),
        time = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L),
        panel = c("afghanistan", "afghanistan", "albania", "albania",
        "albania", "algeria")), .Names = c("country", "standardize",
        "time", "panel"), class = c("tbl_df", "data.frame"),
        row.names = c(NA, -6L))

And dput(head(d1)):

    structure(list(countryname = c("afghanistan", "afghanistan",
        "afghanistan", "afghanistan", "afghanistan", "afghanistan"),
        year = 1970:1975), .Names = c("countryname", "year"),
        class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))

    d1 <- df %>%
      mutate(new_name = country_panel(countryname, year))

    df2 <- structure(list(country = c("afghanistan", "afghanistan", "albania",
        "albania", "albania", "algeria"), standardize = c("afghanistan",
        "afghanistan", "albania", "albania", "albania", "algeria"),
        time = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L),
        panel = c("afghanistan", "afghanistan", "albania", "albania",
        "albania", "algeria")), .Names = c("country", "standardize",
        "time", "panel"), class = c("tbl_df", "data.frame"),
        row.names = c(NA, -6L))

    d2 <- df2 %>%
      mutate(new_name = country_panel(countryname, year))

this gives:

    Error: wrong result size (5), expected 6 or 1

The immediate problem is that mutate expects country_panel to return 6 values, since df2 has 6 rows (see dim(df2)), or, alternatively, 1 value that it can recycle as needed. Your first example with the made-up data in fact only works because the numbers of rows happen to match.
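To see where the size mismatch comes from, here is a minimal sketch in base R that reuses the fake lookup table from the question. It calls country_panel with a single country and year (the inputs `"viet nam"` and `1974` are just illustrative values) and shows that the result has one element per row of cnames rather than one per input, because ifelse() shapes its result after the test `cnames$time < y`.

```r
# Lookup table with 5 rows, as in the fake data from the question
cnames <- data.frame(
  country     = c("vietnam, north", "vietnam, n.", "vietnam north",
                  "viet nam", "democratic republic of vietnam"),
  standardize = rep("vietnam, democratic republic of", 5),
  panel       = rep("vietnam", 5),
  time        = rep(1976, 5),
  stringsAsFactors = FALSE
)

# The function exactly as posted in the question
country_panel <- function(x, y) {
  ifelse(cnames$time < y,
         cnames[match(x, cnames$country), ]$panel,
         cnames[match(x, cnames$country), ]$standardize)
}

# One input value, but the test cnames$time < y has nrow(cnames) elements,
# so ifelse() returns nrow(cnames) values
res <- country_panel("viet nam", 1974)
length(res)    # 5
nrow(cnames)   # 5
```

With five input rows and a five-row lookup table the lengths agree by coincidence, which is why the fake data appeared to work.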

Try running your example again after running:

    debug(country_panel)
    ...
    # after you are done:
    undebug(country_panel)

This will give you a line-by-line view of the function as it is called, and you can check which objects exist or are created within the function as it runs (exit anytime with Q).

Instead of using ifelse on the whole of cnames, it might be better to use sequential matching, first on country and then on time. Or try making a data frame out of the x and y vectors passed to the function, merging it with cnames, and picking the name you want with conditions within that data frame.
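One possible reading of the sequential-matching idea, as a sketch only: look up one row of the table per input element with match(), then compare that row's time with the year. The name `country_panel2` and the `lookup` argument are hypothetical, and this uses match() rather than merge(), so the input order is preserved. It keeps the question's condition (time < year gives panel, otherwise standardize) and assumes each country appears only once in the lookup table, since match() returns the first hit.

```r
# Fake lookup table from the question
cnames <- data.frame(
  country     = c("vietnam, north", "vietnam, n.", "vietnam north",
                  "viet nam", "democratic republic of vietnam"),
  standardize = rep("vietnam, democratic republic of", 5),
  panel       = rep("vietnam", 5),
  time        = rep(1976, 5),
  stringsAsFactors = FALSE
)

# Hypothetical replacement: the result always has length(x) elements
country_panel2 <- function(x, y, lookup) {
  idx <- match(x, lookup$country)   # first: one lookup row per input country
  ifelse(lookup$time[idx] < y,      # then: compare that row's time with the year
         lookup$panel[idx],
         lookup$standardize[idx])
}

# Six inputs against a five-row table; the last country is not in the table
x <- c(rep("viet nam", 5), "unknown country")
y <- 1974:1979
out <- country_panel2(x, y, cnames)
length(out)   # 6
```

Countries that are missing from the lookup table come back as NA, which makes them easy to find afterwards with is.na(). Inside mutate this would be called as `country_panel2(countryname, year, cnames)`.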

