Pre-processing in data mining often involves re-grouping and re-coding categorical variables. It is known that once you recode categorical variables in R (e.g. with the function mapvalues), you need to update the categorical variable with df$variable <- factor(df$variable) so that you can view the real number of levels in the data.frame with str(df).
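As a small illustration of why that refresh is needed, here is a minimal sketch in base R. The factor `f` and the helper names `n_before` and `n_after` are made up for the example and are not part of the dataset: replacing values inside a factor leaves the old level listed until `factor()` is called again.

```r
# Hypothetical factor, used only for illustration
f <- factor(c("a", "b", "c", "c"))

# Recode every "c" as "b" by value replacement
f[f == "c"] <- "b"

# The level "c" is no longer used, but it is still listed
n_before <- nlevels(f)   # 3

# Re-creating the factor drops the unused level
f <- factor(f)
n_after <- nlevels(f)    # 2
```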
I have written a piece of code to update all the categorical variables of a dataset automatically:

    cat <- sapply(df, is.factor) #select categorical variables
    names(df[ ,cat]) #view
    a <- function(x) factor(x) #create function to "apply"
    df[ ,cat] <- data.frame(apply(df[ ,cat],2, a)) #run apply function
    str(df) #check

My question is: how do I select the columns whose number of levels equals 1, once I have updated the dataset? I have tried these lines without luck:

    cat <- sapply(df, is.factor) #select categorical variables
    categorical <- df[,cat] #create df named "categorical" separating them
    a <- function(x) nlevels(x)==1 #create "a" function to apply
    x <- data.frame(apply(categorical,2, a)) #run apply function
    utils::View(x) #check

As you can see, it is not working... I appreciate your help and time.

You can create a logical index with sapply and use it to filter out the columns.

    indx <- sapply(df[,cat], nlevels)==1
    df[,cat][,indx, drop=FALSE]

Or another option is Filter:

    Filter(function(x) nlevels(x)==1, df[,cat])

or

    Filter(Negate(var), df[,cat])

(this last form relies on var accepting a factor, which recent R versions may refuse). Regarding why apply didn't work,

    apply(df[cat], 2, nlevels)
    # V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
    #  0   0   0   0   0   0   0   0   0   0

The output is 0 for all the columns, which is not correct. Upon further checking,

    apply(df[cat], 2, class)
    #         V1          V2          V3          V4          V5          V6
    #"character" "character" "character" "character" "character" "character"
    #         V7          V8          V9         V10
    #"character" "character" "character" "character"

and the correct class can be found from

    sapply(df[cat], class)
    #      V1       V2       V3       V4       V5       V6       V7       V8
    #"factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"
    #      V9      V10
    #"factor" "factor"

The class of the columns got changed from 'factor' to 'character' because apply converts the data frame to a matrix before calling the function, and a matrix can hold only a single class. If there is any non-numeric column, the whole matrix is converted to 'character' class, so the factor levels are lost and nlevels returns 0. You can safely use apply on a numeric matrix, where the class stays 'numeric'. In general, when the columns have mixed classes, it is better to use lapply or vapply; to get a logical vector or similar output, sapply is also useful.
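The point about apply can be checked directly. The sketch below uses a small made-up data frame (`d`, its columns, and the names `m`, `is_fac` and `one_lvl` are all hypothetical) to show that converting to a matrix loses the factor class, while `vapply` visits each column unchanged and can return a logical index of the one-level columns.

```r
# Hypothetical data: two factor columns and one integer column
d <- data.frame(
  g1 = factor(c("a", "a", "a")),   # one level
  g2 = factor(c("a", "b", "c")),   # three levels
  n  = 1:3
)

# apply() works on as.matrix(d), which is a character matrix here
m <- as.matrix(d)
is.character(m)                     # TRUE

# vapply() passes each column as-is and checks the result type
is_fac  <- vapply(d, is.factor, logical(1))
one_lvl <- vapply(d[is_fac], nlevels, integer(1)) == 1L

# Names of the factor columns with a single level
names(one_lvl)[one_lvl]             # "g1"

# Keep those columns as a data frame
d[is_fac][one_lvl]
```

Compared with sapply, vapply stops with an error if a column returns something other than the declared type, which makes surprises like the all-zero result easier to spot.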
Data

    set.seed(64)
    # stringsAsFactors=TRUE is needed in R >= 4.0 to get factor columns
    df <- as.data.frame(matrix(sample(letters[1:3], 3*10, replace=TRUE), ncol=10),
                        stringsAsFactors=TRUE)
    df <- cbind(df, V11=1:3)
    cat <- sapply(df, is.factor)
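For the level refresh in the question, the same reasoning suggests lapply rather than apply. This is only a sketch under assumed data: the data frame `d` and the names `n_before`, `n_after` and `is_fac` are hypothetical and stand in for a dataset that has picked up an unused level.

```r
# Hypothetical data frame; subsetting leaves level "a" unused
d <- data.frame(g = factor(c("a", "b", "b")), n = 1:3)
d <- d[d$g == "b", ]
n_before <- nlevels(d$g)            # 2

# Refresh every factor column; lapply() keeps each column a factor
is_fac <- vapply(d, is.factor, logical(1))
d[is_fac] <- lapply(d[is_fac], factor)
n_after <- nlevels(d$g)             # 1
```

Base R's droplevels(d) performs the same refresh on all factor columns of a data frame in one call.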