R - Select categorical variables where number of levels is equal to 1


Pre-processing in data mining often involves re-grouping and re-coding categorical variables. Once you recode a categorical variable in R (e.g. with the function mapvalues), you need to update it with df$variable <- factor(df$variable) so that str(df) shows the real number of levels in the data.frame.

I have written a piece of code to update all the categorical variables of the dataset automatically:

    cat <- sapply(df, is.factor)                     # select categorical variables
    names(df[, cat])                                 # view them
    A <- function(x) factor(x)                       # create function for "apply"
    df[, cat] <- data.frame(apply(df[, cat], 2, A))  # run the apply function
    str(df)                                          # check

My question is: how do I select the columns whose number of levels is equal to 1, once I have updated the dataset? I have tried these lines without luck:

    cat <- sapply(df, is.factor)               # select categorical variables
    categorical <- df[, cat]                   # create df named "categorical" separating them
    A <- function(x) nlevels(x) == 1           # create "A" function for apply
    x <- data.frame(apply(categorical, 2, A))  # run the apply function
    utils::View(x)                             # check and see that it is not working...

I appreciate your help and time.

You can create a logical index with sapply and use it to filter out the columns:

    indx <- sapply(df[, cat], nlevels) == 1
    df[, cat][, indx, drop = FALSE]
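
As a minimal, self-contained sketch of the logical-index approach: the data frame `toy` and the names `is_cat`, `one_level` and `single` are made up for illustration and are not part of the question's data. `is_cat` is used instead of `cat` only to avoid masking base::cat().

```r
# Hypothetical toy data: `colour` and `size` are factors, `id` is numeric
toy <- data.frame(
  colour = factor(c("red", "red", "red")),  # one level
  size   = factor(c("S", "M", "L")),        # three levels
  id     = 1:3
)

is_cat    <- sapply(toy, is.factor)                 # which columns are factors
one_level <- sapply(toy[is_cat], nlevels) == 1      # which factors have one level

# drop = FALSE keeps the result a data.frame even when a single column matches
single <- toy[is_cat][, one_level, drop = FALSE]
names(single)
# "colour"
```

Without drop = FALSE, selecting a single column would return a bare factor rather than a one-column data.frame.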

Or another option is Filter:

    Filter(function(x) nlevels(x) == 1, df[, cat])

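Or (this relies on var() accepting a factor, which recent R versions reject with an error, so it may only run on older releases):

    Filter(Negate(var), df[, cat])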
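
A small sketch of the Filter variant on made-up data (the data frame `toy` and the result names are hypothetical). It also shows a predicate based on the values actually present, which is one possible substitute for the var() trick and does not depend on the levels having been refreshed with factor():

```r
# Hypothetical toy data; `stale` has one value but two declared levels
toy <- data.frame(
  colour = factor(c("red", "red", "red")),
  size   = factor(c("S", "M", "L")),
  stale  = factor(c("a", "a", "a"), levels = c("a", "b")),
  id     = 1:3
)
is_cat <- sapply(toy, is.factor)

# Filter() keeps the columns for which the predicate returns TRUE
by_levels <- Filter(function(x) nlevels(x) == 1, toy[is_cat])

# Counting distinct values instead of levels also catches `stale`
by_values <- Filter(function(x) length(unique(x)) == 1, toy[is_cat])

names(by_levels)  # "colour"
names(by_values)  # "colour" "stale"
```

Which predicate is appropriate depends on whether unused levels should count; after re-running factor() on each column the two give the same answer.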

Regarding why apply didn't work:

    apply(df[cat], 2, nlevels)
    #  V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
    #   0   0   0   0   0   0   0   0   0   0

The output is 0 for all the columns, which is not correct. Upon further checking,

    apply(df[cat], 2, class)
    #         V1          V2          V3          V4          V5          V6
    #"character" "character" "character" "character" "character" "character"
    #         V7          V8          V9         V10
    #"character" "character" "character" "character"

whereas the correct class can be found from

    sapply(df[cat], class)
    #      V1       V2       V3       V4       V5       V6       V7       V8
    #"factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"
    #      V9      V10
    #"factor" "factor"

The class of the columns changed from 'factor' to 'character' because apply first coerces the data.frame to a matrix (via as.matrix), and a matrix can hold only a single class. If there is any non-numeric column, all the matrix columns are converted to 'character', and nlevels of a character vector is 0. apply is fine on a numeric matrix, where the class stays 'numeric'. In general, when the columns have mixed classes, it is better to use lapply or vapply; sapply is useful when you want a logical vector back.
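
The coercion can be seen directly in a small sketch (the data frame `toy` and the variable names are hypothetical, chosen only for this illustration):

```r
# Hypothetical toy data with two factor columns
toy <- data.frame(
  colour = factor(c("red", "red", "red")),
  size   = factor(c("S", "M", "L"))
)

# apply() works on as.matrix(toy), in which the factors have become character
m <- as.matrix(toy)
typeof(m)                            # "character"
via_apply <- apply(toy, 2, nlevels)  # 0 for both columns

# vapply() visits each column as it is, so the factor class survives
via_vapply <- vapply(toy, nlevels, integer(1))
via_vapply
# colour   size
#      1      3

# The same column-wise idea re-levels every factor without going through a matrix
toy[] <- lapply(toy, factor)
```

vapply also checks that every result is a single integer, which makes it a slightly stricter choice than sapply here.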

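Data

    set.seed(64)
    # stringsAsFactors = TRUE is needed on R >= 4.0.0, where strings are no
    # longer converted to factors by default
    df <- as.data.frame(matrix(sample(letters[1:3], 3*10, replace = TRUE), ncol = 10),
                        stringsAsFactors = TRUE)
    df <- cbind(df, V11 = 1:3)
    cat <- sapply(df, is.factor)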
