I have created a function in R to detect and replace outliers using the quartile (IQR) method. The function works, but it is extremely slow on a large dataset. My dataset has 1,000,000 rows and 250 columns, all numeric.
    replaceoutliers3 <- function(x, strength, replacewithna, na.rm = TRUE, ...) {
      totalcols <- ncol(x)
      totalrows <- nrow(x)
      for (col in 1:totalcols) {
        cat("starting col ...", col)
        cat("\n")
        # First and third quartiles of the current column
        quantiles <- quantile(x[, col], probs = c(.25, .75), na.rm = na.rm, ...)
        # Fence width: 1.5 * IQR (mild) or 3 * IQR (strong)
        if (strength == 1) {
          h <- 1.5 * IQR(x[, col], na.rm = na.rm)
        } else if (strength == 2) {
          h <- 3 * IQR(x[, col], na.rm = na.rm)
        } else {
          stop("please provide correct strength : 1 mild , 2 strong")
        }
        lowerthresh <- quantiles[1] - h
        upperthresh <- quantiles[2] + h
        if (replacewithna) {
          for (row in 1:totalrows) {
            # else-if: once a cell is set to NA it must not be compared again
            if (x[row, col] < lowerthresh) {
              x[row, col] <- NA
            } else if (x[row, col] > upperthresh) {
              x[row, col] <- NA
            }
          }
        } else {
          for (row in 1:totalrows) {
            cat("starting row ...", row)
            cat("\n")
            if (x[row, col] < lowerthresh) {
              x[row, col] <- lowerthresh
            }
            if (x[row, col] > upperthresh) {
              x[row, col] <- upperthresh
            }
          }
        }
        cat("column completed :", col)
      }
      return(x)
    }

I know the code is crude. Can anyone please help me write a faster version of this code, using apply, dplyr, etc.?
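For comparison, here is a minimal vectorized sketch of the same idea. It is not a definitive implementation: the names `clip_outliers` and `replace_outliers_vec` are hypothetical, and it assumes `x` is a plain data frame whose columns are all numeric. The slow part of the function above is the per-row loop (plus the `cat()` call on every row), so this sketch replaces it with whole-column operations. One assumption to note: the IQR is taken from the same two quantiles rather than from a separate `IQR()` call, which gives the same result with the default quantile type.

```r
# Hypothetical helper: handle the outliers of a single numeric vector.
# k is the IQR multiplier (1.5 = mild, 3 = strong).
clip_outliers <- function(v, k = 1.5, replacewithna = TRUE, na.rm = TRUE) {
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = na.rm, names = FALSE)
  h <- k * (q[2] - q[1])           # fence width, k times the IQR
  lower <- q[1] - h
  upper <- q[2] + h
  if (replacewithna) {
    # Logical indexing flags every outlier at once; existing NAs are skipped.
    v[!is.na(v) & (v < lower | v > upper)] <- NA
  } else {
    # pmax/pmin clamp the whole vector to [lower, upper]; NAs stay NA.
    v <- pmin(pmax(v, lower), upper)
  }
  v
}

# Hypothetical wrapper: apply the helper to every column of a data frame.
replace_outliers_vec <- function(x, strength = 1, replacewithna = TRUE,
                                 na.rm = TRUE) {
  k <- switch(as.character(strength),
              "1" = 1.5,
              "2" = 3,
              stop("please provide correct strength : 1 mild , 2 strong"))
  # x[] <- lapply(...) keeps x a data frame while replacing each column.
  x[] <- lapply(x, clip_outliers, k = k,
                replacewithna = replacewithna, na.rm = na.rm)
  x
}

# Small usage example on made-up data
df <- data.frame(a = c(1, 2, 3, 4, 100), b = c(10, 11, 12, 13, NA))
cleaned <- replace_outliers_vec(df, strength = 1, replacewithna = FALSE)
```

If dplyr (1.0 or later) is preferred, the same helper should also work column-wise through `dplyr::mutate(df, dplyr::across(dplyr::everything(), clip_outliers))`, although the base R `lapply` form above is likely enough for a table of this size.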