R - Read in a table from a text file with a variable number of whitespaces as separator


I am reading a .txt file with the following structure:

    id      chr   allele:effect ...
    q1        1     1:-0.133302   2: 0.007090
    q2        1     1:-0.050089   2: 0.021212
    q3        1     1: 0.045517   2:-0.038001

The problems are:

  1. The field separator is a variable number of whitespaces, and
  2. I need to get rid of the leading numbers (the `1:` and `2:` prefixes) in the second and third column.

Finally, the result should look like:

    qtl_id    chr   eff_1         eff_2
    q1        1     -0.133302     0.007090
    q2        1     -0.050089     0.021212
    q3        1      0.045517     -0.038001

Edit

`head(read.table(file = fpath, sep = "", header = TRUE))` yields:

    id         chr allele.effect         ...
    q1  1 1:-0.133302            2:    0.007090
    q2  1 1:-0.050089            2:    0.021212
    q3  1          1:      0.045517 2:-0.038001
    q4  1          1:      0.018582 2:-0.041846
    q5  1 1:-0.146560            2:    0.005473
    q6  1 1:-0.048240            2:    0.069418
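One way to see why `sep = ""` misaligns the columns is to count the whitespace-separated fields on each line. The following is a minimal sketch, assuming the sample lines from the question are pasted inline; the names `lines`, `con` and `n_fields` are only for illustration.

```r
# Sample lines copied from the question. The blank that sometimes follows
# "1:" or "2:" makes the parser see an extra field in every data row.
lines <- c(
  "id      chr   allele:effect ...",
  "q1        1     1:-0.133302   2: 0.007090",
  "q2        1     1:-0.050089   2: 0.021212",
  "q3        1     1: 0.045517   2:-0.038001"
)

# count.fields() splits on runs of whitespace when sep = "",
# which is the same rule read.table() applies.
con <- textConnection(lines)
n_fields <- count.fields(con, sep = "")
close(con)

print(n_fields)
```

If the header line has one field fewer than the data rows, `read.table` uses the first column as row names, which would explain the shifted output above.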

I applied a brute-force method. First, the lines are read with 'readLines' without any separation. Then 'gsub' removes the 'number:' pieces, and 'strsplit' splits each line. The intermediate result is a list 'a'. 'a[[1]]' is the vector of column names, 'a[[2]]' represents the first row of the data frame, and so on. You can then piece together the data frame 'df' from the components of 'a'.

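    # Read the raw lines, strip the "number:" prefixes, then split on blanks.
    a <- lapply(readLines(filename),
                function(x) {
                  strsplit(gsub(pattern = "[0-9]+[ ]*[:][ ]*",
                                replacement = "",
                                x = as.character(x)),
                           "[ ]+")
                })

    # a[[1]] holds the column names; every later element becomes one row.
    df <- data.frame()
    for (n in 2:length(a)) {
      df <- rbind(df, t(unlist(a[[n]])))
    }
    colnames(df) <- unlist(a[[1]])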
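As a possible shortcut (a different technique from the loop above, not the method used in this answer), the cleaned lines can be handed back to `read.table` through its `text` argument, which then infers the column types itself. This is a sketch under the assumption that the sample lines from the question are representative; the column names are taken from the desired output in the question, and `raw`, `body` and `df2` are illustrative names.

```r
# Sample lines from the question, inlined so the sketch is self-contained;
# with a real file you would use readLines(filename) instead.
raw <- c(
  "id      chr   allele:effect ...",
  "q1        1     1:-0.133302   2: 0.007090",
  "q2        1     1:-0.050089   2: 0.021212",
  "q3        1     1: 0.045517   2:-0.038001"
)

# Drop the header line and strip the "number:" prefixes,
# including any blanks around the colon.
body <- gsub("[0-9]+[ ]*:[ ]*", "", raw[-1])

# Parse the cleaned rows; sep = "" treats any run of whitespace
# as a single separator.
df2 <- read.table(text = body, sep = "", header = FALSE,
                  col.names = c("qtl_id", "chr", "eff_1", "eff_2"),
                  stringsAsFactors = FALSE)

str(df2)
```

Because `read.table` does the type conversion, the effect columns should come out numeric without a separate conversion step.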

Unfortunately, the entries of 'df' are still factors:

    > df
      id chr allele:effect       ...
    1 q1   1     -0.133302  0.007090
    2 q2   1     -0.050089  0.021212
    3 q3   1      0.045517 -0.038001
    > df$"allele:effect"
    [1] -0.133302 -0.050089 0.045517
    Levels: -0.133302 -0.050089 0.045517

We can change this as follows:

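    for (m in 1:ncol(df)) {
      # Convert via the labels, not the internal factor codes;
      # as.character() also works if the column is character rather than factor.
      v <- suppressWarnings(as.numeric(as.character(df[, m])))
      # Keep the conversion only if every entry parsed as a number.
      if (!any(is.na(v))) { df[, m] <- v }
    }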

Now the numbers are numbers.
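The conversion goes through the factor labels because calling `as.numeric` directly on a factor returns its internal integer codes rather than the printed values. A small sketch with made-up variable names (`f`, `codes`, `values`) and the values from the example above:

```r
# A factor stores integer codes plus a table of levels (labels).
f <- factor(c("-0.133302", "-0.050089", "0.045517"))

# as.numeric() on the factor itself returns the internal codes,
# which are positions in levels(f), not the original numbers.
codes <- as.numeric(f)

# Converting the levels and indexing them by the factor
# recovers the original numbers.
values <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

The codes depend on how the levels happen to be ordered, so only `values` is meaningful here.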

