data.table - Releveling factor to facilitate use as nested factor in DESeq2 model in R


I am fitting a GLM using the DESeq2 package and have a situation where individuals (ratids) are nested within treatment (diet). The author of the package suggests that individuals be re-leveled 1:n within each diet (where n is the number of ratids within a specific diet) rather than keeping the original ID/factor level (DESeq2 vignette, page 35).
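One way to see why the re-leveling helps is a toy base R comparison of the two nested model matrices. This is a sketch, not DESeq2 code: the data frame `coldata` and the columns `rat` (original IDs) and `ind.n` (IDs re-leveled 1:n within diet) are hypothetical, with two rats per diet. With the original IDs, `diet:rat` produces all-zero and collinear columns, so the matrix has more columns than its rank; with the re-leveled IDs it is full rank.

```r
# Hypothetical toy design: rats r1, r2 on diet A; rats r3, r4 on diet B
coldata <- data.frame(
  diet  = factor(c("A", "A", "B", "B")),
  rat   = factor(c("r1", "r2", "r3", "r4")),
  # rats re-leveled 1:n within each diet; must be a factor, otherwise
  # model.matrix() would treat it as a continuous covariate
  ind.n = factor(c(1, 2, 1, 2))
)

# Nested design using the original rat IDs
mm_orig <- model.matrix(~ diet + diet:rat, coldata)
# Nested design using the re-leveled IDs
mm_new <- model.matrix(~ diet + diet:ind.n, coldata)

# Number of columns versus numerical rank of each design matrix
c(cols = ncol(mm_orig), rank = qr(mm_orig)$rank)
c(cols = ncol(mm_new),  rank = qr(mm_new)$rank)
```

Inspecting `mm_orig` shows columns such as the diet B by rat r1 interaction that are zero in every row, which is the situation the re-leveling avoids.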

The data looks like this (there are more columns and rows, omitted for simplicity):

         diet extraction ratid
    199 hamsp          8    65
    74   hams          9   108
    308  hams         18   100
    41  hamsa          3    83
    88  hamsp         12    11
    221 hamsp         14    66
    200 hamsa          8    57
    155 hamsb          1   105
    245 hamsb         19    50
    254  hams         21    90
    182 hamsb          4     4
    283 hamsa         23    59
    180 hamsp          4    22
    71  hamsp          9   112
    212  hams         12    63
    220 hamsp         14    54
    56   hams          7    81
    274 hamsp          1    11
    114  hams         17   102
    143 hamsp         22    93

And here is the dput() output of the structure:

    data = structure(list(diet = structure(c(4L, 1L, 1L, 2L, 4L, 4L, 2L,
        3L, 3L, 1L, 3L, 2L, 4L, 4L, 1L, 4L, 1L, 4L, 1L, 4L), .Label = c("hams",
        "hamsa", "hamsb", "hamsp", "lams"), class = "factor"), extraction = c(8L,
        9L, 18L, 3L, 12L, 14L, 8L, 1L, 19L, 21L, 4L, 23L, 4L, 9L, 12L,
        14L, 7L, 1L, 17L, 22L), ratid = structure(c(61L, 7L, 3L, 76L,
        9L, 62L, 52L, 6L, 46L, 81L, 37L, 54L, 20L, 12L, 59L, 50L, 74L,
        9L, 4L, 84L), .Label = c("1", "10", "100", "102", "103", "105",
        "108", "109", "11", "110", "111", "112", "113", "13", "14", "16",
        "17", "18", "20", "22", "23", "24", "25", "26", "27", "28", "29",
        "3", "30", "31", "32", "34", "35", "36", "37", "39", "4", "40",
        "42", "43", "45", "46", "48", "49", "5", "50", "51", "52", "53",
        "54", "55", "57", "58", "59", "6", "60", "61", "62", "63", "64",
        "65", "66", "67", "68", "69", "70", "71", "73", "77", "78", "79",
        "8", "80", "81", "82", "83", "85", "86", "88", "89", "90", "91",
        "92", "93", "94", "95", "96", "98", "99"), class = "factor")),
        .Names = c("diet", "extraction", "ratid"),
        row.names = c(199L, 74L, 308L, 41L, 88L, 221L, 200L, 155L, 245L,
        254L, 182L, 283L, 180L, 71L, 212L, 220L, 56L, 274L, 114L, 143L),
        class = "data.frame")

Can anyone suggest an elegant way to generate the new within-diet factor levels for the ratids as an additional column of the above data.frame? Could this be done with the roll function of data.table?

Desired output (done manually):

        diet extraction ratid newcol
    1  hamsp          8    65      1
    2   hams          9   108      1
    3   hams         18   100      2
    4  hamsa          3    83      1
    5  hamsp         12    11      2
    6  hamsp         14    66      3
    7  hamsa          8    57      2
    8  hamsb          1   105      1
    9  hamsb         19    50      2
    10  hams         21    90      3
    11 hamsb          4     4      3
    12 hamsa         23    59      3
    13 hamsp          4    22      4
    14 hamsp          9   112      5
    15  hams         12    63      4
    16 hamsp         14    54      6
    17  hams          7    81      5
    18 hamsp          1    11      2
    19  hams         17   102      6
    20 hamsp         22    93      7

Note: there is not an equal number of rats in each treatment. I'd prefer a solution that does not re-order the rows in the data (if possible).
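To see how n differs between diets, a base R sketch can count the distinct ratids per diet in the 20 example rows (`n_per_diet` is a hypothetical name). Note that hamsp has eight rows but only seven rats, because ratid 11 appears twice.

```r
# diet and ratid columns copied from the 20 example rows above, as plain
# character/numeric vectors (so the unused "lams" level does not appear)
diet  <- c("hamsp", "hams", "hams", "hamsa", "hamsp", "hamsp", "hamsa",
           "hamsb", "hamsb", "hams", "hamsb", "hamsa", "hamsp", "hamsp",
           "hams", "hamsp", "hams", "hamsp", "hams", "hamsp")
ratid <- c(65, 108, 100, 83, 11, 66, 57, 105, 50, 90, 4, 59, 22, 112,
           63, 54, 81, 11, 102, 93)

# Number of distinct rats (the n in 1:n) within each diet
n_per_diet <- tapply(ratid, diet, function(x) length(unique(x)))
n_per_diet
```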

Edit: there is no 'natural' order for the ratids; as long as there is a 1:1 mapping within diet, that's fine.
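Since any numbering is acceptable as long as it is 1:1 within diet, a candidate column can be checked with a few lines of base R. `is_one_to_one()` is a hypothetical helper, sketched under the assumption that the three columns are plain vectors of equal length. The manually built newcol above passes; giving rat 11 two different codes fails.

```r
# diet, ratid and the manually built newcol from the example rows above
diet   <- c("hamsp", "hams", "hams", "hamsa", "hamsp", "hamsp", "hamsa",
            "hamsb", "hamsb", "hams", "hamsb", "hamsa", "hamsp", "hamsp",
            "hams", "hamsp", "hams", "hamsp", "hams", "hamsp")
ratid  <- c(65, 108, 100, 83, 11, 66, 57, 105, 50, 90, 4, 59, 22, 112,
            63, 54, 81, 11, 102, 93)
newcol <- c(1, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 3, 4, 5, 4, 6, 5, 2, 6, 7)

# Hypothetical helper: TRUE when, within every diet, each ratid has exactly
# one code and each code belongs to exactly one ratid
is_one_to_one <- function(diet, ratid, newcol) {
  # collapse repeated rows of the same rat (e.g. rat 11 on hamsp)
  pairs <- unique(data.frame(diet, ratid, newcol))
  !anyDuplicated(pairs[, c("diet", "ratid")]) &&
    !anyDuplicated(pairs[, c("diet", "newcol")])
}

is_one_to_one(diet, ratid, newcol)

# Breaking the mapping: rat 11 (rows 5 and 18) gets two different codes
bad <- newcol
bad[18] <- 8
is_one_to_one(diet, ratid, bad)
```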

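You can convert 'ratid' to a 'factor' whose levels are its unique values within each 'diet' (in order of first appearance), and then coerce it to 'numeric':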

    library(data.table) # v1.9.4+
    setDT(data)[, newcol := as.numeric(factor(ratid,
                        levels = unique(ratid))), diet]
    #      diet extraction ratid newcol
    # 1: hamsp          8    65      1
    # 2:  hams          9   108      1
    # 3:  hams         18   100      2
    # 4: hamsa          3    83      1
    # 5: hamsp         12    11      2
    # 6: hamsp         14    66      3
    # 7: hamsa          8    57      2
    # 8: hamsb          1   105      1
    # 9: hamsb         19    50      2
    #10:  hams         21    90      3
    #11: hamsb          4     4      3
    #12: hamsa         23    59      3
    #13: hamsp          4    22      4
    #14: hamsp          9   112      5
    #15:  hams         12    63      4
    #16: hamsp         14    54      6
    #17:  hams          7    81      5
    #18: hamsp          1    11      2
    #19:  hams         17   102      6
    #20: hamsp         22    93      7
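Within each `diet` group, the `factor(..., levels = unique(...))` step just numbers the values in order of first appearance, which is the same numbering `match(x, unique(x))` returns. A base R sketch on the hamsp group alone, where `x` holds the hamsp ratid values in row order, copied from the example:

```r
# ratid values of the hamsp rows, in their original row order
x <- c(65, 11, 66, 22, 112, 54, 11, 93)

# Levels in order of first appearance, so codes count up 1, 2, 3, ...
codes_factor <- as.numeric(factor(x, levels = unique(x)))
# match() against the first-appearance values gives the same numbering
# (as an integer rather than a double vector)
codes_match <- match(x, unique(x))

codes_factor
# 1 2 3 4 5 6 2 7
```

These are exactly the newcol values of the hamsp rows in the desired output; the repeated rat 11 gets code 2 both times.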

Or use match:

    setDT(data)[, newcol := match(ratid, unique(ratid)), diet]
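The same `match()` idea can be applied per group without data.table by using base R `split()` and `unsplit()`, which write the results back to the original row positions, so rows are not re-ordered. A sketch, where `renumber_within()` is a hypothetical helper name and the vectors are copied from the example rows:

```r
diet  <- c("hamsp", "hams", "hams", "hamsa", "hamsp", "hamsp", "hamsa",
           "hamsb", "hamsb", "hams", "hamsb", "hamsa", "hamsp", "hamsp",
           "hams", "hamsp", "hams", "hamsp", "hams", "hamsp")
ratid <- c(65, 108, 100, 83, 11, 66, 57, 105, 50, 90, 4, 59, 22, 112,
           63, 54, 81, 11, 102, 93)

# Hypothetical helper: number each distinct id 1..n by first appearance
# within its group. split() groups the ids by diet, match(x, unique(x))
# numbers them, and unsplit() puts the codes back in the original row order.
renumber_within <- function(id, group) {
  unsplit(lapply(split(id, group), function(x) match(x, unique(x))), group)
}

newcol <- renumber_within(ratid, diet)
newcol
# 1 1 2 1 2 3 2 1 2 3 3 3 4 5 4 6 5 2 6 7
```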

Or a similar option in base R:

    data$newcol <- with(data, ave(as.numeric(levels(ratid))[ratid],
           diet, FUN = function(x) match(x, unique(x))))
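In the base R line, `as.numeric(levels(ratid))[ratid]` recovers the rat IDs shown as the factor's labels, whereas `as.numeric(ratid)` would return the internal integer codes, which follow the alphabetical (character) order of the levels. A small sketch with a hypothetical factor `f`: for this numbering either version works inside `match()`, because codes and labels correspond 1:1, but only the labels are the real IDs.

```r
f <- factor(c("65", "108", "100", "65"))

# Internal integer codes: levels sort as strings ("100", "108", "65"),
# so these are NOT the rat IDs
codes <- as.numeric(f)
codes
# 3 2 1 3

# Recover the numeric labels themselves (the idiom used in the ave() call)
ids <- as.numeric(levels(f))[f]
ids
# 65 108 100 65

# Both give the same first-appearance numbering
match(codes, unique(codes))
match(ids, unique(ids))
```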
