I am fitting a GLM using the DESeq2 package, and I have a situation where individuals (ratids) are nested within treatment (diet). The author of the package suggests that the individuals be re-leveled from 1:N within each diet (where N is the number of ratids within a specific diet) rather than keeping their original ID/factor level (DESeq2 vignette, page 35).
The data looks like this (there are more columns and rows, omitted for simplicity):
         diet extraction ratid
    199 hamsp          8    65
    74   hams          9   108
    308  hams         18   100
    41  hamsa          3    83
    88  hamsp         12    11
    221 hamsp         14    66
    200 hamsa          8    57
    155 hamsb          1   105
    245 hamsb         19    50
    254  hams         21    90
    182 hamsb          4     4
    283 hamsa         23    59
    180 hamsp          4    22
    71  hamsp          9   112
    212  hams         12    63
    220 hamsp         14    54
    56   hams          7    81
    274 hamsp          1    11
    114  hams         17   102
    143 hamsp         22    93

And here is the dput() output of the structure:
    data = structure(list(
      diet = structure(c(4L, 1L, 1L, 2L, 4L, 4L, 2L, 3L, 3L, 1L, 3L, 2L, 4L, 4L,
        1L, 4L, 1L, 4L, 1L, 4L),
        .Label = c("hams", "hamsa", "hamsb", "hamsp", "lams"), class = "factor"),
      extraction = c(8L, 9L, 18L, 3L, 12L, 14L, 8L, 1L, 19L, 21L, 4L, 23L, 4L, 9L,
        12L, 14L, 7L, 1L, 17L, 22L),
      ratid = structure(c(61L, 7L, 3L, 76L, 9L, 62L, 52L, 6L, 46L, 81L, 37L, 54L,
        20L, 12L, 59L, 50L, 74L, 9L, 4L, 84L),
        .Label = c("1", "10", "100", "102", "103", "105", "108", "109", "11", "110",
          "111", "112", "113", "13", "14", "16", "17", "18", "20", "22", "23", "24",
          "25", "26", "27", "28", "29", "3", "30", "31", "32", "34", "35", "36", "37",
          "39", "4", "40", "42", "43", "45", "46", "48", "49", "5", "50", "51", "52",
          "53", "54", "55", "57", "58", "59", "6", "60", "61", "62", "63", "64", "65",
          "66", "67", "68", "69", "70", "71", "73", "77", "78", "79", "8", "80", "81",
          "82", "83", "85", "86", "88", "89", "90", "91", "92", "93", "94", "95", "96",
          "98", "99"), class = "factor")),
      .Names = c("diet", "extraction", "ratid"),
      row.names = c(199L, 74L, 308L, 41L, 88L, 221L, 200L, 155L, 245L, 254L, 182L,
        283L, 180L, 71L, 212L, 220L, 56L, 274L, 114L, 143L),
      class = "data.frame")

Can someone please suggest an elegant way to generate new factor levels for the ratids within each diet, as an additional column of the above data.frame? Could this be done with the roll function of data.table?
Desired output (done manually):
        diet extraction ratid newcol
    1  hamsp          8    65      1
    2   hams          9   108      1
    3   hams         18   100      2
    4  hamsa          3    83      1
    5  hamsp         12    11      2
    6  hamsp         14    66      3
    7  hamsa          8    57      2
    8  hamsb          1   105      1
    9  hamsb         19    50      2
    10  hams         21    90      3
    11 hamsb          4     4      3
    12 hamsa         23    59      3
    13 hamsp          4    22      4
    14 hamsp          9   112      5
    15  hams         12    63      4
    16 hamsp         14    54      6
    17  hams          7    81      5
    18 hamsp          1    11      2
    19  hams         17   102      6
    20 hamsp         22    93      7

Note: There is not an equal number of rats in each treatment. I'd like a solution that does not re-order the rows in the data (if possible).
EDIT: There is no 'natural' order to the ratids; as long as there is a 1:1 mapping within each diet, that is fine.
You can convert 'ratid' to a 'factor' with levels in order of first appearance, and then coerce it to 'numeric':
    library(data.table) # v1.9.4+
    # Within each diet, number the ratids in order of first appearance
    setDT(data)[, newcol := as.numeric(factor(ratid, levels = unique(ratid))), by = diet]
    #     diet extraction ratid newcol
    #  1: hamsp          8    65      1
    #  2:  hams          9   108      1
    #  3:  hams         18   100      2
    #  4: hamsa          3    83      1
    #  5: hamsp         12    11      2
    #  6: hamsp         14    66      3
    #  7: hamsa          8    57      2
    #  8: hamsb          1   105      1
    #  9: hamsb         19    50      2
    # 10:  hams         21    90      3
    # 11: hamsb          4     4      3
    # 12: hamsa         23    59      3
    # 13: hamsp          4    22      4
    # 14: hamsp          9   112      5
    # 15:  hams         12    63      4
    # 16: hamsp         14    54      6
    # 17:  hams          7    81      5
    # 18: hamsp          1    11      2
    # 19:  hams         17   102      6
    # 20: hamsp         22    93      7

Or use match:
    setDT(data)[, newcol := match(ratid, unique(ratid)), by = diet]

Or a similar option in base R:
    # Convert the factor 'ratid' back to its numeric labels, then number within diet
    data$newcol <- with(data, ave(as.numeric(levels(ratid))[ratid], diet,
                                  FUN = function(x) match(x, unique(x))))
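
To see why `match(x, unique(x))` gives the requested 1:N numbering, here is a minimal base R sketch on a small made-up data frame. The `toy` data and the names `mapping`, `one_to_one`, and `contiguous` are illustrative assumptions, not part of the original data. It also checks the two properties asked for in the question: a 1:1 mapping within each diet, and unchanged row order.

```r
# Toy data (hypothetical values, not the poster's full data set)
toy <- data.frame(
  diet  = c("hamsp", "hams", "hams", "hamsp", "hamsp", "hams"),
  ratid = c(65, 108, 100, 11, 65, 108)
)

# Number the rats 1..n within each diet, in order of first appearance.
# ave() writes the results back in the original row positions,
# so the rows are not re-ordered.
toy$newcol <- ave(toy$ratid, toy$diet,
                  FUN = function(x) match(x, unique(x)))

# Check 1: within a diet, each ratid has exactly one newcol and vice versa
mapping <- unique(toy[, c("diet", "ratid", "newcol")])
one_to_one <- !any(duplicated(mapping[, c("diet", "ratid")])) &&
              !any(duplicated(mapping[, c("diet", "newcol")]))

# Check 2: within a diet, the new levels run from 1 to the number of rats
n_rats  <- tapply(toy$ratid,  toy$diet, function(x) length(unique(x)))
max_new <- tapply(toy$newcol, toy$diet, max)
contiguous <- all(n_rats == max_new)

print(toy)
print(c(one_to_one = one_to_one, contiguous = contiguous))
```

The same two checks should work on the real `data` once `newcol` has been added by any of the approaches above, since they rely only on the `diet`, `ratid`, and `newcol` columns.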