I want to run t-tests between two populations (in or out of a treatment group, coded 1 or 0 respectively in the sample data below) across a number of variables, for different studies, all sitting in the same data frame. In the sample data below, I want to generate t-tests for all variables (in the sample data: age, dollars, diseasecnt) between the 1/0 treatment groups. I want to run these t-tests by program, rather than across the whole population. I have the logic to generate the t-tests. However, I need assistance with the final step of extracting the appropriate parts from the function and creating something digestible.
Ultimately, what I want is a table of t-stats, p-values, the variable the t-test was performed on, and the program in which the variable was tested.
    dt<-data.frame(
      treated=sample(0:1,1000,replace=TRUE)
      ,program=c('program a','program b','program c','program d')
      ,age=as.integer(rnorm(1000,mean=65,sd=15))
      ,dollars=as.integer(rpois(1000,lambda=1000))
      ,diseasecnt=as.integer(rnorm(1000,mean=5,sd=2))
    )
    progs<-unique(dt$program) # pull program names
    vars<-names(dt)[3:5] # pull variables to run t tests

    test<-lapply(progs, function(i)
      tt<-lapply(vars, function(j) {
        t.test(
          dt[dt$treated==1 & dt$program == i,names(dt)==j]
          ,dt[dt$treated==0 & dt$program == i,names(dt)==j]
          ,alternative = 'two.sided'
        )
        list(j,tt$statistic,tt$p.value)
      })
    )
    # nested lapply produces results in list format that can be bound together,
    # but the complete output with both lapply's is erroneous
You should convert to a data.table first. (In the code below, I call your original table df):

    library(data.table)
    dt <- as.data.table(df)
    dt[, t.test(data=.SD, age ~ treated), by=program]
         program  statistic parameter   p.value   conf.int estimate null.value alternative
    1: program a -0.6286875  247.8390 0.5301326 -4.8110579 65.26667          0   two.sided
    2: program a -0.6286875  247.8390 0.5301326  2.4828527 66.43077          0   two.sided
    3: program b  1.4758524  230.5380 0.1413480 -0.9069634 67.15315          0   two.sided
    4: program b  1.4758524  230.5380 0.1413480  6.3211834 64.44604          0   two.sided
    5: program c  0.1994182  246.9302 0.8420998 -3.3560930 63.56557          0   two.sided
    6: program c  0.1994182  246.9302 0.8420998  4.1122406 63.18750          0   two.sided
    7: program d -1.1321569  246.0086 0.2586708 -6.1855837 62.31707          0   two.sided
    8: program d -1.1321569  246.0086 0.2586708  1.6701237 64.57480          0   two.sided
                        method      data.name
    1: Welch Two Sample t-test age by treated
    2: Welch Two Sample t-test age by treated
    3: Welch Two Sample t-test age by treated
    4: Welch Two Sample t-test age by treated
    5: Welch Two Sample t-test age by treated
    6: Welch Two Sample t-test age by treated
    7: Welch Two Sample t-test age by treated
    8: Welch Two Sample t-test age by treated

In this format, each program gets two rows. The statistic is the same in both rows and is equal to t. The parameter here is the df. conf.int goes (in order) lower then upper, so for program a the confidence interval is (-4.8110579, 2.4828527). estimate is group 0 then group 1, so for program a the mean for treated == 0 is 65.26667, and so on.
This is the quickest solution I could come up with. You could loop through the vars, or perhaps there's a simpler way.
Edit: This can be confirmed for program a and age, using the following code:
    dt[program == 'program a', t.test(age ~ treated)]

        Welch Two Sample t-test

    data:  age by treated
    t = -0.62869, df = 247.84, p-value = 0.5301
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -4.811058  2.482853
    sample estimates:
    mean in group 0 mean in group 1
           65.26667        66.43077

Edit 2: Here is code that loops through your variables and rbind's them together:
    do.call(rbind, lapply(vars, function(x)
      dt[, t.test(data=.SD, eval(parse(text=x)) ~ treated), by=program]))
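If only the t-statistic and p-value are needed, with one row per program and variable, the loop above could be trimmed down along the following lines. This is a minimal sketch rather than a definitive implementation: the seed, the `results` name, and the `variable`, `t_stat` and `p_value` column names are my own assumptions, and it calls `t.test()` on two vectors (group 0 first, so the sign of t matches the formula version) instead of using `eval(parse())`.

```r
library(data.table)

set.seed(42)  # assumed seed, only so the sketch is reproducible
df <- data.frame(
  treated    = sample(0:1, 1000, replace = TRUE),
  program    = c("program a", "program b", "program c", "program d"),
  age        = as.integer(rnorm(1000, mean = 65, sd = 15)),
  dollars    = as.integer(rpois(1000, lambda = 1000)),
  diseasecnt = as.integer(rnorm(1000, mean = 5, sd = 2))
)
dt   <- as.data.table(df)
vars <- c("age", "dollars", "diseasecnt")

# For each variable, run a Welch t-test within each program and keep
# only the pieces asked for: variable name, t-statistic and p-value.
results <- rbindlist(lapply(vars, function(v) {
  dt[, {
    x  <- .SD[[v]]                                   # column named by v
    tt <- t.test(x[treated == 0], x[treated == 1])   # group 0 vs group 1
    list(variable = v,
         t_stat   = unname(tt$statistic),
         p_value  = tt$p.value)
  }, by = program]
}))

print(results)
```

Because every element of the returned list has length one, each program contributes a single row per variable instead of the two rows produced when the whole `htest` object is returned.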