R - using lapply to create a t-test table


I want to run t-tests between two populations (in or out of the treatment group, coded 1 or 0 respectively in the sample data below) across a number of variables, for different studies (programs), all sitting in the same data frame. In the sample data below, I want to generate t-tests for all variables (in the sample data: age, dollars, diseasecnt) between the 1/0 treatment groups. I want to run these t-tests by program, rather than across the entire population. I have the logic to generate the t-tests. However, I need assistance with the final step of extracting the appropriate parts of the function output and creating something digestible.

Ultimately, what I want is a table of t-stats, p-values, the variable the t-test was performed on, and the program for the variable tested.

    dt <- data.frame(
      treated    = sample(0:1, 1000, replace = TRUE),
      program    = c('program a', 'program b', 'program c', 'program d'),
      age        = as.integer(rnorm(1000, mean = 65, sd = 15)),
      dollars    = as.integer(rpois(1000, lambda = 1000)),
      diseasecnt = as.integer(rnorm(1000, mean = 5, sd = 2))
    )

    progs <- unique(dt$program)  # pull program names
    vars  <- names(dt)[3:5]      # pull variables to run t-tests on

    test <- lapply(progs, function(i) {
      lapply(vars, function(j) {
        tt <- t.test(dt[dt$treated == 1 & dt$program == i, names(dt) == j],
                     dt[dt$treated == 0 & dt$program == i, names(dt) == j],
                     alternative = 'two.sided')
        list(j, tt$statistic, tt$p.value)
      })
    })
    # each inner lapply returns a list that can be bound, but combining the
    # output of both lapply's into one table is where I'm stuck
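A minimal base-R sketch of the missing final step, assuming the same simulated data as above (set.seed() is added only for reproducibility): have the inner function return a one-row data frame instead of a list, then stack the rows with do.call(rbind, ...) at both levels. The column names variable, statistic and p.value are my own choices, not anything t.test() prescribes.

```r
# Simulated data as in the question; set.seed() only makes the run reproducible.
set.seed(1)
dt <- data.frame(
  treated    = sample(0:1, 1000, replace = TRUE),
  program    = c('program a', 'program b', 'program c', 'program d'),
  age        = as.integer(rnorm(1000, mean = 65, sd = 15)),
  dollars    = as.integer(rpois(1000, lambda = 1000)),
  diseasecnt = as.integer(rnorm(1000, mean = 5, sd = 2))
)

progs <- unique(dt$program)
vars  <- names(dt)[3:5]

# Outer loop over programs, inner loop over variables. Each inner call returns a
# one-row data frame; the inner do.call(rbind) stacks the variables for one
# program, and the outer do.call(rbind) stacks the programs.
results <- do.call(rbind, lapply(progs, function(p) {
  do.call(rbind, lapply(vars, function(v) {
    tt <- t.test(dt[dt$treated == 1 & dt$program == p, v],
                 dt[dt$treated == 0 & dt$program == p, v],
                 alternative = "two.sided")   # Welch test by default
    data.frame(program   = p,
               variable  = v,
               statistic = unname(tt$statistic),  # drop the "t" name
               p.value   = tt$p.value,
               stringsAsFactors = FALSE)
  }))
}))

print(results)
```

The result has one row per program and variable (4 x 3 = 12 rows here), which is the "table of t-stats, p-values, variable and program" described above.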

You should convert it to a data.table first (in the code I call the original table df):

    library(data.table)
    dt <- as.data.table(df)
    dt[, t.test(data = .SD, age ~ treated), by = program]

         program  statistic parameter   p.value   conf.int estimate null.value alternative
    1: program a -0.6286875  247.8390 0.5301326 -4.8110579 65.26667          0   two.sided
    2: program a -0.6286875  247.8390 0.5301326  2.4828527 66.43077          0   two.sided
    3: program b  1.4758524  230.5380 0.1413480 -0.9069634 67.15315          0   two.sided
    4: program b  1.4758524  230.5380 0.1413480  6.3211834 64.44604          0   two.sided
    5: program c  0.1994182  246.9302 0.8420998 -3.3560930 63.56557          0   two.sided
    6: program c  0.1994182  246.9302 0.8420998  4.1122406 63.18750          0   two.sided
    7: program d -1.1321569  246.0086 0.2586708 -6.1855837 62.31707          0   two.sided
    8: program d -1.1321569  246.0086 0.2586708  1.6701237 64.57480          0   two.sided
                        method      data.name
    1: Welch Two Sample t-test age by treated
    2: Welch Two Sample t-test age by treated
    3: Welch Two Sample t-test age by treated
    4: Welch Two Sample t-test age by treated
    5: Welch Two Sample t-test age by treated
    6: Welch Two Sample t-test age by treated
    7: Welch Two Sample t-test age by treated
    8: Welch Two Sample t-test age by treated

In this format, for each program the statistic is the same in both rows and equals t, and parameter here is df. conf.int goes (in order) lower, then upper, so for program a the confidence interval is (-4.8110579, 2.4828527). estimate is for group 0, then group 1, so for program a the mean for treated == 0 is 65.26667, etc.
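If the two-rows-per-program layout is inconvenient, one option (a sketch, not the only way) is to build the row yourself inside j from the pieces of the htest object, so each program gets exactly one row. This assumes the same simulated data as the question with a fixed seed; the column names conf.low, conf.high, mean.0 and mean.1 are illustrative choices of mine.

```r
library(data.table)

# Simulated data as in the question; set.seed() only makes the run reproducible.
set.seed(1)
df <- data.frame(
  treated    = sample(0:1, 1000, replace = TRUE),
  program    = c('program a', 'program b', 'program c', 'program d'),
  age        = as.integer(rnorm(1000, mean = 65, sd = 15)),
  dollars    = as.integer(rpois(1000, lambda = 1000)),
  diseasecnt = as.integer(rnorm(1000, mean = 5, sd = 2))
)
dt <- as.data.table(df)

# Returning a list of length-1 elements makes data.table produce one row per group.
one_row <- dt[, {
  tt <- t.test(age ~ treated)          # Welch test of group 0 vs group 1 in this program
  list(statistic = unname(tt$statistic),
       df        = unname(tt$parameter),
       p.value   = tt$p.value,
       conf.low  = tt$conf.int[1],      # lower bound comes first
       conf.high = tt$conf.int[2],
       mean.0    = unname(tt$estimate[1]),  # mean for treated == 0
       mean.1    = unname(tt$estimate[2]))  # mean for treated == 1
}, by = program]

print(one_row)
```

The same numbers appear as in the two-row output; they are just spread across named columns instead of being recycled over two rows.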

This is the quickest solution I could come up with. You could loop through vars, or perhaps there's a simpler way.


Edit: I confirmed the results for program a and age using the following code:

    dt[program == 'program a', t.test(age ~ treated)]

        Welch Two Sample t-test

    data:  age by treated
    t = -0.62869, df = 247.84, p-value = 0.5301
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -4.811058  2.482853
    sample estimates:
    mean in group 0 mean in group 1
           65.26667        66.43077

Edit 2: here is code that loops through the variables and rbind's them together:

    do.call(rbind, lapply(vars, function(x)
      dt[, t.test(data = .SD, eval(parse(text = x)) ~ treated), by = program]))
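One caveat with the eval(parse(...)) loop: as far as I can tell, data.name then reads "eval(parse(text = x)) by treated", so the table no longer says which variable was tested. A sketch that keeps the variable name explicit, assuming the same simulated data: build the formula with reformulate(), return only the pieces needed, and stack the per-variable tables with data.table::rbindlist(). The column names are my own choices.

```r
library(data.table)

# Simulated data as in the question; set.seed() only makes the run reproducible.
set.seed(1)
dt <- data.table(
  treated    = sample(0:1, 1000, replace = TRUE),
  program    = c('program a', 'program b', 'program c', 'program d'),
  age        = as.integer(rnorm(1000, mean = 65, sd = 15)),
  dollars    = as.integer(rpois(1000, lambda = 1000)),
  diseasecnt = as.integer(rnorm(1000, mean = 5, sd = 2))
)
vars <- names(dt)[3:5]

res <- rbindlist(lapply(vars, function(v) {
  dt[, {
    # reformulate("treated", response = "age") builds the formula age ~ treated
    tt <- t.test(reformulate("treated", response = v), data = .SD)
    list(variable  = v,                       # record which variable was tested
         statistic = unname(tt$statistic),
         df        = unname(tt$parameter),
         p.value   = tt$p.value)
  }, by = program]
}))

print(res)
```

Because each group returns a single row, the result has one row per variable and program (12 here) with no duplicated rows to filter out afterwards.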
