time series - Lags in R within specific subsets?


Suppose I have the following data frame:

df <- data.frame("yearmonth"=c("2005-01","2005-02","2005-03","2005-01","2005-02","2005-03"),"state"=c(1,1,1,2,2,2),"county"=c(3,3,3,3,3,3),"unemp"=c(4.0,3.6,1.4,3.7,6.5,5.4)) 

I'm trying to create a lag of unemployment within each unique state-county combination. I want to end up with this:

df2 <- data.frame("yearmonth"=c("2005-01","2005-02","2005-03","2005-01","2005-02","2005-03"),"state"=c(1,1,1,2,2,2),"county"=c(3,3,3,3,3,3),"unemp"=c(4.0,3.6,1.4,3.7,6.5,5.4),"unemp_lag"=c(NA,4.0,3.6,NA,3.7,6.5))
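The target column can be read as "unemployment in the previous calendar month for the same state and county". A minimal base R sketch that encodes that definition directly, by joining each row to the same state-county pair one month earlier, is below. It assumes `yearmonth` is always a zero-padded "YYYY-MM" string; the helper `month_index` and the intermediate tables `prev` and `res` are illustrative names, not part of the question.

```r
# Same example data as in the question
df <- data.frame(yearmonth = c("2005-01", "2005-02", "2005-03", "2005-01", "2005-02", "2005-03"),
                 state = c(1, 1, 1, 2, 2, 2),
                 county = c(3, 3, 3, 3, 3, 3),
                 unemp = c(4.0, 3.6, 1.4, 3.7, 6.5, 5.4))

# Hypothetical helper: turn "YYYY-MM" into a running month count
# (assumes exactly that format)
month_index <- function(ym) {
  as.integer(substr(ym, 1, 4)) * 12L + as.integer(substr(ym, 6, 7))
}
df$t <- month_index(df$yearmonth)

# Move every observation forward by one month so it lines up with
# the row it should serve as the lag for
prev <- data.frame(state = df$state, county = df$county,
                   t = df$t + 1L, unemp_lag = df$unemp)

# Left join on state, county and month: no match in the previous month gives NA
res <- merge(df, prev, by = c("state", "county", "t"), all.x = TRUE)
res <- res[order(res$state, res$county, res$t), ]
res$t <- NULL
```

Because the join is on the month rather than the row position, a missing month produces NA instead of silently reusing an older value: if 2005-02 were absent for state 1, the 2005-03 row would get NA rather than 4.0.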

Now imagine the same situation, except with thousands of different county-state combinations over several years. I tried using the lag function and the zoo.lag function, but couldn't make them take the state-county codes into account. One possibility is to write a giant loop, but I think there is too much data for that (R doesn't handle loops well), so I'm looking for a cleaner way to do it. Any ideas? Thanks!
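For reference, base R can already do a per-group lag without an explicit loop over groups: `ave()` splits a vector by one or more grouping vectors, applies a function to each piece, and writes the results back in the original row positions. The sketch below assumes rows must first be sorted by month within each state-county pair (zero-padded "YYYY-MM" strings sort correctly as text); `lag1` is an illustrative helper, not a built-in.

```r
df <- data.frame(yearmonth = c("2005-01", "2005-02", "2005-03", "2005-01", "2005-02", "2005-03"),
                 state = c(1, 1, 1, 2, 2, 2),
                 county = c(3, 3, 3, 3, 3, 3),
                 unemp = c(4.0, 3.6, 1.4, 3.7, 6.5, 5.4))

# Put rows in chronological order inside each state-county group
df <- df[order(df$state, df$county, df$yearmonth), ]

# Hypothetical helper: lag by one position (prepend NA, drop the last value)
lag1 <- function(x) c(NA, head(x, -1))

# ave() applies lag1 separately to every state-county combination
df$unemp_lag <- ave(df$unemp, df$state, df$county, FUN = lag1)
```

This lags by row position, so it gives the previous observed month, not necessarily the previous calendar month, when some months are missing.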

With data.table:

library(data.table)
setDT(df)[, `:=`(unemp_lag1 = shift(unemp, n = 1L, fill = NA, type = "lag")), by = .(state, county)][]

   yearmonth state county unemp unemp_lag1
1:   2005-01     1      3   4.0         NA
2:   2005-02     1      3   3.6        4.0
3:   2005-03     1      3   1.4        3.6
4:   2005-01     2      3   3.7         NA
5:   2005-02     2      3   6.5        3.7
6:   2005-03     2      3   5.4        6.5
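Since `shift()` works on row position within each `by` group, the result is only a time lag if rows are already in chronological order. A sketch that sorts first with `setorder()` and, as an extra illustration, builds two lags at once by passing a vector to `n` (the column name `unemp_lag2` is an illustrative choice):

```r
library(data.table)

df <- data.frame(yearmonth = c("2005-01", "2005-02", "2005-03", "2005-01", "2005-02", "2005-03"),
                 state = c(1, 1, 1, 2, 2, 2),
                 county = c(3, 3, 3, 3, 3, 3),
                 unemp = c(4.0, 3.6, 1.4, 3.7, 6.5, 5.4))
setDT(df)

# shift() lags by row position, so sort by month inside each group first
setorder(df, state, county, yearmonth)

# With n = 1:2, shift() returns a list of two lagged vectors,
# which are assigned to two new columns by reference
df[, c("unemp_lag1", "unemp_lag2") := shift(unemp, n = 1:2, type = "lag"),
   by = .(state, county)]
```

Each group's first row (and, for the second lag, its first two rows) gets the default fill of NA.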
