Indicator feature creation in R based on multiple columns


I have a dataset with 10 columns. Of those 10, 3 are of interest for creating a new indicator feature. The features "pt", "pn", and "m" take different values. Of the values these 3 features take, there are a total of 9 unique combinations that need to be captured in a new variable.

         pathot pathon pathom
    1       pt2    pn1     m0
    4       pt1    pn1     m0
    13      pt3    pn1     m0
    161     pt1   *pn2     m0
    391     pt1    pn1    *m1
    810   *ptis    pn1     m0
    948     pt3   *pn2     m0
    1043    pt2    pn1    *m1
    1067   *pt4    pn1     m0

For example, the new variable should have the value "1" when pathot=pt2, pathon=pn1 & pathom=m0, and so on up to the value 9. I have completed this task, but it took about 20 lines of code, with one vectorised assignment for each unique combination.

    diag3_bs$sfd[diag3_bs$pathot=="pt2" & diag3_bs$pathon=="pn1" &
                 diag3_bs$pathom=="m0"] <- 1
    diag3_bs$sfd[diag3_bs$pathot=="pt1" & diag3_bs$pathon=="pn1" &
                 diag3_bs$pathom=="m0"] <- 2
    diag3_bs$sfd[diag3_bs$pathot=="pt3" & diag3_bs$pathon=="pn1" &
                 diag3_bs$pathom=="m0"] <- 3
    # ... and so on up to 9

I want to ask if there is a better, more automated way of getting the same result?

The dput() output of the data frame is given below:

    structure(list(
        f_status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
            .Label = "y", class = "factor"),
        event_id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
            .Label = "baseline", class = "factor"),
        pag_name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
            .Label = "br2", class = "factor"),
        ptsize = c(3, 4, 2.7, 2, 0.9, 3, 3, 0.9, 3, 4.5),
        ptsize_u = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
            .Label = "cm", class = "factor"),
        pt_sym = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
            .Label = c("", "-", "<", ">"), class = "factor"),
        pathot = structure(c(4L, 4L, 4L, 3L, 3L, 4L, 4L, 3L, 4L, 4L),
            .Label = c("*pt4", "*ptis", "pt1", "pt2", "pt3"), class = "factor"),
        pathon = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
            .Label = c("*pn2", "pn1"), class = "factor"),
        pathom = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
            .Label = c("*m1", "m0"), class = "factor"),
        rsubjid = 901000:901009,
        rusubjid = structure(1:10, .Label = c(
            "000301-000-901-251", "000301-000-901-252", "000301-000-901-253",
            "000301-000-901-254", "000301-000-901-255", "000301-000-901-256",
            "000301-000-901-257", "000301-000-901-258", "000301-000-901-259",
            "000301-000-901-260", "000301-000-901-261", "000301-000-901-262"),
            class = "factor")),
        .Names = c("f_status", "event_id", "pag_name", "ptsize", "ptsize_u",
            "pt_sym", "pathot", "pathon", "pathom", "rsubjid", "rusubjid"),
        row.names = c(NA, 10L), class = "data.frame")

Thanks.

I tried to edit the data so that it didn't throw an error on input. I created a version of the tabulation of the possible combinations:

    stg_tbl <- structure(list(
        pathot = structure(c(4L, 3L, 5L, 3L, 3L, 2L, 5L, 4L, 1L),
            .Label = c("*pt4", "*ptis", "pt1", "pt2", "pt3"), class = "factor"),
        pathon = structure(c(2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L),
            .Label = c("*pn2", "pn1"), class = "factor"),
        pathom = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L),
            .Label = c("*m1", "m0"), class = "factor")),
        .Names = c("pathot", "pathon", "pathom"), class = "data.frame",
        row.names = c("1", "4", "13", "161", "391", "810", "948", "1043", "1067"))

Make a vector of the text equivalents of the categories:

stg_lbls <- with(stg_tbl, paste(pathot, pathon, pathom, sep="_") ) 
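If typing the lookup table by hand is the tedious part, it may be possible to derive it from the data instead. The following is only a sketch on a small hypothetical data frame `toy` (not the poster's data; `toy`, `toy_tbl` and `toy_lbls` are made-up names). It relies on `unique()` dropping duplicated rows of a data frame while keeping first occurrences, so the labels come out in order of first appearance.

```r
# Hypothetical toy data, for illustration only (not the poster's data set)
toy <- data.frame(
  pathot = c("pt2", "pt1", "pt2", "pt3", "pt1"),
  pathon = c("pn1", "pn1", "pn1", "pn1", "pn1"),
  pathom = c("m0",  "m0",  "m0",  "m0",  "m0"),
  stringsAsFactors = FALSE
)

# unique() on a data frame removes duplicated rows, keeping first occurrences
toy_tbl <- unique(toy[, c("pathot", "pathon", "pathom")])

# One text label per distinct combination, in order of first appearance
toy_lbls <- with(toy_tbl, paste(pathot, pathon, pathom, sep = "_"))
toy_lbls
#[1] "pt2_pn1_m0" "pt1_pn1_m0" "pt3_pn1_m0"
```

Whether order of first appearance is the numbering you want is an assumption. If the codes must follow a specific order, the table still has to be arranged by hand.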

Then the as.numeric() values of a factor created with those labels as its levels give the desired result:

    dat$stg <- with(dat, factor( paste(pathot, pathon, pathom, sep="_"), levels=stg_lbls))
    as.numeric(dat$stg)
    #[1] 1 1 1 2 2 1 1 2 1 1

You can assign the values in the usual way:

dat$sfd <- as.numeric(dat$stg) 
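To see the whole idea in one self-contained piece, here is a minimal sketch on hypothetical data (`toy`, `lbls` and `key` are illustrative names, not from the question). It also shows one thing to watch for: a combination that is not among the factor levels is coded as NA rather than raising an error.

```r
# Hypothetical lookup labels; their position defines the integer code
lbls <- c("pt2_pn1_m0", "pt1_pn1_m0", "pt3_pn1_m0")

# Hypothetical toy data; the last row is a combination missing from lbls
toy <- data.frame(
  pathot = c("pt1", "pt3", "pt2", "pt4"),
  pathon = c("pn1", "pn1", "pn1", "pn1"),
  pathom = c("m0",  "m0",  "m0",  "m0"),
  stringsAsFactors = FALSE
)

# Build the same kind of text key for every row
key <- with(toy, paste(pathot, pathon, pathom, sep = "_"))

# Factor with fixed levels, then take the underlying integer codes
toy$sfd <- as.numeric(factor(key, levels = lbls))
toy$sfd
#[1]  2  3  1 NA
```

Checking `anyNA(toy$sfd)` after the assignment is a cheap way to confirm that the lookup table covers every combination present in the data.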
