I have a dataset with 10 columns. Out of those 10, 3 are of interest for creating a new indicator feature. The features "pt", "pn" and "m" take different values. Of the values these 3 features take, there are a total of 9 unique combinations that need to be captured in a new variable.
     pathot pathon pathom
1       pt2    pn1     m0
4       pt1    pn1     m0
13      pt3    pn1     m0
161     pt1   *pn2     m0
391     pt1    pn1    *m1
810   *ptis    pn1     m0
948     pt3   *pn2     m0
1043    pt2    pn1    *m1
1067   *pt4    pn1     m0

For example, the new variable should have the value "1" when pathot=pt2, pathon=pn1 and pathom=m0, and so on up to the value 9. I have completed the task, but it took 20 lines of code, with one vectorised assignment for each unique combination.
diag3_bs$sfd[diag3_bs$pathot=="pt2" & diag3_bs$pathon=="pn1" & diag3_bs$pathom=="m0"] <- 1
diag3_bs$sfd[diag3_bs$pathot=="pt1" & diag3_bs$pathon=="pn1" & diag3_bs$pathom=="m0"] <- 2
diag3_bs$sfd[diag3_bs$pathot=="pt3" & diag3_bs$pathon=="pn1" & diag3_bs$pathom=="m0"] <- 3

... and so on up to 9. I want to ask whether there is a better, more automated way of getting the same result.
The dput() output of the data frame is given below:
structure(list(
  f_status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "y", class = "factor"),
  event_id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "baseline", class = "factor"),
  pag_name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "br2", class = "factor"),
  ptsize = c(3, 4, 2.7, 2, 0.9, 3, 3, 0.9, 3, 4.5),
  ptsize_u = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "cm", class = "factor"),
  pt_sym = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "-", "<", ">"), class = "factor"),
  pathot = structure(c(4L, 4L, 4L, 3L, 3L, 4L, 4L, 3L, 4L, 4L), .Label = c("*pt4", "*ptis", "pt1", "pt2", "pt3"), class = "factor"),
  pathon = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("*pn2", "pn1"), class = "factor"),
  pathom = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("*m1", "m0"), class = "factor"),
  rsubjid = 901000:901009,
  rusubjid = structure(1:10, .Label = c(
    "000301-000-901-251", "000301-000-901-252", "000301-000-901-253",
    "000301-000-901-254", "000301-000-901-255", "000301-000-901-256",
    "000301-000-901-257", "000301-000-901-258", "000301-000-901-259",
    "000301-000-901-260", "000301-000-901-261", "000301-000-901-262"),
    class = "factor")),
  .Names = c("f_status", "event_id", "pag_name", "ptsize", "ptsize_u", "pt_sym",
             "pathot", "pathon", "pathom", "rsubjid", "rusubjid"),
  row.names = c(NA, 10L), class = "data.frame")

Thanks.
I tried to edit the data so that it didn't throw an error on input. I created a version of the tabulation of the possible combinations:
stg_tbl <- structure(list(
  pathot = structure(c(4L, 3L, 5L, 3L, 3L, 2L, 5L, 4L, 1L), .Label = c("*pt4", "*ptis", "pt1", "pt2", "pt3"), class = "factor"),
  pathon = structure(c(2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), .Label = c("*pn2", "pn1"), class = "factor"),
  pathom = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L), .Label = c("*m1", "m0"), class = "factor")),
  .Names = c("pathot", "pathon", "pathom"), class = "data.frame",
  row.names = c("1", "4", "13", "161", "391", "810", "948", "1043", "1067"))

Make a vector of the text equivalents of the categories:
stg_lbls <- with(stg_tbl, paste(pathot, pathon, pathom, sep="_"))

Then the as.numeric values of a factor created using those levels give the desired result:
dat$stg <- with(dat, factor(paste(pathot, pathon, pathom, sep="_"), levels=stg_lbls))
as.numeric(dat$stg)
#[1] 1 1 1 2 2 1 1 2 1 1

You can assign the values in the usual way:
dat$sfd <- as.numeric(dat$stg)
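
The same idea can be sketched end to end without typing the combinations by hand. This is only an illustrative variant, not the approach used above: it assumes the codes should follow the order in which each combination first appears, and it swaps the factor step for base R's unique() and match(). The names toy, cols, combos and key_of are made up for the example.

```r
# Hypothetical toy data standing in for the three columns of interest
toy <- data.frame(
  pathot = c("pt2", "pt1", "pt1", "pt3", "pt2"),
  pathon = c("pn1", "pn1", "pn1", "pn1", "pn1"),
  pathom = c("m0",  "m0",  "m0",  "m0",  "m0"),
  stringsAsFactors = FALSE
)

cols <- c("pathot", "pathon", "pathom")

# One row per distinct combination, kept in order of first appearance
combos <- unique(toy[, cols])

# Helper (name is an assumption): one text key per row, e.g. "pt2_pn1_m0"
key_of <- function(d) paste(d$pathot, d$pathon, d$pathom, sep = "_")

# The position of each row's key among the distinct keys is its code
toy$sfd <- match(key_of(toy), key_of(combos))

toy$sfd
#[1] 1 2 2 3 1
```

If the numbering has to follow a fixed order instead, a hand-built table such as stg_tbl can take the place of combos; any combination missing from that table would then get NA.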