Identify entries / records that appear in table() in R


I'm trying to identify 'unique' and 'near unique' cases or records in a dataset for a disclosure control project, in particular combinations of variables that appear only once, twice, etc.

The records appear in:

table(age,sex,ethnicity) 

I am interested in the elements (those which are TRUE) of:

table(age,sex,ethnicity)==1
table(age,sex,ethnicity)==2

I know there are 150 such cases, from:

sum(table(age,sex,ethnicity)==1) 

There is an identifier in the dataset, so that would be a nice output, or a number in 1:length(age)*length(sex)*length(ethnicity) would be good. I was hoping to return a list like:

[1] 103 207 218 ....
[41] * * *
[81] * * *

where 'identifier' = 103, 207, 218 are the first 3 of the 150 cases where:

table(age,sex,ethnicity)==1 

I was naively hoping that something like:

data$identifier[table(age,sex,mar,emp,edu) == 1]
names(table(age,sex,ethnicity))

would work, but no such luck. I've looked at unique(), but it returns every combination (that occurs once or more). Any help or input is appreciated.

Added a reproducible example (hopefully):

set.seed(1234)
a <- 1+rpois(100,1)
b <- 1+rpois(100,1)
c <- 1+rpois(100,1)
a[a >= 5] <- 4
b[b >= 5] <- 4
c[c >= 5] <- 4
eg <- cbind(1:100,a,b,c)
(sum(table(a,b,c)==1))

This should have 12 'unique' combinations, which I would like to identify using the first column of eg (or the identifier in my dataset).

I think the easiest way is using the data.table package:

library(data.table)
eg.dt <- as.data.table(eg)
eg.dt[, list(n=.N), by=.(a,b,c)][n==1]

How it works: eg.dt[, list(n=.N), by=.(a,b,c)] counts the number of occurrences of each (a,b,c) combination. [n==1] then keeps only the combinations that occur precisely once.
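Since the question asks for the identifiers rather than the combinations, here is a minimal sketch of one way to get them with data.table: attach the group count to every row with := and then filter. The toy data and the column name id are assumptions made for illustration; they are not from the original dataset.

```r
library(data.table)

# Toy data (hypothetical); `id` stands in for the dataset's identifier column.
toy <- data.table(
  id = 1:8,
  a  = c(1, 1, 2, 2, 3, 3, 3, 4),
  b  = c(1, 1, 2, 2, 1, 1, 2, 4),
  c  = c(1, 1, 1, 2, 1, 1, 1, 4)
)

# Add a per-row count of how often that row's (a, b, c) combination occurs.
toy[, n := .N, by = .(a, b, c)]

unique_ids <- toy[n == 1, id]  # identifiers of combinations seen exactly once
twice_ids  <- toy[n == 2, id]  # identifiers of combinations seen exactly twice
```

Because := adds the count as a column without collapsing the rows, the same table can be reused for the 'near unique' cases (n == 2, n <= 3 and so on).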

Or, if you want to stick with data frames (not data.table), try plyr:

library(plyr)
eg <- data.frame(eg)
subset(ddply(eg, .(a, b, c), nrow), V1 == 1)

This works in the same way: ddply(eg, .(a, b, c), nrow) makes a data frame with a column "V1" holding the number of times each combination occurs; subset then keeps the combinations that occur only once.

I think there might be a way with table(a,b,c), but I can't think of one that isn't convoluted.
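For what it's worth, here is a hedged base R sketch of what a table()-based route could look like. It looks up each row's cell count by indexing the table with a character matrix of dimnames, and cross-checks the result against ave(). The toy data and the name id are assumptions for illustration only.

```r
# Toy data (hypothetical); `id` stands in for the dataset's identifier column.
id <- 1:8
a  <- c(1, 1, 2, 2, 3, 3, 3, 4)
b  <- c(1, 1, 2, 2, 1, 1, 2, 4)
c  <- c(1, 1, 1, 2, 1, 1, 1, 4)

tab <- table(a, b, c)

# One row per record, one column per table dimension; entries are matched
# against the table's dimnames, so each row picks out that record's cell.
cell  <- cbind(as.character(a), as.character(b), as.character(c))
n_tab <- as.vector(tab[cell])

# The same per-record counts without building the table: group sizes via ave().
n_ave <- ave(seq_along(a), a, b, c, FUN = length)

ids_once  <- id[n_tab == 1]  # records whose combination occurs exactly once
ids_twice <- id[n_tab == 2]  # records whose combination occurs exactly twice
```

Here length(ids_once) agrees with sum(tab == 1), which is the check used in the question. The ave() line alone is probably enough in practice; the table lookup is shown only because the question started from table().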

