i have interesting data not uniform. group of items listed under category name, in same column. need add column row corresponding item's category belongs (then remove category heading). way distinguish new category determining whether value under year empty.... dputs should explain issue more clearly.
before:
structure(list(x = structure(c(13l, 1l, 19l, 16l, 5l, 17l, 11l, 8l, 2l, 10l, 4l, 6l, 18l, 15l, 21l, 12l, 14l, 9l, 3l, 20l, 7l ), .label = c("-burgers", "-cameras", "-shirts", "+laptops", "+salads", "+tvs", "caps", "cell", "clothes:", "desktops", "electronics", "flowers", "food", "garden nomes", "grills", "hotdogs", "nachoes", "outdoors:", "pizza", "shorts", "swimming gear"), class = "factor"), x2000 = c(na, 104l, 159l, 184l, 189l, 182l, na, 49l, 28l, 46l, 34l, 43l, na, 129l, 190l, 189l, 119l, na, 45l, 80l, 80l), x2001 = c(na, 147l, 192l, 164l, 174l, 196l, na, 40l, 34l, 43l, 35l, 22l, na, 114l, 130l, 120l, 145l, na, 56l, 35l, 54l), x2002 = c(na, 163l, 172l, 138l, 146l, 190l, na, 38l, 40l, 21l, 22l, 33l, na, 186l, 172l, 139l, 119l, na, 88l, 78l, 91l), x2003 = c(na, 125l, 152l, 182l, 148l, 125l, na, 36l, 44l, 34l, 27l, 50l, na, 119l, 115l, 188l, 166l, na, 91l, 77l, 77l), x2004 = c(na, 116l, 111l, 120l, 153l, 199l, na, 49l, 48l, 43l, 37l, 32l, na, 159l, 116l, 143l, 153l, na, 18l, 53l, 51l)), .names = c("x", "x2000", "x2001", "x2002", "x2003", "x2004"), class = "data.frame", row.names = c(na, -21l)) after:
structure(list(x = structure(c(1l, 15l, 13l, 5l, 14l, 8l, 2l, 9l, 4l, 6l, 12l, 17l, 10l, 11l, 3l, 16l, 7l), .label = c("-burgers", "-cameras", "-shirts", "+laptops", "+salads", "+tvs", "caps", "cell", "desktops", "flowers", "garden nomes", "grills", "hotdogs", "nachoes", "pizza", "shorts", "swimming gear"), class = "factor"), x.1 = structure(c(3l, 3l, 3l, 3l, 3l, 2l, 2l, 2l, 2l, 2l, 4l, 4l, 4l, 4l, 1l, 1l, 1l), .label = c("clothes:", "electronics", "food", "outdoors:"), class = "factor"), x2000 = c(104l, 159l, 184l, 189l, 182l, 49l, 28l, 46l, 34l, 43l, 129l, 190l, 189l, 119l, 45l, 80l, 80l), x2001 = c(147l, 192l, 164l, 174l, 196l, 40l, 34l, 43l, 35l, 22l, 114l, 130l, 120l, 145l, 56l, 35l, 54l), x2002 = c(163l, 172l, 138l, 146l, 190l, 38l, 40l, 21l, 22l, 33l, 186l, 172l, 139l, 119l, 88l, 78l, 91l), x2003 = c(125l, 152l, 182l, 148l, 125l, 36l, 44l, 34l, 27l, 50l, 119l, 115l, 188l, 166l, 91l, 77l, 77l), x2004 = c(116l, 111l, 120l, 153l, 199l, 49l, 48l, 43l, 37l, 32l, 159l, 116l, 143l, 153l, 18l, 53l, 51l)), .names = c("x", "x.1", "x2000", "x2001", "x2002", "x2003", "x2004"), class = "data.frame", row.names = c(na, -17l )) the items arbitrarily have + or - signs...i need remain same. also, category headers have : while others not.
we create index based on 'na' values in columns other 1st ('indx'). split dataset using 'indx', remove first row i.e. na values columns 2nd last, cbind 1st row, 1st column value, rearrange columns , rbind.
indx <- cumsum(!rowsums(!is.na(df1[-1]))) res <- do.call(rbind,lapply(split(df1, indx), function(x) cbind(x, x.1= x[1,1])[-1,c(1,7,2:6)])) row.names(res) <- null all.equal(res, out, check.attributes=false) #[1] true where 'out' dput output of expected result
update
if columns have '' instead of na,
indx <- cumsum(!rowsums(df1[-1]!='')) and rest above. having said that, when have '' in numeric column, class either factor or character based on whether specify stringsasfactors=false or =true in read.table/read.csv. so, keeping '' such output factor/character class. convert columns correct class first coerce '' na, i.e.
df1[-1] <- lapply(df1[-1], function(x) as.numeric(as.character(x))) the as.character needed if columns factor class.
once, have done conversion, first approach should work fine well.
Comments
Post a Comment