Edit: added deft.
Does using pandas.cut change the structure of a pandas.DataFrame?

I am using pandas.cut in the following manner to map single years of age to age groups, and I aggregate afterwards. However, the aggregation does not work: I end up with NaN in the columns being aggregated. Here is my code:

    cutoff = numpy.hstack([numpy.array(deft.minage[0]), deft.maxage.values])
    labels = deft.agegrp
    df['agegrp'] = pandas.cut(df.age, bins=cutoff, labels=labels, include_lowest=True)

Here is deft:

    agegrp  maxage  minage
         1      18      14
         2      21      19
         3      24      22
         4      34      25
         5      44      35
         6      54      45
         7      65      55

Then I pass the data frame to a function that aggregates:

    grouped = df.groupby(['year', 'month', 'occid', 'agegrp', 'sex',
                          'race', 'hisp', 'educ'], as_index=False)
    final = grouped.aggregate(numpy.sum)

If I instead change ages to age groups in the following manner, it works perfectly:

    df['agegrp'] = 1
    df.loc[(df.age >= 14) & (df.age <= 18), 'agegrp'] = 1  # ages 14-18
    df.loc[(df.age >= 19) & (df.age <= 21), 'agegrp'] = 2  # ages 19-21
    df.loc[(df.age >= 22) & (df.age <= 24), 'agegrp'] = 3  # ages 22-24
    df.loc[(df.age >= 25) & (df.age <= 34), 'agegrp'] = 4  # ages 25-34
    df.loc[(df.age >= 35) & (df.age <= 44), 'agegrp'] = 5  # ages 35-44
    df.loc[(df.age >= 45) & (df.age <= 54), 'agegrp'] = 6  # ages 45-54
    df.loc[(df.age >= 55) & (df.age <= 64), 'agegrp'] = 7  # ages 55-64
    df.loc[df.age >= 65, 'agegrp'] = 8                     # ages 65+

I would prefer to do this on the fly, importing the definition table and using pandas.cut, instead of hard-coding it.
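A minimal sketch of the hard-coded mapping driven by the definition table instead, without pandas.cut: it loops over the rows of deft and assigns each range with .loc and Series.between. The small df below is hypothetical sample data. Unlike the hard-coded version, ages outside every range are left as NaN rather than defaulted to 1 or 8. Because the result is an ordinary float column rather than a categorical one, groupby only forms groups that actually occur.

```python
import numpy as np
import pandas as pd

# Definition table, as posted in the question.
deft = pd.DataFrame({'agegrp': [1, 2, 3, 4, 5, 6, 7],
                     'maxage': [18, 21, 24, 34, 44, 54, 65],
                     'minage': [14, 19, 22, 25, 35, 45, 55]})

# Hypothetical sample data standing in for the real df.
df = pd.DataFrame({'age': [14, 18, 19, 30, 65, 70],
                   'year': [2000, 2000, 2001, 2001, 2002, 2002]})

# Start with NaN so ages outside every range stay unassigned
# (the hard-coded version defaults them to 1 or 8 instead).
df['agegrp'] = np.nan
for row in deft.itertuples(index=False):
    # Series.between is inclusive on both ends by default, like >= and <=.
    in_group = df['age'].between(row.minage, row.maxage)
    df.loc[in_group, 'agegrp'] = row.agegrp

print(df['agegrp'].dtype)  # float64, not category

# Rows whose agegrp is NaN are dropped from the grouping keys.
print(df.groupby(['year', 'agegrp'], as_index=False).sum())
```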
Thanks in advance.
Here is, perhaps, a workaround.
Consider the following example, which replicates the symptom you describe:

    import numpy as np
    import pandas as pd
    np.random.seed(2015)

    deft = pd.DataFrame({'agegrp': [1, 2, 3, 4, 5, 6, 7],
                         'maxage': [18, 21, 24, 34, 44, 54, 65],
                         'minage': [14, 19, 22, 25, 35, 45, 55]})
    cutoff = np.hstack([np.array(deft['minage'][0]), deft['maxage'].values])
    labels = deft['agegrp']

    n = 50
    df = pd.DataFrame(np.random.randint(100, size=(n, 2)), columns=['age', 'year'])
    df['agegrp'] = pd.cut(df['age'], bins=cutoff, labels=labels, include_lowest=True)

    grouped = df.groupby(['year', 'agegrp'], as_index=False)
    final = grouped.agg(np.sum)
    print(final)
    #              year  agegrp  age
    # year agegrp
    # 3    1        NaN     NaN  NaN
    #      2        NaN     NaN  NaN
    # ...
    # 97   1        NaN     NaN  NaN
    #      2        NaN     NaN  NaN
    #
    # [294 rows x 3 columns]

If you change

    grouped = df.groupby(['year', 'agegrp'], as_index=False)
    final = grouped.agg(np.sum)

to

    grouped = df.groupby(['year', 'agegrp'], as_index=True)
    final = grouped.agg(np.sum).dropna()
    print(final)

then you obtain:

                 age
    year agegrp
    6    7        61
    16   4        32
    18   1        34
    25   3        23
    28   5        39
    34   7        60
    35   5        42
    38   4        25
    40   2        19
    53   7        59
    56   4        25
         5        35
    66   6        54
    67   7        55
    70   7        56
    73   6        51
    80   5        36
    81   6        46
    85   5        38
    90   7        58
    97   1        18
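pandas.cut does change the column's type: it returns a Categorical (dtype category) whose categories are the labels. When a categorical column is used as a groupby key with observed=False (the behaviour of the pandas versions of that era), pandas creates a group for every category paired with every year, including combinations with no rows. That is where the 294 rows (42 years x 7 age groups) and the NaN values come from; recent pandas versions generally report 0 for the sum of such empty groups instead of NaN. A minimal sketch, assuming pandas 0.23 or later, where DataFrame.groupby accepts the observed parameter: passing observed=True keeps only the combinations that actually occur, so the dropna() step is no longer needed.

```python
import numpy as np
import pandas as pd

np.random.seed(2015)

deft = pd.DataFrame({'agegrp': [1, 2, 3, 4, 5, 6, 7],
                     'maxage': [18, 21, 24, 34, 44, 54, 65],
                     'minage': [14, 19, 22, 25, 35, 45, 55]})
cutoff = np.hstack([np.array(deft['minage'][0]), deft['maxage'].values])

n = 50
df = pd.DataFrame(np.random.randint(100, size=(n, 2)), columns=['age', 'year'])
df['agegrp'] = pd.cut(df['age'], bins=cutoff, labels=deft['agegrp'],
                      include_lowest=True)

# pd.cut returns a Categorical column, not plain integers.
print(df['agegrp'].dtype)  # category

# observed=True keeps only year/agegrp combinations present in the data,
# instead of every year paired with every age-group category.
final = df.groupby(['year', 'agegrp'], observed=True).sum()
print(final)
```

Alternatively, converting the categorical column back to a numeric dtype before grouping avoids the issue as well, at the cost of losing the category metadata.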