Python pandas.cut


Edit: added deft.

Does using pandas.cut change the structure of a pandas.DataFrame?

I am using pandas.cut in the following manner to map single-year ages to age groups, and then aggregating afterwards. However, the aggregation does not work: I end up with NaN in all of the columns being aggregated. Here is my code:

    cutoff = numpy.hstack([numpy.array(deft.minage[0]), deft.maxage.values])
    labels = deft.agegrp

    df['agegrp'] = pandas.cut(df.age,
                              bins           = cutoff,
                              labels         = labels,
                              include_lowest = True)

Here is deft:

    agegrp  maxage  minage
         1      18      14
         2      21      19
         3      24      22
         4      34      25
         5      44      35
         6      54      45
         7      65      55

I then pass the data frame to a function that aggregates it:

    grouped = df.groupby(['year', 'month', 'occid', 'agegrp', 'sex',
                          'race', 'hisp', 'educ'],
                         as_index = False)

    final   = grouped.aggregate(numpy.sum)

If I map the ages to age groups in the following manner instead, it works perfectly:

    df['agegrp'] = 1
    df.loc[(df.age >= 14) & (df.age <= 18), 'agegrp'] = 1  # ages 14 - 18
    df.loc[(df.age >= 19) & (df.age <= 21), 'agegrp'] = 2  # ages 19 - 21
    df.loc[(df.age >= 22) & (df.age <= 24), 'agegrp'] = 3  # ages 22 - 24
    df.loc[(df.age >= 25) & (df.age <= 34), 'agegrp'] = 4  # ages 25 - 34
    df.loc[(df.age >= 35) & (df.age <= 44), 'agegrp'] = 5  # ages 35 - 44
    df.loc[(df.age >= 45) & (df.age <= 54), 'agegrp'] = 6  # ages 45 - 54
    df.loc[(df.age >= 55) & (df.age <= 64), 'agegrp'] = 7  # ages 55 - 64
    df.loc[df.age >= 65, 'agegrp'] = 8                     # ages 65+

I would prefer to do this on the fly, importing the definition table and using pandas.cut, instead of having it hard-coded.

Thank you in advance.

Here is, perhaps, a work-around.

Consider the following example, which replicates the symptom you describe:

    import numpy as np
    import pandas as pd
    np.random.seed(2015)

    deft = pd.DataFrame({'agegrp': [1, 2, 3, 4, 5, 6, 7],
                         'maxage': [18, 21, 24, 34, 44, 54, 65],
                         'minage': [14, 19, 22, 25, 35, 45, 55]})

    cutoff = np.hstack([np.array(deft['minage'][0]), deft['maxage'].values])
    labels = deft['agegrp']

    n = 50
    df = pd.DataFrame(np.random.randint(100, size=(n, 2)), columns=['age', 'year'])
    df['agegrp'] = pd.cut(df['age'], bins=cutoff, labels=labels, include_lowest=True)

    grouped = df.groupby(['year', 'agegrp'], as_index=False)
    final = grouped.agg(np.sum)
    print(final)
    #              year  agegrp  age
    # year agegrp
    # 3    1        NaN     NaN  NaN
    #      2        NaN     NaN  NaN
    # ...
    # 97   1        NaN     NaN  NaN
    #      2        NaN     NaN  NaN
    # [294 rows x 3 columns]
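A likely reason for the NaN rows is that pd.cut returns a categorical column, and grouping on a categorical key can emit one row for every category, whether or not it occurs in the data. The sketch below illustrates this on a small made-up frame (the `year` and `age` values are assumptions for illustration, not the asker's data). It passes `observed=False` explicitly because the default for that keyword may differ between pandas versions, and the keyword itself may not exist in very old releases.

```python
import pandas as pd

# Bin edges and labels mirroring the definition table (sample data is hypothetical).
cutoff = [14, 18, 21, 24, 34, 44, 54, 65]
labels = [1, 2, 3, 4, 5, 6, 7]

df = pd.DataFrame({"year": [2001, 2001, 2001, 2002, 2002],
                   "age":  [15, 17, 30, 50, 80]})
df["agegrp"] = pd.cut(df["age"], bins=cutoff, labels=labels, include_lowest=True)

# pd.cut returns a categorical column, not plain integers.
dtype_name = df["agegrp"].dtype.name

# Age 80 lies outside every bin, so its age group is missing.
n_missing = int(df["agegrp"].isna().sum())

# With observed=False, every (year, category) pair gets a row, seen or not.
sizes = df.groupby(["year", "agegrp"], observed=False).size()

print(dtype_name)
print(n_missing)
print(len(sizes))
```

Here two distinct years times seven categories give fourteen groups, although only three of those pairs actually appear in the data. The empty pairs are the ones that show up as NaN after aggregation.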

If you change

    grouped = df.groupby(['year', 'agegrp'], as_index=False)
    final = grouped.agg(np.sum)

to

    grouped = df.groupby(['year', 'agegrp'], as_index=True)
    final = grouped.agg(np.sum).dropna()
    print(final)

then you obtain:

                 age
    year agegrp
    6    7        61
    16   4        32
    18   1        34
    25   3        23
    28   5        39
    34   7        60
    35   5        42
    38   4        25
    40   2        19
    53   7        59
    56   4        25
         5        35
    66   6        54
    67   7        55
    70   7        56
    73   6        51
    80   5        36
    81   6        46
    85   5        38
    90   7        58
    97   1        18
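If your pandas version supports the `observed` keyword of groupby, another option that may be worth trying is to ask for only the category combinations that actually occur, which lets you keep `as_index=False` without a later `dropna()`. This is a minimal sketch on made-up data (the `year` and `age` values are assumptions for illustration), not a tested fix for the original frame.

```python
import pandas as pd

# Bin edges and labels mirroring the definition table (sample data is hypothetical).
cutoff = [14, 18, 21, 24, 34, 44, 54, 65]
labels = [1, 2, 3, 4, 5, 6, 7]

df = pd.DataFrame({"year": [2001, 2001, 2001, 2002, 2002],
                   "age":  [15, 17, 30, 50, 80]})
df["agegrp"] = pd.cut(df["age"], bins=cutoff, labels=labels, include_lowest=True)

# observed=True keeps only the (year, agegrp) pairs present in the data,
# so no all-NaN rows are produced and the keys stay as ordinary columns.
final = df.groupby(["year", "agegrp"], as_index=False, observed=True).sum()
print(final)
```

The row with age 80 falls outside every bin, so its `agegrp` is missing and it is left out of the grouped result. If those rows matter, extend the last bin edge or handle them before grouping.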
