I understand that I can generate bins from arrays in numpy using numpy.histogram or numpy.digitize, and I have done so in the past. However, the analysis I need to do requires an equal number of samples in each bin, and the data is not uniformly distributed.
Say I have the data in an array, a = numpy.random.random(1000). How can I bin the data (either by creating an index or by finding the values that define the extents of each bin) in such a way that there is an equal number of samples in each bin?
I know this can be treated as an optimization problem, and I can solve it as such, i.e.:
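
    import numpy as np
    from scipy.optimize import fmin

    def generate_even_bins(a, n):
        # Start from n equally spaced edges spanning the data
        x0 = np.linspace(a.min(), a.max(), n)

        def bin_counts(x, a):
            # Penalize edge vectors that are not strictly increasing
            if np.any(np.diff(x) <= 0):
                return 1e6
            else:
                binned = np.digitize(a, x)
                # Total difference in counts between neighbouring bins
                return np.abs(np.diff(np.bincount(binned))).sum()

        return fmin(bin_counts, x0, args=(a,))

... Is there something already available, perhaps in numpy or scipy.stats, that implements this? If not, shouldn't there be?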
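
For comparison, here is a minimal sketch of one way to get (nearly) equal counts without an optimizer, assuming that bin edges placed at evenly spaced quantiles of the data are acceptable. The helper name equal_count_edges and the normally distributed sample are assumptions made for illustration and are not part of the code above. It uses only np.quantile, np.digitize and np.bincount.

```python
import numpy as np

def equal_count_edges(a, n_bins):
    """Hypothetical helper: bin edges at evenly spaced quantiles of a."""
    a = np.asarray(a)
    # n_bins + 1 probabilities from 0 to 1 give n_bins + 1 edges
    probs = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(a, probs)

# Illustrative non-uniform data (an assumption, not the asker's data)
rng = np.random.default_rng(0)
a = rng.normal(size=1000)

n_bins = 10
edges = equal_count_edges(a, n_bins)

# Digitize against the interior edges only, so the minimum and maximum
# samples also receive an index in the range 0 .. n_bins - 1
idx = np.digitize(a, edges[1:-1])
counts = np.bincount(idx, minlength=n_bins)
print(edges)
print(counts)
```

If the data contains many repeated values, several quantiles can coincide and the counts will no longer be equal, so this is a sketch under the assumption of mostly distinct samples.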