Python - NumPy random choice with multiple loops


I am trying to perform multiple simulations many times to get the desired simulation distribution. I have a dataset that looks like the one below.

```
fruit_type, reading, prob
apple, 12, .05
apple, 15, .5
orange, 18, .99
```
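As a quick sanity check, the sketch below loads the sample rows above into a DataFrame (the CSV text is copied from the table) and sums `prob` per `fruit_type`. `np.random.choice` raises a `ValueError` unless `p` sums to 1, and here the apple rows sum to 0.55 while the single orange row is 0.99. Rescaling within each fruit with `groupby(...).transform('sum')` is one possible fix; whether renormalizing is right for the real data is an assumption.

```python
import io

import numpy as np
import pandas as pd

# the sample rows from the question, as CSV text
csv_text = """fruit_type,reading,prob
apple,12,.05
apple,15,.5
orange,18,.99
"""
df = pd.read_csv(io.StringIO(csv_text))

# total probability per fruit_type; np.random.choice needs each total to be 1
print(df.groupby('fruit_type')['prob'].sum())

# assumption: rescale so the probabilities within each fruit_type sum to 1
df['prob'] = df['prob'] / df.groupby('fruit_type')['prob'].transform('sum')

# sampling only the apple rows now works
apple = df[df.fruit_type == 'apple']
draws = np.random.choice(apple.reading, size=23, replace=True, p=apple.prob)
print(draws.sum())
```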

An example of my code is below.

```python
import numpy as np
import pandas as pd

def sim(seconds):
    output = pd.DataFrame()
    current = []
    #output = pd.DataFrame()
    for i in range(1, 100000000):
        if data2['fruit_type'].all() == 'apple':
            hostrecord1 = np.random.choice(data2['reading'], size=23, replace=True, p=data2['prob'])
            current = hostrecord1.sum() + 150
        if data2['fruit_type'].all() == 'orange':
            hostrecord2 = np.random.choice(data2['reading'], size=23, replace=True, p=data2['prob'])
            current = hostrecord2.sum() + 150
        if data2['fruit_type'].all() == 'peach':
            hostrecord3 = np.random.choice(data2['reading'], size=20, replace=True, p=data2['prob'])
            current = hostrecord3.sum() + 150
    # put the records in one array
    # return the records
    output = pd.concat(current)
    return output
```

I am trying to figure out how to perform multiple simulations under different conditions by varying fruit_type, but I can't figure out the logic. Each simulation should select only the rows for a specific fruit_type, so that the simulation for a given fruit_type draws only from that part of the data. The size of each sample is different by design, since each fruit_type has different conditions.

My expected output is an array of simulation values. I want to append the results to one pandas DataFrame.

Your explanation is pretty unclear, but here's my guess:

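```python
import numpy as np
import pandas as pd

# initialize toy data
fruits = ['apple', 'peach', 'orange']
# build the frame from a dict so 'reading' and 'prob' stay numeric
# (np.vstack with the fruit strings would turn every column into strings)
df = pd.DataFrame({'fruit_type': np.random.choice(fruits, size=10),
                   'reading': np.random.randint(0, 100, size=10),
                   'prob': np.random.rand(10)})
# np.random.choice needs the probabilities for each fruit to sum to 1
df['prob'] /= df.groupby('fruit_type')['prob'].transform('sum')
```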

The key is indexing `df`, as in `df[df.fruit_type == fruit_of_interest]`. Here is a sample function:

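```python
import numpy as np

def simulate(df, n_trials):
    # replace with the actual sizes for ['apple', 'peach', 'orange'] respectively
    # (23, 20, 23 are the sizes used in the question's code)
    sample_sizes = [23, 20, 23]
    fruits = ['apple', 'peach', 'orange']

    # one row per trial, one column per fruit
    results = np.empty((n_trials, len(fruits)))
    for i in range(n_trials):
        for j, (fruit, size) in enumerate(zip(fruits, sample_sizes)):
            # keep only this fruit's rows; their probs must sum to 1
            sim_data = df[df.fruit_type == fruit]
            record = np.random.choice(sim_data.reading, size=size, p=sim_data.prob)
            # do something with record; here we store its sum
            results[i, j] = record.sum()
    return results
```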

Note that the results array may be too big to fit in memory if you're doing 100 million trials. It may also be faster if you swap the loops so that the fruit/size loop is the outermost one.
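A minimal sketch of the swapped loop order, assuming toy data shaped like the frame above: the fruit loop is outermost, so each fruit's rows are filtered once rather than once per trial, and the results come back as a DataFrame with one column per fruit, as the question asks. The function name `simulate_by_fruit`, the `sample_sizes` dict, and the toy rows are hypothetical; the sizes 23/20/23 come from the question's code. For scale, 100 million trials × 3 fruits × 8 bytes per float64 is about 2.4 GB.

```python
import numpy as np
import pandas as pd

def simulate_by_fruit(df, n_trials, sample_sizes):
    """Fruit loop outermost: filter each fruit's rows once, then run all trials."""
    results = {}
    for fruit, size in sample_sizes.items():
        sim_data = df[df.fruit_type == fruit]  # filtered once per fruit
        readings = sim_data.reading.to_numpy()
        probs = sim_data.prob.to_numpy()
        sums = np.empty(n_trials)
        for i in range(n_trials):
            sums[i] = np.random.choice(readings, size=size, p=probs).sum()
        results[fruit] = sums
    # one row per trial, one column per fruit
    return pd.DataFrame(results)

# hypothetical toy data; probabilities sum to 1 within each fruit
df = pd.DataFrame({
    'fruit_type': ['apple', 'apple', 'peach', 'orange', 'orange'],
    'reading': [12, 15, 7, 18, 20],
    'prob': [0.1, 0.9, 1.0, 0.5, 0.5],
})
out = simulate_by_fruit(df, n_trials=1000,
                        sample_sizes={'apple': 23, 'peach': 20, 'orange': 23})
print(out.head())
```

Putting the fruit loop outside also makes it easy to replace the inner trial loop with a single 2-D `np.random.choice` call per fruit.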


It's worth noting that instead of for-looping, you can generate one huge sample with `np.random.choice` and reshape it:

```python
np.random.choice([0, 1], size=1000000).reshape(10000, 100)
```
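To make the reshape concrete, here is a small sketch: each row of the reshaped array is one trial, so summing along `axis=1` gives one total per trial. `np.random.choice` also accepts a tuple for `size`, which produces the same 2-D shape directly without the `reshape`.

```python
import numpy as np

# 10000 trials of 100 draws each, via one flat sample and a reshape
flat = np.random.choice([0, 1], size=1000000).reshape(10000, 100)

# the same shape, asking choice for a 2-D sample directly
direct = np.random.choice([0, 1], size=(10000, 100))

# one total per trial (row)
trial_sums = flat.sum(axis=1)
print(flat.shape, direct.shape, trial_sums.shape)  # (10000, 100) (10000, 100) (10000,)
```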

That would give you 10000 trials of 100 samples each. This could be useful if 100 million trials is taking too long: you could split it into 100 loops, with each `choice` call doing 1 million trials at once. An example would be:

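```python
import numpy as np

def simulate(df, n_trials, chunk_size=10000):
    # replace with the actual sizes for ['apple', 'peach', 'orange'] respectively
    # (23, 20, 23 are the sizes used in the question's code)
    sample_sizes = [23, 20, 23]
    fruits = ['apple', 'peach', 'orange']

    # note: trials beyond the last full chunk are skipped
    for _ in range(n_trials // chunk_size):
        # one row per trial in this chunk, one column per fruit
        chunk_results = np.empty((chunk_size, len(fruits)))
        for j, (fruit, size) in enumerate(zip(fruits, sample_sizes)):
            sim_data = df[df.fruit_type == fruit]
            # draw chunk_size trials at once: shape (chunk_size, size)
            record = np.random.choice(sim_data.reading, size=(chunk_size, size),
                                      p=sim_data.prob)
            chunk_results[:, j] = record.sum(axis=1)

        # do something with the intermediate chunk; here it is handed to the caller
        yield chunk_results
```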
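Two details the chunked approach leaves open: `n_trials // chunk_size` drops any leftover trials when `n_trials` is not a multiple of `chunk_size`, and the chunks still need collecting into the single DataFrame the question asks for. A minimal sketch of both, assuming chunks are generated one at a time; the helper `chunk_lengths`, the `sample_sizes` dict, and the toy rows are hypothetical. Collecting every chunk this way still keeps all results in memory, so for 100 million trials you may prefer to write each chunk out (or reduce it) instead of concatenating.

```python
import numpy as np
import pandas as pd

def chunk_lengths(n_trials, chunk_size):
    """Yield chunk sizes covering n_trials exactly, including a shorter last chunk."""
    full, rest = divmod(n_trials, chunk_size)
    for _ in range(full):
        yield chunk_size
    if rest:
        yield rest

# hypothetical toy data; probabilities sum to 1 within each fruit
df = pd.DataFrame({
    'fruit_type': ['apple', 'apple', 'orange'],
    'reading': [12, 15, 18],
    'prob': [0.1, 0.9, 1.0],
})
sample_sizes = {'apple': 23, 'orange': 23}

frames = []
for n in chunk_lengths(25000, chunk_size=10000):
    chunk = {}
    for fruit, size in sample_sizes.items():
        sim_data = df[df.fruit_type == fruit]
        # n trials at once: one row per trial, `size` draws per row
        record = np.random.choice(sim_data.reading, size=(n, size), p=sim_data.prob)
        chunk[fruit] = record.sum(axis=1)
    frames.append(pd.DataFrame(chunk))

# stack the chunks into one DataFrame with a fresh 0..n_trials-1 index
results = pd.concat(frames, ignore_index=True)
print(results.shape)  # (25000, 2)
```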
