I am trying to perform multiple simulations many times to get a desired simulation distribution. I have a dataset that looks like the one below.
    fruit_type, reading, prob
    apple, 12, .05
    apple, 15, .5
    orange, 18, .99

An example of my code is below.
    def sim(seconds):
        output = pd.DataFrame()
        current = []
        # output = pd.DataFrame()
        for i in range(1, 100000000):
            if data2['fruit_type'].all() == 'apple':
                hostrecord1 = np.random.choice(data2['reading'], size=23, replace=True, p=data2['prob'])
                current = hostrecord1.sum() + 150
            if data2['fruit_type'].all() == 'orange':
                hostrecord2 = np.random.choice(data2['reading'], size=23, replace=True, p=data2['prob'])
                current = hostrecord2.sum() + 150
            if data2['fruit_type'].all() == 'peach':
                hostrecord3 = np.random.choice(data2['reading'], size=20, replace=True, p=data2['prob'])
                current = hostrecord3.sum() + 150
        # put the records in one array
        # return the records
        output = pd.concat(current)
        return output

I am trying to figure out how to perform multiple simulations with different conditions that vary by fruit_type, but I can't figure out the logic. Each simulation should select only the rows that match its fruit_type, so each simulation is specified by the fruit_type it belongs to. The size of each sample is different by design, because each fruit_type has different conditions.
My expected output is an array of simulation values. I also want to append the results into one pandas DataFrame.
Your explanation is pretty unclear, but here's my guess:
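    # initialize data
    In [1]: import numpy as np

    In [2]: import pandas as pd

    In [3]: fruits = ['apple', 'peach', 'orange']

    # build from a dict so 'reading' and 'prob' stay numeric
    # (stacking them with the fruit names would turn every column into strings)
    In [4]: df = pd.DataFrame({'fruit_type': np.random.choice(fruits, size=10),
       ...:                    'reading': np.random.randint(0, 100, size=10),
       ...:                    'prob': np.random.rand(10)})

    # np.random.choice needs p to sum to 1, so normalize within each fruit
    In [5]: df['prob'] /= df.groupby('fruit_type')['prob'].transform('sum')

The key is indexing df with a boolean mask, such as df[df.fruit_type == fruit_of_interest]. Here is a sample function: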
    def simulate(df, n_trials):
        # replace with the actual sample sizes for ['apple', 'peach', 'orange'] respectively
        sample_sizes = [n1, n2, n3]
        fruits = ['apple', 'peach', 'orange']
        results = np.empty((n_trials, len(fruits)))
        for i in range(n_trials):  # use xrange on Python 2
            for j, (fruit, size) in enumerate(zip(fruits, sample_sizes)):
                # keep only the rows for this fruit; their prob values must sum to 1
                sim_data = df[df.fruit_type == fruit]
                record = np.random.choice(sim_data.reading, size=size, p=sim_data.prob)
                # record the result
                results[i, j] = record.sum()
        return results

Note that the results array may be too big to fit in memory if you're doing 100 million trials (100 million rows of 3 float64 values is about 2.4 GB). It may also be faster if you swap the loops so that fruit/size is the outermost loop, since each fruit's rows then only need to be selected once.
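The loop swap suggested above could look roughly like the sketch below. It is only an illustration under some assumptions: the small `df` is made-up data whose probabilities sum to 1 within each fruit, the sample sizes (23, 23, 20) and the `+ 150` offset are taken from the question's code, and `simulate_fruit_outer` is a hypothetical name. It also returns the results as a pandas DataFrame, which is what the question asks for.

```python
import numpy as np
import pandas as pd

# Made-up example data; prob sums to 1 within each fruit_type.
df = pd.DataFrame({
    "fruit_type": ["apple", "apple", "orange", "orange", "peach", "peach"],
    "reading": [12, 15, 18, 21, 9, 30],
    "prob": [0.25, 0.75, 0.5, 0.5, 0.1, 0.9],
})

# Sample sizes per fruit, as used in the question's code.
sample_sizes = {"apple": 23, "orange": 23, "peach": 20}


def simulate_fruit_outer(df, n_trials, sample_sizes, offset=0, seed=None):
    """Run n_trials per fruit, with the fruit loop outermost."""
    rng = np.random.default_rng(seed)
    results = np.empty((n_trials, len(sample_sizes)))
    for j, (fruit, size) in enumerate(sample_sizes.items()):
        # Select this fruit's rows once, not once per trial.
        rows = df[df["fruit_type"] == fruit]
        readings = rows["reading"].to_numpy()
        probs = rows["prob"].to_numpy()
        for i in range(n_trials):
            # Draw `size` readings with replacement and store their sum.
            results[i, j] = rng.choice(readings, size=size, p=probs).sum() + offset
    # One column per fruit, one row per trial.
    return pd.DataFrame(results, columns=list(sample_sizes))


results = simulate_fruit_outer(df, n_trials=200, sample_sizes=sample_sizes,
                               offset=150, seed=0)
print(results.describe())
```

Passing a `seed` makes a run repeatable, which helps when comparing this against the trial-outermost version.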
It's worth noting that instead of for-looping, you can generate one huge sample with np.random.choice and then reshape it:
    np.random.choice([0, 1], size=1000000).reshape(10000, 100)

would give you 10000 trials with 100 samples each. This can be useful if the 100 million trials are taking too long: you can split the work into 100 loops, with choice doing 1 million trials at once in each. An example would be
    def simulate(df, n_trials, chunk_size=10000):
        # replace with the actual sample sizes for ['apple', 'peach', 'orange'] respectively
        sample_sizes = [n1, n2, n3]
        fruits = ['apple', 'peach', 'orange']
        # integer division; any remainder of n_trials / chunk_size is not simulated
        for i in range(n_trials // chunk_size):  # use xrange on Python 2
            chunk_results = np.empty((chunk_size, len(fruits)))
            for j, (fruit, size) in enumerate(zip(fruits, sample_sizes)):
                sim_data = df[df.fruit_type == fruit]
                # one row per trial, one column per draw
                record = np.random.choice(sim_data.reading, size=(chunk_size, size), p=sim_data.prob)
                chunk_results[:, j] = record.sum(axis=1)
            # do something with the intermediate chunk here
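One way to "do something with the intermediate chunk" without holding every trial in memory is to yield each chunk and fold it into a running summary. The sketch below is one possible shape for that, not the only one: `simulate_chunks` is a hypothetical name, the data is made up, the sample sizes are borrowed from the question, and the running mean is just an example of a per-chunk reduction. Unlike the function above, it also simulates the leftover trials when `n_trials` is not a multiple of `chunk_size`.

```python
import numpy as np
import pandas as pd


def simulate_chunks(df, n_trials, sample_sizes, chunk_size=10_000, seed=None):
    """Yield DataFrames of per-trial sums, at most chunk_size rows each."""
    rng = np.random.default_rng(seed)
    # Select each fruit's readings and probabilities once, up front.
    per_fruit = {}
    for fruit, size in sample_sizes.items():
        rows = df[df["fruit_type"] == fruit]
        per_fruit[fruit] = (rows["reading"].to_numpy(), rows["prob"].to_numpy(), size)
    done = 0
    while done < n_trials:
        n = min(chunk_size, n_trials - done)  # the last chunk may be smaller
        chunk = {}
        for fruit, (readings, probs, size) in per_fruit.items():
            # Shape (n, size): one row per trial, one column per draw.
            draws = rng.choice(readings, size=(n, size), p=probs)
            chunk[fruit] = draws.sum(axis=1)
        yield pd.DataFrame(chunk)
        done += n


# Made-up example data; prob sums to 1 within each fruit_type.
df = pd.DataFrame({
    "fruit_type": ["apple", "apple", "orange", "orange", "peach", "peach"],
    "reading": [12, 15, 18, 21, 9, 30],
    "prob": [0.25, 0.75, 0.5, 0.5, 0.1, 0.9],
})
sizes = {"apple": 23, "orange": 23, "peach": 20}

# Accumulate a running mean so only one chunk is in memory at a time.
total = pd.Series(0.0, index=list(sizes))
count = 0
chunk_lengths = []
for chunk in simulate_chunks(df, n_trials=25_000, sample_sizes=sizes, seed=0):
    total += chunk.sum()
    count += len(chunk)
    chunk_lengths.append(len(chunk))
running_mean = total / count
print(running_mean)
```

If the full set of trials does fit in memory, `pd.concat(list(simulate_chunks(...)), ignore_index=True)` would collect the chunks into a single DataFrame instead.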