Weighted average with condition

Question

I have an API call that returns an array with 'participant_timestamp', 'price', and 'size'. I am looking for help in calculating a weighted average based on subsets of different time periods within this array. I've been trying to utilize ma.masked_outside with two 'participant_timestamp' values as the beginning and end of the time period subset, but I'm getting nowhere. Alternatively, I could use multiple API calls and adjust the time frame parameters for each call, but I'm hoping there's a more efficient way to do the former.

##Get Weighted Average Across Full Time Period##
api_data = ##API call specifics##
df = pd.DataFrame(api_data)
weighted_average_all_data = np.average(df['price'], weights=df['size'])
print(weighted_average_all_data)

##Attempt at ma.masked_outside##
new_array = ma.masked_outside(df['participant_timestamp'],15847272000000000000,15847266000000000000)
new_wtd_avg = np.average(new_array['price'], weights=new_array['size'])
print(new_wtd_avg)

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

ma.masked_outside returns a masked_array object, but you are trying to index it on the next line using strings. That's likely what is throwing the IndexError. You can read more about the masked_array object here: https://docs.scipy.org/doc/numpy-1.15.0/reference/maskedarray.baseclass.html#numpy.ma.MaskedArray . Is there a reason you are not just doing the query with pandas, like `df.loc[(df.participant_timestamp < upper_bound) & (df.participant_timestamp > lower_bound)]`? - Adam Johnston, Mar 25 '20 at 05:04
@AdamJohnston Thanks, your suggestion is what I needed! As to why I wasn't using df.loc[...]: I am new to Python and didn't know of it. My Googling had led me down a path where ma.masked_outside seemed like the right approach, but your suggested solution is much better. - Jim, Mar 25 '20 at 14:59
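
For anyone landing here later, the approach suggested in the comments can be sketched roughly as follows. This is a minimal sketch, not the asker's final code: the sample DataFrame is made up to stand in for the API response, the helper name weighted_average_between is hypothetical, and it uses Series.between (inclusive on both ends by default) where the comment used strict < and > comparisons.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the API response; the values are made up.
df = pd.DataFrame({
    "participant_timestamp": [100, 200, 300, 400, 500],
    "price": [10.0, 11.0, 12.0, 13.0, 14.0],
    "size": [1, 2, 3, 4, 5],
})


def weighted_average_between(frame, start, end):
    """Size-weighted average price for rows with start <= timestamp <= end."""
    # Boolean mask over the rows; True where the timestamp is inside the window.
    in_window = frame["participant_timestamp"].between(start, end)
    # Selecting with the mask keeps 'price' and 'size' aligned with the timestamps.
    subset = frame.loc[in_window]
    if subset.empty:
        # No rows in the window, so there is nothing to average.
        return float("nan")
    return np.average(subset["price"], weights=subset["size"])


# Weighted average over the full period, as in the question.
full_avg = np.average(df["price"], weights=df["size"])

# Weighted average over one sub-period of the same data.
window_avg = weighted_average_between(df, 200, 400)

print(full_avg, window_avg)
```

Because the filter is applied to the whole DataFrame, the selected rows still carry their 'price' and 'size' columns, which the masked array of timestamps alone did not. Several time periods can be handled by calling the helper once per (start, end) pair on the same df, so a single API call is enough.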
