How to get a weighted average of a list whose weights are limited by a variable in Python 3.6

Question

I hope the title makes sense. What I'm trying to achieve is getting a weighted average price of shoes that are available at different prices in different amounts. So I have, for example:

list_prices = [12,12.7,13.5,14.3]
list_amounts = [85,100,30,54]
BuyAmount = x

I want to know my weighted average price and the highest price I paid per shoe if I buy x shoes (assuming I buy the cheapest first).

This is what I have now (I use numpy):

    if list_amounts[0] >= BuyAmount:
        avgprice = list_prices[0]
        highprice = list_prices[0]

    elif (sum(list_amounts[0: 2])) >= BuyAmount:
        avgprice = np.average(list_prices[0: 2], weights=[list_amounts[0],BuyAmount - list_amounts[0]])
        highprice = list_prices[1]

    elif (sum(list_amounts[0: 3])) >= BuyAmount:
        avgprice = np.average(list_prices[0: 3], weights=[list_amounts[0],list_amounts[1],BuyAmount - (sum(list_amounts[0: 2]))])
        highprice = list_prices[2]

    elif (sum(list_amounts[0: 4])) >= BuyAmount:
        avgprice = np.average(list_prices[0: 4], weights=[list_amounts[0],list_amounts[1],list_amounts[2],BuyAmount - (sum(list_amounts[0: 3]))])
        highprice = list_prices[3]

    print(avgprice)
    print(highprice)

This code works, but it is probably overly complex and long, especially since I want to be able to handle amount and price lists with 20+ items.

What is a better way to do this?

score 4 · Answer 1 · answered Jan 09 '18 at 13:47

You are right that your code lacks flexibility. In my opinion, you are looking at the problem from a valid perspective, but not a general enough one.

In other words, your solution has this idea implemented: "let me check first - given the quantities available for each price (which I sorted beautifully in an array) - what are the different sellers I have to buy from, then do all the computation."

A more flexible idea would be: "Let me start buying from the cheapest available seller, as much as I can. I will stop when my order is fulfilled, and do the math step by step." This means you write iterative code that accumulates the total amount spent step by step and, once the order is complete, computes the average price per piece and the max price (i.e. the last price accessed in your ordered list).

To turn these thoughts into code:

list_prices = [12,12.7,13.5,14.3]
list_amounts = [85,100,30,54]
BuyAmount = x

remaining = BuyAmount
spent_total = 0
current_seller = -1 # since we increment it right away 

while remaining > 0:  # keep buying until the order is filled
    current_seller += 1
    # in case we cannot fulfill the order
    if current_seller >= len(list_prices):
        # since we need it later we have to restore the value
        current_seller -= 1
        break
    # we want either as many as available or just enough to complete 
    # BuyAmount
    buying = min(list_amounts[current_seller], remaining)
    # update remaining
    remaining -= buying
    # update total
    spent_total += buying * list_prices[current_seller]

# if we got here we have no more remaining or no more stock to buy

# average price
avgprice = spent_total / (BuyAmount - remaining) 

# max price - since the list is ordered -
highprice = list_prices[current_seller]

print(avgprice)
print(highprice)
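
As a minimal sketch, the same greedy idea can be wrapped in a function and run on the sample data. The name `fill_order`, its return shape, and the example order of 200 are assumptions made for illustration; they are not part of the answer above. It assumes the prices are already sorted from cheapest to most expensive.

```python
def fill_order(prices, amounts, buy_amount):
    """Buy cheapest first; prices must already be sorted ascending.

    Returns (average price, highest price paid, units actually bought).
    """
    if buy_amount <= 0:
        raise ValueError("buy_amount must be positive")

    remaining = buy_amount
    spent_total = 0.0
    high_price = None

    for price, available in zip(prices, amounts):
        if remaining == 0:
            break  # order is filled
        buying = min(available, remaining)
        if buying == 0:
            continue  # seller has no stock, so their price is never paid
        remaining -= buying
        spent_total += buying * price
        high_price = price

    bought = buy_amount - remaining
    if bought == 0:
        raise ValueError("no stock available")
    return spent_total / bought, high_price, bought


list_prices = [12, 12.7, 13.5, 14.3]
list_amounts = [85, 100, 30, 54]

# Hypothetical order: 85 at 12, 100 at 12.7 and 15 at 13.5.
avgprice, highprice, bought = fill_order(list_prices, list_amounts, 200)
print(avgprice, highprice, bought)
```

For an order of 200 this buys from the first three sellers, so the average comes out at roughly 12.46 and the highest price paid is 13.5. Returning the number of units actually bought makes it visible when the order is larger than the total stock.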

Thank you very much for your reply. It's good to hear that my train of thought was correct. That said, I didn't even think of programming it in a way where it only had to look at the amounts that were relevant. I implemented it like this and it works! I also learnt a lot from looking at your code and seeing how much more efficient it is than mine :) Thanks again — Cennnn, Jan 09 '18 at 17:01

Divakar · Accepted Answer · 2018-01-09T14:07:27.277

Here's a generic vectorized solution that uses cumsum to replace those sliced summations and argmax to find the index that sets the slice limits for those IF-case operations -

# Use cumsum to replace the sliced summations, i.e. all those
# `list_amounts[0]`, `sum(list_amounts[0: 2])`, `sum(list_amounts[0: 3])`, etc.
c = np.cumsum(list_amounts)

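# Use argmax to decide the slicing limits for the intended slicing operations.
# So, this would replace the last number in the slices -
# list_prices[0: 2], list_prices[0: 3], etc.
# Note: if BuyAmount exceeds c[-1] (the total stock), no entry is True and
# argmax returns 0, so check for that case before relying on idx.
idx = (c >= BuyAmount).argmax()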

# Use the slicing limit to get the slice off list_prices needed as the first
# input to numpy.average
l = list_prices[:idx+1]

# This step gets us the weights. Now, in the weights we have two parts. E.g.
# for the third-IF we have : 
# [list_amounts[0],list_amounts[1],BuyAmount - (sum(list_amounts[0: 2]))]
# Here, we would slice off list_amounts limited by `idx`.
# The second part is sliced summation limited by `idx` again.
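# c[idx-1] would wrap around to c[-1] when idx == 0, so use 0 in that case.
prev = c[idx-1] if idx > 0 else 0
w = np.r_[list_amounts[:idx], BuyAmount - prev]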

# Finally, plug-in the two inputs to np.average and get avgprice output.
avgprice = np.average(l,weights=w)

# Get idx element off list_prices as the highprice output.
highprice = list_prices[idx]
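
To make the intermediate values concrete, here is a self-contained sketch of the cumsum/argmax steps run on the sample data. The function name `vectorized_fill`, the example order of 200, and the two explicit guards are assumptions added for illustration: one guard covers `idx == 0`, where indexing `c[idx - 1]` would wrap around to the last element, and the other covers an order larger than the total stock, where `argmax` over an all-False array returns 0.

```python
import numpy as np


def vectorized_fill(prices, amounts, buy_amount):
    """cumsum/argmax approach; prices must already be sorted ascending."""
    c = np.cumsum(amounts)
    if buy_amount <= 0 or buy_amount > c[-1]:
        raise ValueError("buy_amount must be positive and at most the total stock")

    # First seller at which the running total covers the order.
    idx = int((c >= buy_amount).argmax())

    # Units supplied by cheaper sellers; 0 when the first seller is enough.
    filled_before = c[idx - 1] if idx > 0 else 0

    # Full stock of the cheaper sellers, plus the partial amount from seller idx.
    weights = np.r_[amounts[:idx], buy_amount - filled_before]
    avgprice = np.average(prices[:idx + 1], weights=weights)
    return avgprice, prices[idx], weights


list_prices = [12, 12.7, 13.5, 14.3]
list_amounts = [85, 100, 30, 54]

# For an order of 200: c is [85, 185, 215, 269], idx is 2,
# and the weights are [85, 100, 15].
avgprice, highprice, weights = vectorized_fill(list_prices, list_amounts, 200)
print(avgprice, highprice, weights)
```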

We can further optimize to remove the concatenation step (with np.r_) and get to avgprice, like so -

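# Amount already filled by cheaper sellers (0 when idx == 0, since c[idx-1]
# would otherwise wrap around to c[-1]).
prev = c[idx-1] if idx > 0 else 0
slice1_sum = np.multiply(list_prices[:idx], list_amounts[:idx]).sum()
# or np.dot(list_prices[:idx], list_amounts[:idx])
slice2_sum = list_prices[idx]*(BuyAmount - prev)
weight_sum = np.sum(list_amounts[:idx]) + BuyAmount - prev
avgprice = (slice1_sum+slice2_sum)/weight_sum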
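
A different way to avoid both the concatenation and the index bookkeeping is to compute how many units each seller supplies in one step with `np.clip`. This is a swapped-in technique, not the method from the answer above, and the names `fill_by_clipping` and `taken` are hypothetical. It is a sketch that assumes sorted prices and non-negative amounts.

```python
import numpy as np


def fill_by_clipping(prices, amounts, buy_amount):
    """Per-seller quantities via clipping; prices must be sorted ascending."""
    prices = np.asarray(prices, dtype=float)
    amounts = np.asarray(amounts)

    # Units already supplied by cheaper sellers before each seller is reached.
    taken_before = np.cumsum(amounts) - amounts

    # Each seller supplies what is still needed, limited to [0, their stock].
    taken = np.clip(buy_amount - taken_before, 0, amounts)

    bought = taken.sum()
    if bought == 0:
        raise ValueError("nothing could be bought")

    avgprice = np.dot(prices, taken) / bought
    # Last seller that supplied at least one unit.
    highprice = prices[np.flatnonzero(taken)[-1]]
    return avgprice, highprice, taken


list_prices = [12, 12.7, 13.5, 14.3]
list_amounts = [85, 100, 30, 54]

# For an order of 200 the per-seller quantities are [85, 100, 15, 0].
avgprice, highprice, taken = fill_by_clipping(list_prices, list_amounts, 200)
print(avgprice, highprice, taken)
```

With this formulation an order larger than the total stock simply buys everything, so comparing `taken.sum()` with the requested amount shows whether the order was fully filled.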

Using numpy is more efficient than plain Python most of the time, but the code often looks obscure: can you please add some notes and explanation? — Fabio Veronese, Jan 09 '18 at 13:56
Thank you so much for this help. I had to do a lot of research because I'm not familiar with cumsum, argmax and np.r_. The cumsum really cuts down the length of the code a lot haha, it's exactly what you need indeed. How you find the index where the amounts fill the order is quite handy as well, and in the end you have very efficient code! I do have one question: why is the optimization of removing the concatenation step such a good thing? Thank you again for your help, learnt a lot! — Cennnn, Jan 09 '18 at 17:06
@Cennnn The concatenation needs extra memory to store the concatenated array. By replacing it with in-situ operations as shown in the latter part, we are saving on memory and that hopefully should lead to performance efficiency as well. — Divakar, Jan 09 '18 at 19:08
