I have a dataset that records how many units of each product were in starting inventory, how many units of a given product were sold, and how many units of all other products were sold. The question I'm trying to answer is: was the number of units sold of a particular product significantly higher than I would expect based on that product's percentage of starting inventory? I've read the documentation on proportions_ztest. It talks about numbers of observations, so I want to check whether I'm using it correctly for units sold. With the code below I'm trying to get the p-value.

sold= total number of units sold of product1

tot_sld= total number of units sold including all products

perc_strt= (total number of units of product1 in starting inventory)/(total number of units from all products in starting inventory)

code:

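import statsmodels.api as sm

# proportions_ztest returns (z statistic, p-value); [1] selects the p-value
sm.stats.proportions_ztest(count=x['sold'],        # units of product1 sold
                           nobs=x['tot_sld'],      # units sold across all products
                           value=x['perc_strt'],   # proportion under the null hypothesis
                           alternative='larger')[1]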

Update Example:

product1 start inventory=20 units

product2 start inventory=30 units

product3 start inventory=50 units


product1 perc_strt=20%

number of units sold of product1=10 units
number of units sold of product2=10 units
number of units sold of product3=20 units

tot_sld=40 units

so

x['sold']=10
x['tot_sld']=40
x['perc_strt']=0.2
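To make the arithmetic behind this example explicit, here is a minimal sketch of the one-sample z-test for a proportion, written with the standard library only. The helper `one_sample_prop_ztest_larger` and its `use_null_variance` flag are hypothetical names introduced for illustration, not part of statsmodels. As far as I understand, `proportions_ztest` by default uses the sample proportion in the variance and can use the null value instead through its `prop_var` argument; the sketch shows both variants under that assumption.

```python
from math import sqrt
from statistics import NormalDist

# Numbers from the example above
start_inventory = {"product1": 20, "product2": 30, "product3": 50}
units_sold = {"product1": 10, "product2": 10, "product3": 20}

sold = units_sold["product1"]                    # 10
tot_sld = sum(units_sold.values())               # 40
perc_strt = start_inventory["product1"] / sum(start_inventory.values())  # 0.2


def one_sample_prop_ztest_larger(count, nobs, value, use_null_variance=False):
    """One-sided (larger) z-test for a single proportion.

    Hypothetical helper for illustration. With use_null_variance=False the
    standard error is based on the sample proportion; with True it is based
    on the proportion assumed under the null hypothesis.
    """
    p_hat = count / nobs
    p_var = value if use_null_variance else p_hat
    std_err = sqrt(p_var * (1 - p_var) / nobs)
    z = (p_hat - value) / std_err
    p_value = 1 - NormalDist().cdf(z)            # upper-tail probability
    return z, p_value


z_sample, p_sample = one_sample_prop_ztest_larger(sold, tot_sld, perc_strt)
z_null, p_null = one_sample_prop_ztest_larger(
    sold, tot_sld, perc_strt, use_null_variance=True
)
print(f"sample-variance version: z={z_sample:.4f}, p={p_sample:.4f}")
print(f"null-variance version:   z={z_null:.4f}, p={p_null:.4f}")
```

With these numbers the observed share of product1 among sold units is 10/40 = 0.25 against an expected 0.2, which gives z of roughly 0.73 and a one-sided p-value of roughly 0.23 in the sample-variance version. Note that this treats each sold unit as an independent draw, which is only an approximation when units are sold from a fixed inventory without replenishment.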

Update:

The one-population proportion test described in this post seems to confirm my original approach:

https://towardsdatascience.com/demystifying-hypothesis-testing-with-simple-python-examples-4997ad3c5294

user3476463
  • Do all units sold come from the starting inventory or is the inventory replenished during the selling periods? AFAIU, the second argument `x['tot_sld']` should be the total number of units of each product whether sold or unsold, and not the total over all products. If there is no total number of units (nobs) for each product, then Poisson might be the more appropriate model than binomial. – Josef Jan 02 '20 at 17:16
  • @Josef thank you for getting back to me. Yes, all the units sold come from the starting inventory, and no, there isn't any replenishment. I'm trying to compare proportions between the two groups (product1 units in starting inventory and product1 units in sold), with the null hypothesis being no difference in proportion and the alternative being that the percentage in sold is higher. So why would I use the total starting inventory for the nobs? Wouldn't the nobs be the total units of everything sold? (sold=sample, inventory=general population) – user3476463 Jan 03 '20 at 02:07
  • AFAIU: the proportion is `number of sales of i / number of i in inventory` for each product `i` and the null hypothesis is that this proportion is constant across products. Then `nobs = list(number of i in inventory for all i)` is the population of product_i at risk of being sold. – Josef Jan 03 '20 at 03:58
  • @Josef thank you. I've added an update with a basic example to make things clearer. The proportions of products in starting inventory aren't the same. What I'm comparing is the proportion of product1 in starting inventory to the proportion of product1 in the sold products. I want to know if the proportion of product1 in the "sold" sample is significantly higher than the proportion of product1 in the starting inventory. Am I using the proportions_ztest correctly to answer this question? If not, could you please use the basic example to demonstrate how to use proportions_ztest? – user3476463 Jan 03 '20 at 17:41
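
For comparison, the formulation Josef describes in the comments (units sold of a product divided by that product's own starting inventory) can be sketched on the same example numbers. This is one possible reading of his suggestion, not a confirmed answer: it assumes product1 is compared against all other products pooled together, and it uses a pooled two-sample z-test written with the standard library. The function `two_sample_prop_ztest_larger` is a hypothetical helper, not a statsmodels call.

```python
from math import sqrt
from statistics import NormalDist

# Example numbers from the question
start_inventory = {"product1": 20, "product2": 30, "product3": 50}
units_sold = {"product1": 10, "product2": 10, "product3": 20}

# Group 1: product1. Group 2: every other product pooled (an assumption).
count1 = units_sold["product1"]
nobs1 = start_inventory["product1"]
count2 = sum(v for k, v in units_sold.items() if k != "product1")
nobs2 = sum(v for k, v in start_inventory.items() if k != "product1")


def two_sample_prop_ztest_larger(count1, nobs1, count2, nobs2):
    """One-sided (larger) pooled z-test: is the rate in group 1 higher?"""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    pooled = (count1 + count2) / (nobs1 + nobs2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / nobs1 + 1 / nobs2))
    z = (p1 - p2) / std_err
    p_value = 1 - NormalDist().cdf(z)            # upper-tail probability
    return z, p_value


z_rate, p_rate = two_sample_prop_ztest_larger(count1, nobs1, count2, nobs2)
print(f"sell-through product1: {count1 / nobs1:.3f}, others: {count2 / nobs2:.3f}")
print(f"z={z_rate:.4f}, p={p_rate:.4f}")
```

Here `nobs` is the number of units at risk of being sold (20 for product1, 80 for the rest), so the test compares sell-through rates of 0.5 and 0.375 rather than product1's share among sold units.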
