
I have a set of businesses that have invoices from clients each day. I compare the invoice counts year over year to find how transactions have grown, e.g. 2 invoices in Sep 2021 and 3 invoices in Sep 2022 gives (3-2)/2 = 50% YoY txn growth.

I want to find a method (preferably in Python) to determine whether a business has significantly more or significantly fewer invoices each month than the average number of monthly transactions across all businesses.

I'm looking at the calculation for statistical power but am unsure how to apply it in this case.

My original data would look like:

business|time_value|txns|yoy_txn_growth
1111    |2022-02-01|10  |null
1111    |2023-02-01|11  |0.10
1111    |2022-03-01|10  |null
1111    |2023-03-01|12  |0.20
2222    |2022-02-01|10  |null
2222    |2023-02-01|13  |0.30
2222    |2022-03-01|10  |null
2222    |2023-03-01|14  |0.40
...
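
For concreteness, here is a minimal sketch of how the yoy_txn_growth column above is derived from txns. It uses only the rows shown in the table; the function name `yoy_growth` and the tuple layout are illustrative assumptions, not my production code.

```python
# Rows copied from the sample table: (business, time_value, txns)
rows = [
    ("1111", "2022-02-01", 10),
    ("1111", "2023-02-01", 11),
    ("1111", "2022-03-01", 10),
    ("1111", "2023-03-01", 12),
    ("2222", "2022-02-01", 10),
    ("2222", "2023-02-01", 13),
    ("2222", "2022-03-01", 10),
    ("2222", "2023-03-01", 14),
]

def yoy_growth(rows):
    """Map (business, time_value) to growth vs. the same month one year earlier.

    Returns None where no prior-year row exists (or the prior count is zero).
    """
    txns = {(b, t): n for b, t, n in rows}
    out = {}
    for (b, t), n in txns.items():
        year, rest = t.split("-", 1)
        prev = txns.get((b, f"{int(year) - 1}-{rest}"))
        out[(b, t)] = None if not prev else (n - prev) / prev
    return out

growth = yoy_growth(rows)
for key in sorted(growth):
    print(key, growth[key])
```

The 2022 rows come out as None because the table has no 2021 data to compare against.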

I'm looking to arrive at a meaningful answer for how many invoices and/or businesses need to exist to detect a difference at the 0.05 significance level. I'm not sure whether I need to decide in advance what difference in txns/yoy_txn_growth from the mean would count as meaningful, but it could be 1 standard deviation.
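
To show the kind of calculation I have in mind, here is a rough sketch of the textbook sample-size formula for a two-sided one-sample z-test, n = ((z_{1-alpha/2} + z_{power}) / d)^2, where d is the difference from the mean in standard-deviation units. The 80% power target, the normal approximation, and the function name `required_n` are assumptions on my part; I don't know whether this is the right test for my data.

```python
import math
from statistics import NormalDist

def required_n(effect_size, alpha=0.05, power=0.80):
    """Approximate observations needed for a two-sided one-sample z-test.

    effect_size is the difference from the mean in standard-deviation units.
    alpha is the significance level; power=0.80 is an assumed target.
    """
    z = NormalDist()                       # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for two-sided test
    z_power = z.inv_cdf(power)             # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

n_one_sd = required_n(1.0)    # detect a 1 SD difference
n_half_sd = required_n(0.5)   # detect a 0.5 SD difference
print(n_one_sd, n_half_sd)
```

Under these assumptions a 1 SD difference needs only a handful of observations, while halving the detectable difference roughly quadruples the requirement.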

Could someone outline the steps I should follow for this use case to derive how many businesses and/or invoices I would need to get a meaningful result? The hypothesis to test (against a null of no difference from the mean) can be either 1) business X has significantly more or fewer txns than the mean this month, or 2) business X has significantly higher or lower yoy_txn_growth than the mean this month.
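
As a starting point, this is a naive sketch of hypothesis 1: compare one business's monthly txns against the mean and standard deviation of all the other businesses and convert the z-score to a two-sided p-value. The counts for businesses 3333 to 5555 are made up for illustration, the helper name `z_test_vs_others` is hypothetical, and the normality assumption may well be wrong for invoice counts, which is part of what I'm asking.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical txns for one month; only 1111 and 2222 appear in my sample data.
monthly_txns = {"1111": 11, "2222": 13, "3333": 12, "4444": 10, "5555": 30}

def z_test_vs_others(counts, business):
    """Return (z, two-sided p) for one business vs. all the other businesses.

    Assumes the other businesses' counts are roughly normal, which is a
    simplification for count data.
    """
    others = [n for b, n in counts.items() if b != business]
    z = (counts[business] - mean(others)) / stdev(others)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z_high, p_high = z_test_vs_others(monthly_txns, "5555")
z_typical, p_typical = z_test_vs_others(monthly_txns, "1111")
print(f"5555: z={z_high:.2f}, p={p_high:.4f}")
print(f"1111: z={z_typical:.2f}, p={p_typical:.4f}")
```

The same function could be pointed at yoy_txn_growth values instead of txns for hypothesis 2.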

Mark McGown
