For a project I am working on I need to calculate average price of products for shops. Every time a shop changes the price of a product a new entry is added to the dataset. If a shop stops (temporarily or permanently) to sell a product, an entry is made with the time stamp and a price value of -1. Example:
timestamp shop product price
2014-01-01 10:07:32 E 4 19.99
2014-01-01 10:07:32 F 5 54.00
2014-01-02 14:41:12 A 1 28.00
2014-01-02 14:41:12 D 3 249.99
2014-01-02 15:12:38 C 1 29.99
2014-01-03 14:05:12 B 2 43.00
2014-01-05 12:21:57 F 5 49.99
2014-01-06 23:55:32 F 5 -1
2014-01-07 03:05:12 B 2 39.99
2014-01-07 11:24:49 D 3 -1
2014-01-08 11:35:33 C 2 40.99
2014-01-08 16:28:07 F 5 65.00
2014-01-12 21:41:04 E 3 199.00
Test cases:
- Shop: A that has no price entry for product 1 in the time period to calculate
- Shop B that has product 2 switch prices within the period
- Shop C that start selling product 2 in the period, and sells product 1 all through
- Shop D that stops selling product 3 in the period.
- Shop E that starts selling product 3 after the period and sells product 4 throughout
- Shop F that, for product 5 changes price, then stop selling, then starts again with a new price, all within the period
The period to fint the averages is from 2014-01-05 00:00:00 to 2014-01-10 23:59:59
What I need to do is calculate the average price within a certain period for a certain shop, and overall. That is the average is time weighted (3 days a price of 3 and 1 day price 1 is an avg of 2.5 not 1.5 for those 4 days). I have two problems:
- The starting value can be missing. The last price change i most likely in the begging of the time period being calculated, so I need to find a way to fill it in so that it will be used in the avg. In fact it possible that this is the only price in the whole period.
- Calculating with -1 will give the wrong results. The value should be ignored, and the overal time delta should be reduced with the time the product is no longer available.
The expected output for the data given above is (prices rounded up to the nearest cent):
shop product price
A 1 28.00
B 2 41.06
C 1 29.99
C 2 40.99
D 3 249.99
E 4 19.99
F 5 53.81
I have tried using numpy.ma to mask out the -1 values. However I have been unsuccessful in doing this as isnan
and masked_less
cannot handle this.
Any idea as to how I can achieve this?
Edit: Edited test data en expected results to more clearly reflect the problem