Hey guys I'm a total newbie to creating cohorts and would really appreciate, if i receive any help (through python code or sql query)... lets assume the bike-taxi ride(similar to uber) data-set as below:
**Note:**columns relevant for the cohort creation are 'timestamp' and 'custid'.
I tried referring to gregred( http://www.gregreda.com/2015/08/23/cohort-analysis-with-python/.)
However am losing on the weekly analysis base.
timestamp custID lat1 lng1 lat2 lng2
==================================================================
2018-04-07 7:07:17 14626 12.3136215 76.658195 12.287301 76.60228
2018-04-07 7:32:27 85490 12.943947 77.560745 12.954014 77.54377
2018-04-07 7:36:44 5408 12.899603 77.5873 12.93478 77.56995
2018-04-07 7:38:00 58940 12.918229 77.607544 12.968971 77.636375
2018-04-07 7:39:29 5408 12.89949 77.58727 12.93478 77.56995
2018-04-07 7:43:08 5408 12.899421 77.587326 12.93478 77.56995
2018-04-07 7:43:55 50266 12.898679 77.60434 12.877949 77.5959
2018-04-07 7:52:31 58940 12.918229 77.607544 12.968971 77.636375
2018-04-07 7:52:42 58940 12.918229 77.607544 12.968971 77.636375
2018-04-07 7:53:23 28126 12.91184 77.60225 12.940866 77.54071
2018-04-07 7:55:05 99251 12.87466 77.61951 12.896871 77.60847
2018-04-07 7:55:24 99251 12.87466 77.61951 12.896871 77.60847
2018-04-07 8:00:04 34808 12.989711 77.65381 12.939158 77.73467
2018-04-07 8:00:16 34808 12.989711 77.65381 12.939158 77.73467
2018-04-07 8:03:16 89714 12.868537 77.65304 12.972006 77.59487
2018-04-07 8:03:24 89714 12.868537 77.65304 12.972006 77.59487
2018-04-07 8:07:16 82060 12.987069 77.57703 12.970017 77.577934
2018-04-07 8:08:57 18815 12.933479 77.57087 12.961353 77.57457
2018-04-07 8:11:35 38288 12.886039 77.64894 12.902692 77.62253
2018-04-07 8:17:24 80401 12.990636 77.67494 12.962719 77.5876
2018-04-07 8:20:51 89225 12.99445 77.729546 12.980993 77.69732
from the above mocked dataset, the objective is to Calculate Week-on-week cohorts for customers taking rides in that week, an example cohort output is shown as follows.
Input query format: STARTING WEEK, NUMBER OF WEEKS
Input Sample: Week 1, 3 weeks
Output Sample:
week1 week2 week3
week1 100 80 90
week2 200 70
week3 100
Input query format: STARTING Date, NUMBER OF WEEKS
Input: 1/12/2019, 3 weeks
Output Sample:
1/12/2019 8/12/2019 16/12/2019
1/12/2019 100 80 90
8/12/2019 200 70
16/12/2019 100
Interpretation:
● 100 customers were active on the week of 1st.
○ Out of which 80 came back on 8th week
○ Out of which 90 came back on 16th week
● 200 new customers were active on the week of 8th
○ Out of which 70 came back on week of 16th.
Is there any pythonic way (either through pandas) or query way (through sql) to calculate the cohort outputs in the above formats.