1

Hey guys I'm a total newbie to creating cohorts and would really appreciate, if i receive any help (through python code or sql query)... lets assume the bike-taxi ride(similar to uber) data-set as below:

**Note:**columns relevant for the cohort creation are 'timestamp' and 'custid'. 
I tried referring to gregred( http://www.gregreda.com/2015/08/23/cohort-analysis-with-python/.) 
However am losing on the weekly analysis base. 


timestamp          custID     lat1      lng1        lat2        lng2

==================================================================

2018-04-07 7:07:17  14626   12.3136215  76.658195   12.287301   76.60228
2018-04-07 7:32:27  85490   12.943947   77.560745   12.954014   77.54377
2018-04-07 7:36:44  5408    12.899603   77.5873 12.93478    77.56995
2018-04-07 7:38:00  58940   12.918229   77.607544   12.968971   77.636375
2018-04-07 7:39:29  5408    12.89949    77.58727    12.93478    77.56995
2018-04-07 7:43:08  5408    12.899421   77.587326   12.93478    77.56995
2018-04-07 7:43:55  50266   12.898679   77.60434    12.877949   77.5959
2018-04-07 7:52:31  58940   12.918229   77.607544   12.968971   77.636375
2018-04-07 7:52:42  58940   12.918229   77.607544   12.968971   77.636375
2018-04-07 7:53:23  28126   12.91184    77.60225    12.940866   77.54071
2018-04-07 7:55:05  99251   12.87466    77.61951    12.896871   77.60847
2018-04-07 7:55:24  99251   12.87466    77.61951    12.896871   77.60847
2018-04-07 8:00:04  34808   12.989711   77.65381    12.939158   77.73467
2018-04-07 8:00:16  34808   12.989711   77.65381    12.939158   77.73467
2018-04-07 8:03:16  89714   12.868537   77.65304    12.972006   77.59487
2018-04-07 8:03:24  89714   12.868537   77.65304    12.972006   77.59487
2018-04-07 8:07:16  82060   12.987069   77.57703    12.970017   77.577934
2018-04-07 8:08:57  18815   12.933479   77.57087    12.961353   77.57457
2018-04-07 8:11:35  38288   12.886039   77.64894    12.902692   77.62253
2018-04-07 8:17:24  80401   12.990636   77.67494    12.962719   77.5876
2018-04-07 8:20:51  89225   12.99445    77.729546   12.980993   77.69732

from the above mocked dataset, the objective is to Calculate Week-on-week cohorts for customers taking rides in that week, an example cohort output is shown as follows.


Input query format: STARTING WEEK, NUMBER OF WEEKS 
Input Sample: Week 1, 3 weeks
Output Sample:

         week1   week2   week3

 week1    100     80      90
 week2            200     70
 week3                    100

Input query format: STARTING Date, NUMBER OF WEEKS
Input: 1/12/2019, 3 weeks
Output Sample:

             1/12/2019   8/12/2019   16/12/2019
1/12/2019    100           80           90
8/12/2019                  200          70
16/12/2019                              100


Interpretation:
● 100 customers were active on the week of 1st.
○ Out of which 80 came back on 8th week
○ Out of which 90 came back on 16th week
● 200 new customers were active on the week of 8th
○ Out of which 70 came back on week of 16th.

Is there any pythonic way (either through pandas) or query way (through sql) to calculate the cohort outputs in the above formats.

0 Answers0