I have a table with columns ID
, DATES
, RATES
, one ID
(customer) has multiple rates
but one can only have one rate
on one specific date
.
ID DATES RATES
5 2014-07-01 0.02
5 2014-07-02 0.03
5 2014-07-03 0.04
Thus, one ID
can only have one rate change path over time because one ID
was assigned only one rate
on a specific date
. But there are some ID
having exactly the same rate change path over time because the same rate
can be assigned to multiple ID
s. for example, ID
5 & 6 have the same rate change path over time.
ID DATES RATES
5 2014-07-01 0.02
5 2014-07-02 0.03
5 2014-07-03 0.04
6 2014-07-01 0.02
6 2014-07-02 0.03
6 2014-07-03 0.04
My goal is to find out customers with the same rate change paths over time (The ideal output would be, rate path 1 include ID
5 & 6. The ID
columns include ID
s having the same rate change path).
Rate Path ID
1 5,6
2 1,3
I am hoping to use R or sqldf package on Rstudio to solve this problem. Sample table shown as below. There is one point that needs to be pointed out, ID 1 and 3 share the same pattern even though ID 3 came in one day later than ID 1 but if you do a line chart about their rates, the two lines will be completely overlapped. however, ID 1 and ID 2 don't share the same pattern because even though for the first three days they share the same pattern but the last day their rates are different. The two lines wouldn't be overlapped.
ID DATES RATES
1 2014-07-01 0.02
1 2014-07-02 0.03
1 2014-07-03 0.04
1 2014-07-04 0.05
2 2014-07-01 0.02
2 2014-07-02 0.03
2 2014-07-03 0.04
2 2014-07-04 0.06
3 2014-07-02 0.03
3 2014-07-03 0.04
3 2014-07-04 0.05
I am currently using pairwise sum of absolute difference to solve this issue. But the big data requires lots of calculation afterwards. I am thinking of any other better ways to solve this problem?