-1

I have a table with columns ID, DATES, RATES, one ID (customer) has multiple rates but one can only have one rate on one specific date.

ID     DATES          RATES
 5     2014-07-01     0.02
 5     2014-07-02     0.03
 5     2014-07-03     0.04

Thus, one ID can only have one rate change path over time because one ID was assigned only one rate on a specific date. But there are some ID having exactly the same rate change path over time because the same rate can be assigned to multiple IDs. for example, ID 5 & 6 have the same rate change path over time.

ID    DATES          RATES
 5    2014-07-01     0.02
 5    2014-07-02     0.03
 5    2014-07-03     0.04
 6    2014-07-01     0.02
 6    2014-07-02     0.03
 6    2014-07-03     0.04

My goal is to find out customers with the same rate change paths over time (The ideal output would be, rate path 1 include ID 5 & 6. The ID columns include IDs having the same rate change path).

Rate Path                  ID
  1                        5,6
  2                        1,3

I am hoping to use R or sqldf package on Rstudio to solve this problem. Sample table shown as below. There is one point that needs to be pointed out, ID 1 and 3 share the same pattern even though ID 3 came in one day later than ID 1 but if you do a line chart about their rates, the two lines will be completely overlapped. however, ID 1 and ID 2 don't share the same pattern because even though for the first three days they share the same pattern but the last day their rates are different. The two lines wouldn't be overlapped.

ID       DATES         RATES
 1     2014-07-01       0.02
 1     2014-07-02       0.03
 1     2014-07-03       0.04
 1     2014-07-04       0.05
 2     2014-07-01       0.02
 2     2014-07-02       0.03
 2     2014-07-03       0.04
 2     2014-07-04       0.06
 3     2014-07-02       0.03
 3     2014-07-03       0.04
 3     2014-07-04       0.05

I am currently using pairwise sum of absolute difference to solve this issue. But the big data requires lots of calculation afterwards. I am thinking of any other better ways to solve this problem?

NightDog
  • 91
  • 7
  • 2
    sample data and expected result would help. – Rahul Jain May 19 '18 at 20:13
  • ID 1 has 4 rates whereas ID 3 has 3 rates. Are their paths to be regarded as identical because ID 3's path is a subpath of ID 1's path or is there an error in the question? – G. Grothendieck Jun 01 '18 at 12:43
  • Also what if one ID has rates 1,2,3 and another ID has rates 2,3,4 and another ID has rates 1,2,3,4? Are they all to be regarded as the same even though neither of the first two are contained the other? – G. Grothendieck Jun 01 '18 at 20:32
  • hi @G.Grothendieck ID 1 and ID3 are considered to be on the same path because ID3 is the subpath of ID1. but if one ID has rates 1,2,3 and another iD has rates 2,3,4 then they are considered different path. – NightDog Jun 04 '18 at 17:17

1 Answers1

0

You seem to be looking for one thing being an exact subset of another. One natural approach would be to use arrays. However, I don't know of a simple "contains" that matches both the order and the elements. So, I'll use strings:

with t as (
      select id,
             (',' || string_agg(rate, ',' order by date) || ',') as rates
      from yourtable t
      group by id
     )
select t1.*, t2.*
from t join
     t t2
     on t.rates like '%' || t2.rates || '%';
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786