Following on from this question, here is the input / output I want. I have some ideas though not entirely sure..
How do i detect duplicates and then among them cross check if two columns have similar values?
So I have a data frame like this.
No fname sname landline address time_of_move_in
1 Alphred Thomas 123 A 19/10/2016,00:01:00
2 Peter Jay 345 B 29/10/2016,00:01:00
3 Donald Hook 123 A 30/10/2016,00:11:00
4 Jay Donald 345 B 29/10/2016,00:05:00
5 Jay Donald 123 A 30/10/2016,00:14:00
6 Haskell Peter 123 B 19/10/2016,00:01:00
What i want is an output like this
Case_Number fname sname landline address time_diff
1 Peter Jay 345 B -4 Hours
1 Jay Donald 345 B 4 Hours
2 Donald Hook 123 A -2 Hours
2 Jay Donald 123 A 2 Hours
Eventually I only want to filter out any case where the time diff between the two discovered is < 3 hours.
Criteria Between any two cases detected
Landline and Address should be identical
If above is true then the same name must repeat in either the fname or surname between the two rows detected. (In case 1 above it is Jay and in Case 2 above it is Donald. Note if Donald repeats twice in fname then this is not a valid case)
Time difference between the two <3 hours, i would like to bring directionality of time here as well eventually hence the negatives in the output set above.
Note : We do not have to display the time difference in the above format. So long as its some number/time format that is fine