Recursive CTE / transitive closure in pandas

Question

My scenario:

User A is a fraudster.
User B is not a fraudster. However, the system will not allow user B to perform any action, because B and A use the same phone number (an attribute shared with a fraudulent user). This is 1 layer.
User D is not a fraudster either, but D uses the same DeviceId as B, and B shares an attribute with a fraudulent user, so user D should be blocked as well. In this case there are 2 layers: D is compared with B, and B is compared with A.

I can do that using a recursive CTE. However, my supervisor asked me to find an alternative way to do it :(.

Recursive CTE Code:

with recursive cte as (
      select ID, Email, MobileNo, DeviceId, IPAddress, id as tracking
      from tableuser
      where isfraudsterstatus = 1
      union all
      select u.id, u.email, u.mobileno, u.deviceid, u.ipaddress , concat_ws(',', cte.tracking, u.id)
      from cte join
           tableuser u
           on u.email = cte.email or
              u.mobileno = cte.mobileno or
              u.deviceid = cte.deviceid or 
              u.ipaddress = cte.ipaddress
      where find_in_set(u.id, cte.tracking) = 0
     )
select *
from cte;

OUTPUT:

Can I do that using Python? I am thinking about pandas.

import numpy as np
import pandas as pd
import functools
df = pd.DataFrame({'userId':
                       [1, 2, 3, 4,],
                   'phone':
                       ['01111', '01111', '53266', '7455'],
                   'email':
                       ['aziz@gmail', 'aziz1@gmail', 'aziz1@gmail', 'aziz2@gmail'],
                   'deviceId':
                       ['Ab123', 'Ab1234', 'Ab12345', 'Ab12345'],
                   'isFraud':
                   [1,0,0,0]})

Because it shares attributes with a fraudster user, @Roy2012. User 1 is a fraudster. User 2 shares the same phone with User 1 (so User 2 becomes a fraudster). User 3 shares an attribute (email) with User 2. User 4 shares the same deviceId with User 3. — ABDULAZIZ NOREDIN QADMOR, Jun 29 '20 at 07:49
IMHO when looking at the `df` data, all users are fraudsters, but the comments may also refer to the provided screenshot, where the data is different and the user identified by 'F' is not a fraudster. — ipj, Jun 29 '20 at 08:07
@r-beginners Okay, imagine this scenario: User A tried to do something on my website, and I have flagged him as a fraudster. Then he creates a new account (userid = 2), but he uses the same phone. In this case, the new account is a fraudster just like User A. — ABDULAZIZ NOREDIN QADMOR, Jun 29 '20 at 08:11
@ipj Only userid = 1 is a fraudster. The others share attributes with it. I am looking for something that tells me which users share attributes, so that I can delete or block those accounts. — ABDULAZIZ NOREDIN QADMOR, Jun 29 '20 at 08:17

Roy2012 · Accepted Answer · 2020-06-29T08:27:02.133

Here's a solution. It basically calculates the transitive closure of the fraudster users:

import pandas as pd

df = pd.DataFrame({'userId':
                       [1, 2, 3, 4,],
                   'phone':
                       ['01111', '01111', '53266', '7455'],
                   'email':
                       ['aziz@gmail', 'aziz1@gmail', 'aziz1@gmail', 'aziz2@gmail'],
                   'deviceId':
                       ['Ab123', 'Ab1234', 'Ab12345', 'Ab12345'],
                   'isFraud':
                   [1,0,0,0]})


def expand_fraud(no_fraud, fraud, col_name):
    # Join non-fraud users to fraud users on one shared attribute.
    t = pd.merge(no_fraud, fraud, on = col_name)
    if len(t):
        print(f"Found Match on {col_name}")
        # userId_x is the userId from the left (non-fraud) side of the merge.
        # Note: this updates the global df in place.
        df.loc[df.userId.isin(t.userId_x), "isFraud"] = 1
        return True
    return False

# Repeat until a full pass over all attributes flags no new user.
while True:
    added_fraud = False
    fraud = df[df.isFraud == 1]
    no_fraud = df[df.isFraud == 0]
    added_fraud |= expand_fraud(no_fraud, fraud, "deviceId")
    added_fraud |= expand_fraud(no_fraud, fraud, "email")
    added_fraud |= expand_fraud(no_fraud, fraud, "phone")
    if not added_fraud:
        break

print(df)

The output is:

   userId  phone        email deviceId  isFraud
0       1  01111   aziz@gmail    Ab123        1
1       2  01111  aziz1@gmail   Ab1234        1
2       3  53266  aziz1@gmail  Ab12345        1
3       4   7455  aziz2@gmail  Ab12345        1
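
If you also want to know how many layers separate each user from a known fraudster (the "1 layer" / "2 layers" idea from the question), the same fixed-point loop can be written as a function that works on a copy instead of the global `df`. This is only a sketch: the function name `flag_linked_users` and the `layer` column are made up for illustration, and it assumes a missing attribute value (NaN) should never link two users.

```python
import pandas as pd


def flag_linked_users(users, attrs, flag_col="isFraud"):
    """Return a copy of `users` with a hypothetical `layer` column.

    layer 0  = known fraudster
    layer n  = reached through n shared-attribute hops
    layer -1 = not linked to any fraudster
    """
    out = users.copy()
    out["layer"] = -1
    out.loc[out[flag_col] == 1, "layer"] = 0

    layer = 0
    while True:
        flagged = out[out["layer"] >= 0]
        clean = out["layer"] < 0
        # A user is linked if any attribute value also occurs on a flagged user.
        linked = pd.Series(False, index=out.index)
        for col in attrs:
            # dropna(): assumption that missing values should not count as a match.
            linked |= out[col].isin(flagged[col].dropna())
        newly = clean & linked
        if not newly.any():
            break  # fixed point reached, nothing new was flagged
        layer += 1
        out.loc[newly, "layer"] = layer

    out[flag_col] = (out["layer"] >= 0).astype(int)
    return out


df = pd.DataFrame({'userId': [1, 2, 3, 4],
                   'phone': ['01111', '01111', '53266', '7455'],
                   'email': ['aziz@gmail', 'aziz1@gmail', 'aziz1@gmail', 'aziz2@gmail'],
                   'deviceId': ['Ab123', 'Ab1234', 'Ab12345', 'Ab12345'],
                   'isFraud': [1, 0, 0, 0]})

result = flag_linked_users(df, ["phone", "email", "deviceId"])
print(result)
```

On the sample data, users 2, 3 and 4 should end up on layers 1, 2 and 3. Because every pass compares all remaining users against everyone flagged so far, the layer is the smallest number of hops to a known fraudster, and the original `df` is left unchanged.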

edited Jun 29 '20 at 08:27

answered Jun 29 '20 at 08:15

Roy2012


ABDULAZIZ - Let me know if this answers your question. – Roy2012 Jun 29 '20 at 08:21
I am getting something else after running your code: PycharmProjects/Mysql/test.py phone email deviceId Process finished with exit code 0 – ABDULAZIZ NOREDIN QADMOR Jun 29 '20 at 08:22
Look at the value of `df`. That's the output. – Roy2012 Jun 29 '20 at 08:22
Thank you so much. I voted your answer. but it is 80% completed :(. There is something else I hope you can help me with? – ABDULAZIZ NOREDIN QADMOR Jun 29 '20 at 08:27
If it answers your question, it would be great if you could accept it by clicking the checkmark next to the answer and turning it to green. As to helping you with something else - sure. If it's a longer question, it might be best to ask a new question. I promise to have a look. – Roy2012 Jun 29 '20 at 08:28
Done thanks again, never mind I will give it a try if I was not able to do it. I will ask a question later :). – ABDULAZIZ NOREDIN QADMOR Jun 29 '20 at 08:32
