I'm looking for a way to find links between users to solve a particular fraud scenario.

I've got some high-level linking rules as follows:

  • Surname + DOB
  • Mobile + Postcode
  • Email + Postcode
  • Mobile + Surname
  • Mobile + DOB
  • etc

If we can link a user to a known fraudster, we'd take some kind of action.
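As a rough illustration of the rules above, here is a minimal in-memory Python sketch that buckets users by each composite key and reports which pairs share one. The field names (`surname`, `dob`, `mobile`, `email`, `postcode`), the `find_links` helper, and the sample users are all assumptions made for the example; only the five rules listed explicitly are encoded.

```python
from collections import defaultdict
from itertools import combinations

# Only the five rules listed above; field names are assumed for illustration.
LINK_RULES = [
    ("surname", "dob"),
    ("mobile", "postcode"),
    ("email", "postcode"),
    ("mobile", "surname"),
    ("mobile", "dob"),
]


def find_links(users):
    """Return {(id_a, id_b): [rules that matched]} for every linked pair."""
    links = defaultdict(list)
    for rule in LINK_RULES:
        # Group user ids by the composite key this rule defines.
        buckets = defaultdict(list)
        for user in users:
            key = tuple(user.get(field) for field in rule)
            if all(key):  # skip users missing any field used by the rule
                buckets[key].append(user["id"])
        # Every pair of users in the same bucket is linked by this rule.
        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                links[(a, b)].append(rule)
    return dict(links)


# Hypothetical sample data.
users = [
    {"id": "u1", "surname": "Smith", "dob": "1990-01-01",
     "mobile": "0400000001", "email": "a@example.com", "postcode": "2000"},
    {"id": "u2", "surname": "Smith", "dob": "1990-01-01",
     "mobile": "0400000002", "email": "b@example.com", "postcode": "3000"},
    {"id": "u3", "surname": "Jones", "dob": "1985-05-05",
     "mobile": "0400000001", "email": "c@example.com", "postcode": "2000"},
]

links = find_links(users)
for pair, rules in links.items():
    print(pair, rules)
```

With this sample data, u1 and u2 are linked by Surname + DOB, and u1 and u3 by Mobile + Postcode. This only finds direct links; reaching a known fraudster through several hops is where a graph traversal would come in.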

I think a graph database (possibly the Cosmos DB Graph API) might be a good approach.

Questions:

  • Based on the description above, would a graph DB make sense, or would a relational DB be a better fit? I have experience with relational DBs, but a colleague mentioned that a graph DB might suit this scenario better.
  • Assuming a graph DB is the way to go, I'm struggling a bit with the modelling. As I said, I've done a lot of work with relational DBs, and this is probably influencing how I think about it. For example, a Person will have a Mobile, Email, and DOB, but these may also have been used by someone else (especially in fraud scenarios), and if we can find a link we'll need to take some action. In a graph DB, would it make sense to have Mobile, Email, and DOB as separate vertices, or should they be properties of a Person vertex? I feel like they should all be separate vertices, so that I can define edges to them from multiple Persons. What seems like it might be wrong is that I'd end up with very fine-grained vertices that only have an id and a label, e.g. label = email, id = test@gmail.com. Thoughts?

Cheers.

Ryan.Bartsch

1 Answer

I'm not sure why you would create separate vertices for things like a surname, DOB, and email address, unless the graph database you chose couldn't handle objects with fields.

InfiniteGraph is an object-oriented graph database where you can define simple objects with attributes, or even highly complex objects that embed other objects in a single vertex object.

The problem I see with fine-grained vertices is the possibility of super-nodes. How many people share the same surname or DOB? If the answer is millions, then you will have millions of edges into and out of that vertex, which will likely make query processing inefficient.
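One way to get a feel for the super-node risk before committing to a model is to count how many people would attach to each candidate vertex. The following is a minimal Python sketch, not tied to any particular graph database; the helper names, the `max_degree` threshold, and the sample records are assumptions made for illustration.

```python
from collections import Counter


def attribute_degrees(people, field):
    """Count how many people would share the vertex for each value of `field`."""
    return Counter(p[field] for p in people if p.get(field))


def supernode_candidates(people, field, max_degree):
    """Values of `field` whose vertex would have more than `max_degree` edges."""
    degrees = attribute_degrees(people, field)
    return {value: n for value, n in degrees.items() if n > max_degree}


# Hypothetical sample data: surnames repeat often, emails rarely.
people = [
    {"id": "p1", "surname": "Smith", "email": "a@example.com"},
    {"id": "p2", "surname": "Smith", "email": "b@example.com"},
    {"id": "p3", "surname": "Smith", "email": "a@example.com"},
    {"id": "p4", "surname": "Jones", "email": "c@example.com"},
]

surname_hot = supernode_candidates(people, "surname", max_degree=2)
email_hot = supernode_candidates(people, "email", max_degree=2)
print(surname_hot)  # {'Smith': 3}
print(email_hot)    # {}
```

Run against real data, a check like this would suggest which attributes are selective enough to be vertices and which are better kept as properties.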

In InfiniteGraph I might build a schema that looks something like the following:

UPDATE SCHEMA {

    CREATE CLASS Person
    {
        surname : String,
        dob     : Date,
        
        mobiles : LIST {
                        element: Reference {
                            referenced: MobilePhone,
                            inverse:    user    
                        }, 
                        CollectionTypeName: SegmentedArray
                    },
        emails  : LIST {
                        element: Reference {
                            referenced: Email,
                            inverse:    user    
                        }, 
                        CollectionTypeName: SegmentedArray
                    }
    }   
    
    CREATE CLASS MobilePhone 
    {
        phoneNumber : String,
        
        user        : Reference {referenced: Person, inverse: mobiles }
    }
    
    CREATE CLASS Email 
    {
        emailAddress : String,
        
        user        : Reference {referenced: Person, inverse: emails }
    }
}

This would then allow you to start from a particular email address, find the people with a given DOB who use it, and get all of the phone numbers being used by those same people:

MATCH p = (ev:Email {emailAddress == "abc@xyz"})
        -->(pv:Person {dob == 1999/08/2})
        -->(mv:MobilePhone) 
        return ev, pv, mv;

And you could reverse the query: start with the mobile phone number, get the people associated with that number, and then get the email addresses associated with those people:

MATCH p = (mv:MobilePhone {phoneNumber == "123-456-9987"}) 
        -->(pv:Person {dob == 1999/08/2})
        -->(ev:Email)       
        return mv, pv, ev;

Disclaimer: I am the Director of Field Operations for Objectivity, Inc., maker of InfiniteGraph.

djhallx
  • Thanks for the response djhallx. WRT creating separate vertices for things like mobile, email, DOB as opposed to just having these as properties of a person vertex; would you agree that 'it depends'... things like DOB & Postcode would likely end up with "super-nodes" and should probably be properties, whereas things like email & mobile would only connect Person vertices in fraud syndicate scenarios and probably wouldn't end up in "super-nodes". cont... – Ryan.Bartsch Aug 19 '21 at 23:45
  • ... It would also be really nice to be able to visualize the graph and the edges connecting fraud syndicates, so I think separate vertices for email & mobile, and properties for DOB, Postcode, etc. Thoughts... – Ryan.Bartsch Aug 19 '21 at 23:45
  • ... would you agree that 'it depends'? It always 'depends' and you are correct that the more unique an item is, the less likely it is to become a supernode. – djhallx Aug 21 '21 at 11:41
  • I worked on a project for a client several years ago where they differentiated between 'Person' and 'Persona'. A Person was an actual, physical human being, whereas a Persona was a collection of attributes that represented a person. A Person might take on different Personas, and their problem, much like yours, was to find the Person using all of the Persona data. – djhallx Aug 21 '21 at 11:45
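To make the hybrid model from the comments concrete (email and mobile as separate vertices, DOB and postcode kept as Person properties), here is a minimal in-memory Python sketch. It is not Cosmos DB or InfiniteGraph code; the `build_graph` and `linked_people` helpers and the sample records are assumptions made for illustration. It walks the graph breadth-first from a known fraudster and returns every person reachable through shared emails or mobiles.

```python
from collections import defaultdict, deque


def build_graph(people):
    """Adjacency between ("person", id) and ("email"|"mobile", value) vertices."""
    adj = defaultdict(set)
    for p in people:
        person = ("person", p["id"])
        for label in ("email", "mobile"):
            value = p.get(label)
            if value:
                adj[person].add((label, value))
                adj[(label, value)].add(person)
    return adj


def linked_people(adj, start_id):
    """Ids of all people reachable from `start_id` via shared emails/mobiles."""
    start = ("person", start_id)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(v for label, v in seen if label == "person" and v != start_id)


# Hypothetical sample data; dob stays a property and creates no edges.
people = [
    {"id": "f1", "email": "x@example.com", "mobile": "111", "dob": "1990-01-01"},
    {"id": "u2", "email": "x@example.com", "mobile": "222", "dob": "1990-01-01"},
    {"id": "u3", "email": "y@example.com", "mobile": "222", "dob": "1985-05-05"},
    {"id": "u4", "email": "z@example.com", "mobile": "333", "dob": "1990-01-01"},
]

graph = build_graph(people)
ring = linked_people(graph, "f1")
print(ring)  # ['u2', 'u3']
```

Here u2 shares an email with the fraudster f1, and u3 shares a mobile with u2, so both end up in the same ring. u4 has the same DOB as f1 but is not linked, because DOB is only a property in this model.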