
One of the problems that I'm trying to solve with Master Data Management (MDM) is merging duplicate entities that look different because of things like misspellings. For instance, John Doe and Jon Doe might in reality be the same person.

I've read that graph databases like Neo4j can be used for MDM, and I have a vague sense that graph theory might help me resolve the problem of duplicate entities. Basically, if I look at the relationships around John Doe and Jon Doe, might the similarity of each node's connections to other pieces of data offer a way to decide whether they are in fact the same object?

If so, how can I go about doing this with Neo4j?

Eric Yang
  • This site is mostly good for answering questions about something you've tried; this is a reasonable question but VERY broad, asking for strategies for implementing MDM in neo4j, without necessarily knowing what scope of MDM activities you want to cover, other than just de-duplication. I'd recommend you read in general on graph pattern matching, then try something, and come back with a more specific question. The published use case about finding bank fraud should get your creative juices flowing on how graph DBs can find patterns. – FrobberOfBits Jan 07 '15 at 21:58
  • http://linkurio.us/how-to-detect-bank-loan-fraud-with-graphs-part-1/ – FrobberOfBits Jan 07 '15 at 21:58
  • The hint I can offer to help you along is that you might be able to demonstrate that "John Doe" and "Jon Doe" are the same person if they have connections to the same other things in the graph. What commonalities they should have in order to qualify as a match will depend on your domain and requirements, but neo4j does give you the tools to do sophisticated comparisons of the two. – FrobberOfBits Jan 07 '15 at 22:06
  • Sorry, the top part was more of a musing, the second part is primarily how to do graph matching or subgraph matching with Neo4j – Eric Yang Jan 07 '15 at 23:04
  • Subgraph matching: http://stackoverflow.com/questions/16101789/extract-subgraph-in-neo4j – FrobberOfBits Jan 08 '15 at 13:16
  • Warning, watch out - in graphs, certain kinds of subgraph matching get ridiculously algorithmically complex. See also: http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem – FrobberOfBits Jan 08 '15 at 13:17
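
The hint in the comments (two records are probably the same person if their names are close and they connect to the same other things) can be sketched outside the database first. The following is a minimal Python sketch, not a Neo4j API: the `neighbors` dictionary, the sample phone/address/employer values, and both thresholds are hypothetical, and in practice the neighbor sets would come from a query against your own graph. It combines a string similarity on the names with a Jaccard overlap of the connected nodes.

```python
from difflib import SequenceMatcher

# Hypothetical toy graph: each person maps to the set of nodes it is connected to.
# In a real setup these sets would be fetched from the graph database.
neighbors = {
    "John Doe": {"555-0100", "12 Elm St", "Acme Corp"},
    "Jon Doe": {"555-0100", "12 Elm St"},
    "Jane Roe": {"555-0199", "Acme Corp"},
}


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] using difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbor_jaccard(a, b):
    """Jaccard overlap of the two nodes' neighbor sets: |A & B| / |A | B|."""
    na, nb = neighbors[a], neighbors[b]
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def likely_duplicates(a, b, name_threshold=0.8, overlap_threshold=0.5):
    """Flag a pair as a merge candidate only if both signals agree.

    The thresholds are illustrative assumptions and would need tuning per domain.
    """
    return (
        name_similarity(a, b) >= name_threshold
        and neighbor_jaccard(a, b) >= overlap_threshold
    )


if __name__ == "__main__":
    for pair in [("John Doe", "Jon Doe"), ("John Doe", "Jane Roe")]:
        print(pair, likely_duplicates(*pair))
```

Comparing every pair of nodes this way grows quadratically, so on a real dataset you would likely restrict candidates first (for example, only pairs that already share at least one neighbor), which also sidesteps the general subgraph matching cost mentioned above.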

0 Answers