
I have multiple entities and properties of the following kind:

  • Linkedin company
    • name
    • phone
  • Facebook
    • facebook_url
    • name
    • website_url
    • phone
  • website
    • linkedin_url
    • facebook_url
    • phone

Not all entities have all their properties filled.

I want to create a unified dataset by matching corresponding values across all the entities.

I'm considering using a graph database, Neo4j in particular.

But if each entity is a node, then I will have to create each relationship by programmatically checking whether each property equals the corresponding property in all other entities.

I'm also considering some kind of SQL join, but that seems like it would be hard to maintain as the data model widens.

What is the right approach to solve this problem?

Which technology is best for this?

Erick Ramirez
Lolol

1 Answer


Here is one approach for doing that in Neo4j. (Stack Overflow is not the right place to ask about the "best" technology for doing something, as that tends to be very subjective.)

You can create unique URL, Phone, Person, and Account nodes, and have each Account connected to the appropriate URL, Phone, and Person nodes.
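To help keep those value nodes unique, you could also declare uniqueness constraints. This is only a sketch: it assumes the constraint syntax of Neo4j 4.4 or later (older releases use a different form), and the constraint names are hypothetical.

```cypher
// Assumed syntax: Neo4j 4.4 or later. Constraint names are hypothetical.
// Each phone number is stored on at most one Phone node.
CREATE CONSTRAINT phone_number_unique IF NOT EXISTS
FOR (p:Phone) REQUIRE p.number IS UNIQUE;

// Each URL string is stored on at most one URL node.
CREATE CONSTRAINT url_value_unique IF NOT EXISTS
FOR (u:URL) REQUIRE u.url IS UNIQUE;
```

With constraints like these in place, a MERGE on a phone number or URL that already exists reuses the existing node, so two accounts carrying the same value end up connected to the same node.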

For example, assuming your 3 sample accounts are related to the same person, here is how you could represent that in the DB:

// One node per person, account, URL, and phone number
MERGE (pe:Person {name: 'Jane Doe'})
MERGE (ac_li:Account {type: 'li', id: 'xyz'})
MERGE (ac_fb:Account {type: 'fb', id: 'abc'})
MERGE (ac_si:Account {type: 'site', id: 'foo'})
MERGE (url_li:URL {type: 'li', url: 'http://example.net/xyz'})
MERGE (url_fb:URL {type: 'fb', url: 'http://example.com/abc'})
MERGE (url_si:URL {type: 'site', url: 'http://example.org/foo'})
MERGE (ph:Phone {number: '1234567890'})

// Link the person to each account
MERGE (pe)-[:ACCOUNT]->(ac_li)
MERGE (pe)-[:ACCOUNT]->(ac_fb)
MERGE (pe)-[:ACCOUNT]->(ac_si)
// Link each account to the phone and URL values it carries
MERGE (ac_li)-[:PHONE]->(ph)
MERGE (ac_fb)-[:URL]->(url_fb)
MERGE (ac_fb)-[:URL]->(url_si)
MERGE (ac_fb)-[:PHONE]->(ph)
MERGE (ac_si)-[:URL]->(url_li)
MERGE (ac_si)-[:URL]->(url_fb)

Then, if you wanted to find all the Account and Person combinations related to a specific URL, you could do this:

MATCH (url:URL)<-[:URL]-(account)<-[:ACCOUNT]-(person)
WHERE url.url = 'http://example.com/abc'
RETURN account, person
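Because shared values are single nodes, the matching between accounts does not need property-by-property comparison in application code. As a rough sketch that assumes the labels and relationship types used above, something like this could list pairs of accounts that share a phone number or URL:

```cypher
// Pair up distinct accounts that point at the same Phone or URL node.
MATCH (a:Account)-[:PHONE|URL]->(shared)<-[:PHONE|URL]-(b:Account)
// id(a) < id(b) reports each pair only once. id() is deprecated in Neo4j 5,
// where elementId() is the suggested replacement.
WHERE id(a) < id(b)
RETURN a.type AS first_type, a.id AS first_id,
       b.type AS second_type, b.id AS second_id,
       collect(labels(shared)[0]) AS shared_via
```

With the sample data above, this should pair the 'li' and 'fb' accounts through the shared phone number, and the 'fb' and 'site' accounts through the shared Facebook URL.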
cybersam
  • Thanks @cybersam for the answer. You went with the approach of extracting each "property" of the entity into its own node. You are giving the relationship the same label as the node. Why is that? Is this a Neo4j pattern, or does Neo4j ignore identical labels when they appear on both nodes and relationships? – Lolol Sep 05 '20 at 19:43
  • The choice of relationship type just seemed to feel right to me. You can use whatever names make sense to you. – cybersam Sep 06 '20 at 19:19