Facebook offers Graph Search. Users can search, for example, for people who like cycling and are from London, friends of friends who like yoga, or photos of friends or boyfriends from a certain month or year.

All this data is extracted from a single search input with no filter fields.

I am trying to build something similar in PHP, but I can't tell exactly how it might be implemented.

I was wondering whether this is done purely through a certain database design approach (a simple RDBMS), through graph node structures that are logically linked to database tables by keywords, through a mixture of RDBMS and NoSQL, or through some other approach. As for the text input itself, there must be some sort of dissection and matching against specific keywords to determine the relevance of the data and route it to the proper query.

What is the best practice for implementing a graph search in PHP (or at least something similar) on my website, which is similar to a retail e-commerce system with grouped, related data?

Peter Dixon-Moses
KAD
  • Your question is a bit too broad for SO, because you don't really have a precise problem statement (like which code isn't working). However, what you're looking for is called a live search, and there are plenty of JS libraries out there, as well as tutorials, that can help you implement live searches based on a text input. There are a lot of things to consider for a live search, like the database design and how the tables can be queried. – Terry Oct 14 '16 at 18:49
  • If you are interested in graph databases, you may want to start with Neo4j, which, I believe, has some natural language processing baked in. I agree with @Terry, though. This is not appropriate for SO. – JNevill Oct 14 '16 at 18:50
  • Alright guys thank you for the info :) – KAD Oct 17 '16 at 05:12

1 Answer

You could solve for each of your examples separately, but it could prove tedious, and you'd likely run into a wall in terms of performance.

People who like cycling and are from London (SQL)

   SELECT users.id 
     FROM users, posts, topics, locations 
    WHERE posts.topic_id = topics.id
      AND users.id = posts.author_id
      AND users.location_id = locations.id
      AND locations.city = 'London' 
      AND topics.name = 'cycling'    
 GROUP BY users.id   
 ORDER BY COUNT(posts.id) DESC

(using a really loose definition of 'liking cycling', and being 'from London')
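If you want to try that query without setting up a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the sample rows are assumptions made up for illustration (your schema will differ), and the query is the same one written with explicit JOINs.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a hypothetical schema and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locations (id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE topics    (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users     (id INTEGER PRIMARY KEY, location_id INTEGER);
        CREATE TABLE posts     (id INTEGER PRIMARY KEY,
                                author_id INTEGER, topic_id INTEGER);
        INSERT INTO locations VALUES (1, 'London'), (2, 'Paris');
        INSERT INTO topics    VALUES (1, 'cycling'), (2, 'yoga');
        -- users 1 and 2 live in London, user 3 in Paris
        INSERT INTO users     VALUES (1, 1), (2, 1), (3, 2);
        -- user 2 has two cycling posts, users 1 and 3 have one each
        INSERT INTO posts     VALUES (1, 1, 1), (2, 2, 1), (3, 2, 1),
                                     (4, 3, 1), (5, 1, 2);
    """)
    return conn


def find_london_cyclists(conn):
    """Return ids of London users with cycling posts, most active first."""
    query = """
        SELECT users.id
          FROM users
          JOIN locations ON users.location_id = locations.id
          JOIN posts     ON posts.author_id   = users.id
          JOIN topics    ON posts.topic_id    = topics.id
         WHERE locations.city = 'London'
           AND topics.name = 'cycling'
      GROUP BY users.id
      ORDER BY COUNT(posts.id) DESC
    """
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    print(find_london_cyclists(build_sample_db()))
```

Every new kind of search ("friends of friends who like yoga", and so on) needs its own hand-written set of joins like this, which is where the approach gets tedious.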

Relational databases tend not to handle long chains of joins gracefully. Performance is likely to suffer under load or with a large dataset.


However, in a graph database (like Neo4j or Titan), you could traverse a graph of related entities and collect matching entity nodes in a much more generic way, in an environment optimized for serving the type of use cases you're thinking about.

Same query (Cypher - Neo4j)

   MATCH (topic:Topics {name:'cycling'})
           <-[:POST_TOPIC]-(post:Posts)
           -[:AUTHORED_BY]->(user:Users)
   WHERE (user)-[:RESIDENT_OF]->(:Location {city:'London'})
  RETURN user.id AS user_id, count(post) AS post_count
ORDER BY post_count DESC

These are also expressible as Gremlin traversals (for Titan and other Graph DBs), but they start getting quite verbose and hard to decipher.

There are generic ways to approach what you describe with Facebook-style graph search relevance. In your case, it sounds like you probably want personalized search, e.g. all the related vertices within a few degrees of separation of the searcher (using whatever edge relationships you have: location, interests, friends, etc.).
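As a rough illustration of the "few degrees of separation" idea, here is a minimal sketch in plain Python. It assumes the graph fits in memory as an adjacency dict, and the node names, labels, and function names (`nodes_within`, `personalized_search`) are all hypothetical; a graph database would do the traversal for you. It runs a breadth-first search from the searcher, keeps the nodes whose label contains the keyword, and ranks them by hop distance.

```python
from collections import deque

# Hypothetical toy graph: undirected edges stored as adjacency lists.
GRAPH = {
    "me": ["alice", "london"],
    "alice": ["me", "cycling", "bob"],
    "bob": ["alice", "cycling_club"],
    "london": ["me"],
    "cycling": ["alice"],
    "cycling_club": ["bob"],
}

# Hypothetical searchable label for each node.
LABELS = {
    "me": "Me",
    "alice": "Alice",
    "bob": "Bob",
    "london": "London",
    "cycling": "Cycling",
    "cycling_club": "London Cycling Club",
}


def nodes_within(graph, start, max_hops):
    """Breadth-first search: map each node within max_hops to its distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def personalized_search(graph, labels, searcher, keyword, max_hops=2):
    """Return nodes near the searcher whose label contains keyword, closest first."""
    keyword = keyword.lower()
    distances = nodes_within(graph, searcher, max_hops)
    hits = [
        (dist, node)
        for node, dist in distances.items()
        if node != searcher and keyword in labels.get(node, "").lower()
    ]
    return [node for _, node in sorted(hits)]


if __name__ == "__main__":
    print(personalized_search(GRAPH, LABELS, "me", "cycling", max_hops=3))
```

Ranking purely by distance treats every edge type the same; in practice you would probably weight edges (friend vs. location vs. interest) or filter by entity type.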


If you can't easily enumerate all the use cases you want to build today, you'll probably be happier with a graph database, so you can experiment with your ideas, and launch them into production without having to cut corners for performance reasons.

Peter Dixon-Moses
  • Sounds good @Peter, thank you. How do you think the keyword mapping is defined? In other words, how can I know that cycling should be queried against topics and london against location? Is there some sort of algorithm to relate keywords to entities within the database, or is it just a matter of trial and error? – KAD Oct 17 '16 at 05:19
  • You could try to do something generic by graph distance (between searcher and node with matching keyword). Ultimately though, you may want to customize your logic around specific entity types. – Peter Dixon-Moses Oct 17 '16 at 12:56
  • Can you please explain your comment in more detail so that the idea is clearer, maybe with a small example? – KAD Oct 17 '16 at 13:13