0

We have a MySQL database, which has three tables:

  Objects                Composite                  Tags   
+-----+-----+          +-----+-----+            +-----+-----+
| ID  | Name|          |ObjID|TagID|            | ID  | Name|
+-----+-----+          +-----+-----+            +-----+-----+

Objects and tags tables are indexed and we're using Sphinx Search engine (can be changed if need be).

When an object search is ran, we get the object IDs from the search engine and then check the composite table via MySQL query to find what the objects' tags are.

The problem is, when we try to search within tags and get several tag results, we need to first get ALL the objects that are tied to those tags (currently, MySQL is doing this part) and then go back to find what their tags are (again, MySQL is doing this part). Since we're going to have more than a few million objects, this will mean death for our tiny little server if enough requests are ran and this part really needs some optimization.

What we're looking for is either a search engine that can index composite tables efficiently or do searches joining these tables together. Do any of you know about a search engine that can handle these tasks (so far nothing we've seen can) or should we take a different approach to this problem altogether?

John Hamilton
  • 125
  • 1
  • 9
  • These answers should help: http://stackoverflow.com/questions/34477816/how-to-use-elasticsearch-to-get-join-functionality-as-in-sql/34477920#34477920 + http://stackoverflow.com/questions/36915428/how-to-setup-elasticsearch-index-structure-with-multiple-entity-bindings/36982705#36982705 – Val Jan 15 '17 at 17:16
  • @Val If there's really no way, that's too bad. I think I'll try denormalizing if I can find no other solution. Thanks for the posts, I'll refer to them in the future too. – John Hamilton Jan 15 '17 at 17:58

1 Answers1

1

Tags like this can be easily indexed in Sphinx. In effect you denormalize when building the index. Use a JOIN in the sql_query...

sql_query = SELECT Objects.ID, Objects.Name AS name, 
        GROUP_CONCAT(Tags.Name) AS tags
        FROM Objects
        LEFT JOIN Composite ON (Objects.ID = ObjID) 
        LEFT JOIN Tags ON (Tags.ID = TagID)
        GROUP BY Objects.ID ORDER BY NULL

... this gives you a field called tags you can search tags with. (ie keyword searches)

You should like you also want to run 'GROUP BY' queries to get facet information. So add the TagIDs as a Multi-Value-Attribute, so can query, and/or group by tags.

Add , GROUP_CONCAT(TagID) as tag_ids to sql_query. Then add sql_attr_multi = uint tag_ids from field;

To turn it into MVA. http://sphinxsearch.com/docs/current.html#mva (sphinx works better with lower-case fields/attributes names)

You can then run a sphinxQL query

SELECT GROUPBY() AS tag_id, COUNT(*) AS count FROM index 
   WHERE MATCH('keyword search') GROUP BY tag_id ORDER BY count DESC;

Which gives you all the tags (and their counts) for Objects matching the keyword search. (all the heavy lifting of collating all the tags is internal to sphinx)

If do say so, grouping by MVA is one of the more magical features of Sphinx :)

barryhunter
  • 20,886
  • 3
  • 30
  • 43
  • Thank you. I was thinking this would be the closest solution. It still uses up precious hard drive space (we're not even a startup just a couple students doing stuff on student budgets) but still, if there are no other solutions in a couple of days, I'll pick this as the answer. – John Hamilton Jan 16 '17 at 07:07
  • Its probably not a bad as you might think. Sphinx in effect normalizing when indexing anyway. The full-text fields are stored in an inverted index. And the attributes are pre-grouped by ID anyway, which means storage will be less than size of your 'Composite' table, in database. – barryhunter Jan 16 '17 at 10:49
  • We've joined the tables in Sphinx and it seems it's working perfectly fine and the sizes aren't all that big. Thanks for the answer it was helpful. (Although it can use a bit of editing but nothing too serious :D) – John Hamilton Jan 17 '17 at 13:31