0

I'm working on replication data from MySQL to elasticsearch so I'm using logstash to do replication but the problem is that in mysql there is a product table that has many relations and select query contain 46 left outer join so result returned is very huge it returns around 50000 (50k) rows for only single record of table product and that makes replication performance is slow, so I need to ask if there some other solution for doing the replication and how I can to solve performance issue with huge result

sample of product with its relations (not all ) :

product with its relations

and that's elasticsearch mapping :

elasticsearch mapping

Chris
  • 4,672
  • 13
  • 52
  • 93

1 Answers1

1

You need to leverage the json datatype from mysql. The paradigm here is to crunch all of this beautiful data ahead of the actual indexation process. You run a stored procedure that handles the data and store it in one big flat table. All your relations are stored as json objects in a json field (when it makes sense of course). Then you index those relations as nested datatypes in elasticsearch. you can then aggregate on it to obtain the desire results. Works like a charm!

For example:

INSERT prod.id,
...,
(SELECT JSON_ARRAYAGG(JSON_OBJECT('color', color, 'size', size)) FROM tag_products where prod.id = tprod.product_id) AS tag_product_as_json,
INTO Elastic_Products FROM Products AS prod
LEFT JOIN tag_product AS tprod ON prod.id = tprod.product_id
...

I'm more of a SQL server guy so forgive me if you find any error in the syntax. The tag_product_as_json would be of a json datatype and would contain all rows that have the same product_id. You could store a lot of rows in there. This example is very basic. you could achieve similar results in a more memory efficient manner using common table expressions but it's more complicated and goes over the scope of the topic.

huglap
  • 31
  • 4
  • thanks for your help but can please give me an example or scenario – Mohammad Amjad Jun 16 '19 at 14:38
  • I have edited my answer. Keep in mind I've never actually done it in MySQL but done it countless of time in SQL server. I'm pretty sure the example above would work. You basically encode the result of a query and pluck it in a json field. Hope it helps. – huglap Jun 16 '19 at 17:59