
I have a process that takes a JSON data object from Kafka and loads some of its fields into an Elasticsearch index via the Elasticsearch API.

I have to write two separate messages: one for the data object and another for the Elasticsearch index, which holds a subset of the data object.

My question is this: can I augment the JSON metamodel so that I publish only one record format from Kafka, containing both the full data object and the indexable fields, with only the indexable fields loaded into Elasticsearch? Then I would not have to maintain two separate processes and keep them in sync; I would have just one process and one JSON record.

I am not batch loading, so I cannot use the Bulk API and the 'index' field marker that it uses, as in this JSON bulk load API example.
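To make the single-record idea concrete, here is a minimal sketch of what I have in mind. The envelope keys `data` and `indexable`, and the helper names `make_record` and `index_document`, are hypothetical and only for illustration; they are not part of Kafka or Elasticsearch. The consumer would send only the extracted subset to the index.

```python
import json

# Hypothetical envelope: "data" holds the full object and "indexable" names
# the top-level fields of "data" that should reach the search index.
def make_record(data, indexable_fields):
    missing = [f for f in indexable_fields if f not in data]
    if missing:
        raise KeyError(f"indexable fields not present in data: {missing}")
    return {"data": data, "indexable": list(indexable_fields)}


def index_document(record):
    """Return only the fields that the record itself marks as indexable."""
    return {name: record["data"][name] for name in record["indexable"]}


record = make_record(
    {"id": 42, "title": "order created", "raw": {"lines": [1, 2, 3]}},
    ["id", "title"],
)
message = json.dumps(record)                # the single message published to Kafka
doc = index_document(json.loads(message))   # the subset a consumer would index
print(doc)
```

The full object still travels in the same message, so any other consumer can read `data` unchanged.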

  • How about leveraging Logstash to consume your Kafka topic and only send part of the fields to ES? There's no big value in writing an application yourself to do it. – Val Jul 21 '17 at 13:34
  • Trying to find a more elegant repeatable solution initially that ideally augments the JSON metamodel itself for future reusability. Is it possible? – user2900958 Jul 21 '17 at 13:42
  • Not sure about your specific use case(s), but you have a lot of freedom in Logstash filters to massage and augment your data as you see fit. It's just that taking data from A and sending it to B should be left to an existing tool that's fit for the job, not to yet another home-grown application that you have to maintain over time. – Val Jul 21 '17 at 13:47
  • The JSON schema in use is very complex, so we're bound by ES to create the schema in ES first. That means we can't use Logstash, because we are forced to create the schema in ES first. – user2900958 Jul 24 '17 at 08:03
  • Using Logstash doesn't prevent you from doing that. It's still possible using an index template. – Val Jul 24 '17 at 08:13
  • Yes, but it's not automated. How would one go about automating that in a repeatable, robust fashion? Is there a way to augment the JSON metamodel for this? The problem is that my schema changes because it's semi-structured and flexible, so how do I manipulate the JSON to create the templates? Is there a way to do this from within JSON? – user2900958 Jul 24 '17 at 09:56
  • The index template is something you create in ES **before** creating the index itself, i.e. before storing the first document. If your JSON documents gain new fields afterwards, ES will update the mapping on the fly. – Val Jul 24 '17 at 13:52
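
As a rough sketch of how the mapping part of such an index template could be generated rather than written by hand: the snippet below assumes the full data object sits under a hypothetical top-level field named `payload`, and the helper `build_mappings` is made up for illustration. It relies on the Elasticsearch mapping parameter `enabled`, which for object fields keeps the JSON in `_source` without parsing or indexing it. How the mappings are wrapped into a template request differs between Elasticsearch versions, so that part is not shown.

```python
import json


def build_mappings(disabled_objects):
    """Build a mappings body: new fields are still mapped dynamically,
    while the named object fields are kept in _source but not indexed."""
    properties = {
        name: {"type": "object", "enabled": False}
        for name in disabled_objects
    }
    return {"dynamic": True, "properties": properties}


# "payload" is a hypothetical field name for the full data object.
mappings = build_mappings(["payload"])
print(json.dumps(mappings, indent=2))
```

Because the body is generated from a list of field names, it can be rebuilt whenever the record format changes.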
