Imagine I have an ElasticSearch instance with three kinds of data - author, publisher and book - all in JSON. Author data looks like this:
{
"document-id": "XYZ",
"document-type": "author",
"name": "John Doe",
"country": "Canada"
}
Publisher data looks like this:
{
"document-id": "JKL",
"document-type": "publisher",
"name": "Random House"
}
And book data looks like this:
{
"document-id": "ABC",
"document-type": "book",
"authorId": "XYZ",
"publisherId": "JKL",
"title": "Logstash for Dummies"
}
Currently, each document type goes into its own index.
I would like to create a denormalized version of the data, so that I can easily search for all books written by Canadian authors, or published by Random House. I need to support updates to the author, publisher and book data, so that if the author moves to a new country or changes their name, the denormalized copy will also be updated.
I also need to keep all fields from all objects in the denormalized copy. In particular, colliding fields must not overwrite each other: all document-id values must be present, even if some have to be renamed, and the same goes for publisher.name and author.name. All of this will be used in Kibana reports. As I understand it, Kibana doesn't have great support for nested objects, though it does seem to have some, which might eliminate my field-name-collision concerns.
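To make the target concrete, here is a rough sketch in plain Python of the join I have in mind. It is not Elasticsearch or Logstash code: the `denormalize` function, the in-memory lookup dictionaries, and the `author.` / `publisher.` prefixes are placeholders I made up for illustration, and I don't know whether dotted field names are the right choice for Kibana.

```python
def denormalize(book, authors, publishers):
    """Return a flat copy of `book` with author and publisher fields merged in.

    Fields from the joined documents get a prefix ("author.", "publisher.")
    so that colliding names such as "document-id" and "name" all survive.
    The prefixing scheme is an assumption, not something Elasticsearch requires.
    """
    flat = dict(book)
    joined = (
        ("author", authors.get(book["authorId"], {})),
        ("publisher", publishers.get(book["publisherId"], {})),
    )
    for prefix, doc in joined:
        for key, value in doc.items():
            flat[f"{prefix}.{key}"] = value
    return flat


# Sample documents from above, keyed by document-id for lookup.
authors = {
    "XYZ": {"document-id": "XYZ", "document-type": "author",
            "name": "John Doe", "country": "Canada"},
}
publishers = {
    "JKL": {"document-id": "JKL", "document-type": "publisher",
            "name": "Random House"},
}
book = {"document-id": "ABC", "document-type": "book",
        "authorId": "XYZ", "publisherId": "JKL",
        "title": "Logstash for Dummies"}

combined = denormalize(book, authors, publishers)

# If the author moves, the denormalized copy has to be rebuilt to pick it up.
authors["XYZ"]["country"] = "France"
refreshed = denormalize(book, authors, publishers)
```

With this shape, "books by Canadian authors" becomes a simple filter on `author.country`, and the update requirement amounts to re-running the join for every book that references a changed author or publisher.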
What's the best way to achieve this? I've seen discussions that lead me towards the Logstash aggregate filter, or the ElasticSearch output plugin, and I'm unsure what to pursue. Is Logstash even necessary, or is this possible with ingest pipelines?
Do all three document types need to be in the same index in order for this to work? And should book be "enriched" with author and publisher data, or should they all be combined into a fourth document type?
I'm an ElasticSearch novice, and a complete newcomer to Logstash, so I'd appreciate any guidance you can provide.
Thanks!
(Cross-posted from https://discuss.elastic.co/t/enrich-one-document-with-fields-from-another/208651, after not receiving a response there after five days.)