I have two fscrawler (https://github.com/dadoonet/fscrawler) jobs running separately, crawling data sets that are related to each other. Now I want to merge the data in some way at indexing time (a child-parent relation or a flat document is fine), so some middleware is needed. Looking at both Logstash and the new Ingest Node feature in ES 5.0, neither seems to support writing custom processors.
Is there any way to do this sort of merging/relational mapping at index time, or do I have to do post-processing instead?
EDIT: One job crawls "articles" in JSON format. An article can reference multiple attachments (declared in an attachments array in the JSON), which live in a different location. The second job crawls the actual attachment files (e.g. PDFs), applying Tika processing to them. In the end I would like to have a single article type that also contains the extracted content of its attachments.
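For illustration, here is a rough post-processing sketch of the kind of merge I have in mind (Python with the elasticsearch client; the index names `articles`, `attachments`, `articles_merged` and the `attachments`, `file.filename`, `content` fields are just assumptions about how my crawler jobs are configured, not something fixed):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

es = Elasticsearch(["http://localhost:9200"])

def merged_articles():
    # Walk every crawled article (index name "articles" is an assumption)
    for hit in scan(es, index="articles", query={"query": {"match_all": {}}}):
        article = hit["_source"]
        contents = []
        # "attachments" is the array declared in the article JSON;
        # each entry is assumed to be the attachment's filename
        for filename in article.get("attachments", []):
            # Look the attachment up in the second crawler's index by filename
            resp = es.search(
                index="attachments",
                body={"query": {"term": {"file.filename": filename}}},
            )
            for att_hit in resp["hits"]["hits"]:
                # Tika-extracted text is assumed to sit in the "content" field
                contents.append(att_hit["_source"].get("content", ""))
        # Flatten: embed the attachment text directly in the article document
        article["attachment_content"] = contents
        yield {
            "_index": "articles_merged",
            "_type": "article",  # required per action on ES 5.x
            "_id": hit["_id"],
            "_source": article,
        }

# Bulk-index the merged documents into a new index
bulk(es, merged_articles())
```

This would give me the flat "article plus attachment content" documents I described, but it runs after both crawler jobs have finished rather than at index time, which is what I am trying to avoid if possible.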