I am about to index tweets coming from Apache NiFi into Elasticsearch via HTTP POST and want to do the following:

  1. Map the create_at field as a date. Should I use a mapping or an index template for this?

  2. Make some fields not analyzed, such as hashtags, URLs, etc.

  3. Store only some important fields rather than the entire tweet: the text, a few user fields (not all user information), and the hashtags and URLs from entities (URLs in the post). I don't need the quoted source, etc. What should I use in this case? A template? Or should I pre-process the tweets with some ETL process to extract the data I need and then index it in ES?

I am a bit confused and would really appreciate any advice.

Thanks in advance.

Igor K.
  • In point 2, I'm not sure what you mean by "canalized"? Did you mean analyzed? – Val Dec 06 '15 at 04:40
  • There are several folks in the NiFi community that have an interest in integration with Elasticsearch. There has been talk of a bulk importer to get data from NiFi to ES and a query mechanism to get data from ES to NiFi. If you're interested in collaborating or have any questions, let us know at dev@nifi.apache.org. Thanks – Joe Witt Dec 06 '15 at 17:56
  • Hi Joe, in #2 I want some fields not analyzed. It would be nice to have an ES processor. Thank you for the email. – Igor K. Dec 06 '15 at 20:42

1 Answer

I guess that in your NiFi flow you have something like GetTwitter and PostHTTP configured. NiFi is already a sort of ETL tool, so you probably don't need another one. However, since you don't want to index the whole JSON coming out of Twitter, you clearly need another NiFi processor in between to select what you want and transform the raw JSON into a more lightweight document. Here is an example of how to do it for Solr, but I'm not sure the same processor exists for Elasticsearch.
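To make the "select and transform" step concrete, here is a minimal sketch in plain Python of what such an intermediate step could compute. It is not a NiFi processor. It assumes the standard Twitter JSON layout (`created_at`, `text`, `user`, `entities.hashtags`, `entities.urls`); the function name `trim_tweet` and the shape of the output document are hypothetical choices for illustration. It also converts Twitter's timestamp to ISO 8601, which Elasticsearch date fields accept by default.

```python
from datetime import datetime

# Twitter timestamps look like "Sun Dec 06 04:40:00 +0000 2015".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def trim_tweet(raw):
    """Return a lightweight document holding only the fields worth indexing.

    `raw` is one tweet as a dict (parsed JSON). The selection of fields
    below is an assumption; adjust it to what you actually need.
    """
    user = raw.get("user", {})
    entities = raw.get("entities", {})
    created = datetime.strptime(raw["created_at"], TWITTER_DATE_FORMAT)
    return {
        # ISO 8601 string, so no custom date format is needed in the mapping.
        "created_at": created.isoformat(),
        "text": raw.get("text"),
        "user": {
            "id": user.get("id"),
            "screen_name": user.get("screen_name"),
        },
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
    }
```

Anything not copied explicitly (for example `source` or the rest of the user object) is simply dropped, so the indexed document stays small.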

This article about streaming Twitter data to Elasticsearch using Logstash shows a possible index template that you could use as a starting point for your own (e.g. add the create_at date field if you like).

Since you don't want to index everything, the way to go is clearly to come up with your own mapping, which you can then use in an index template. Using index templates, you will be able to create daily/weekly/monthly Twitter indices as you see fit.
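As a rough illustration, the sketch below builds such a template body as a Python dict, plus a helper that names daily indices so they match the template's pattern. The field names (`created_at`, `text`, `hashtags`, `urls`, `user`), the `twitter-` prefix, and both function names are assumptions made for the example. The body uses the composable template layout and the `keyword` type of recent Elasticsearch versions; on the 1.x/2.x releases current when this was asked, the equivalent of `keyword` was `"type": "string"` with `"index": "not_analyzed"`.

```python
from datetime import datetime, timezone


def build_index_template(pattern="twitter-*"):
    """Return an index template body (hypothetical field selection)."""
    return {
        # Every index whose name matches this pattern gets the mapping below.
        "index_patterns": [pattern],
        "template": {
            "mappings": {
                "properties": {
                    "created_at": {"type": "date"},
                    "text": {"type": "text"},         # analyzed, full-text search
                    "hashtags": {"type": "keyword"},  # not analyzed, exact match
                    "urls": {"type": "keyword"},      # not analyzed, exact match
                    "user": {
                        "properties": {
                            "id": {"type": "long"},
                            "screen_name": {"type": "keyword"},
                        }
                    },
                }
            }
        },
    }


def daily_index_name(when, prefix="twitter-"):
    """Name of the daily index for a given datetime, e.g. twitter-2015.12.06."""
    return prefix + when.strftime("%Y.%m.%d")


if __name__ == "__main__":
    print(daily_index_name(datetime.now(timezone.utc)))
```

The template body would be registered once in Elasticsearch; after that, posting a document to a new index whose name matches the pattern creates that index with the mapping applied, which is what makes daily or monthly indices cheap to set up.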

Val