
I've worked with Nutch 1.x for crawling websites, using Elasticsearch to index the data. I've come across StormCrawler recently and like it, especially its streaming nature.

Do I have to init and create the mappings for the ES server that StormCrawler is sending the data to?

With Nutch, as long as I had the ES index up and running, the mapping took care of itself, except for some fine-tuning. Is it the same with StormCrawler, or do I have to init the index and mappings beforehand?


1 Answer


Great to hear you like StormCrawler.

As explained in the README and in the video tutorial (based on ES 2.x), you should use the ES_IndexInit script to set the mappings explicitly. It would probably work without it, but it would not be optimal.
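
For reference, here is a minimal sketch of what such an initialisation does, assuming a local ES 5.x instance on the default port. The index name, type, and fields below are illustrative rather than the exact contents of the ES_IndexInit script, so check the version shipped with your release:

```bash
#!/bin/bash
# Illustrative sketch of an index-init script (not the shipped ES_IndexInit):
# drop any existing status index, then recreate it with an explicit mapping.
ES='http://localhost:9200'

curl -s -XDELETE "$ES/status/" > /dev/null

curl -s -XPUT "$ES/status" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index": { "number_of_shards": 5, "number_of_replicas": 1 }
  },
  "mappings": {
    "status": {
      "properties": {
        "url":           { "type": "keyword" },
        "status":        { "type": "keyword" },
        "nextFetchDate": { "type": "date" }
      }
    }
  }
}'
```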

  • I'm currently running SC 1.5 with ES 5.x. So can I add more fields to that mapping in the script (for the 'index' index)? – user3125823 Jun 01 '17 at 16:19
  • Instead of using the bash script that's included, can I just use the Console in Kibana to create the index and mappings? – user3125823 Jun 01 '17 at 18:50
  • You can create the indices and mappings in any way you like, including Kibana. And yes, the mappings are customisable, so you can tune them to your needs. – Julien Nioche Jun 02 '17 at 07:17
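
Following up on the comments above: since the mappings are set with plain REST calls, an extra field can be added from a script or from Kibana. A hedged sketch of the curl form, assuming the default content index is named index with a doc type doc (the SC 1.5-era defaults) and using a hypothetical category field:

```bash
# Hypothetical example: add an extra field to the content index mapping.
# The index name "index", type "doc", and field "category" are assumptions,
# not values confirmed by the answer above; adjust to your configuration.
curl -s -XPUT 'http://localhost:9200/index/_mapping/doc' \
  -H 'Content-Type: application/json' -d '
{
  "properties": {
    "category": { "type": "keyword" }
  }
}'
```

The same request body can be pasted into Kibana's Dev Tools Console as `PUT index/_mapping/doc`, per the comment above about creating indices and mappings any way you like.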