We created an index named apachelog in Elasticsearch as follows, with dynamic mapping set to "strict" and the httpresponse field typed as integer:

curl -X PUT 'http://localhost:9200/apachelog' -d \
'{
  "log": {
    "dynamic": "strict",
    "properties": {
      "@fields": {
        "properties": {
          "agent": {"type": "string"},
          "city": {"type": "string"},
          "client_ip": {"type": "string"},
          "hitTime": {"type": "string"},
          "host": {"type": "string"},
          "httpresponse": {"type": "integer"}
        }
      },
      "@message": {"type": "string"},
      "@source_host": {"type": "string"},
      "@timestamp": {"type": "date", "format": "dateOptionalTime"}
    }
  }
}'

Our Flume Elasticsearch sink is configured as shown below. Note that the index name is apachelog, the same as the index we already created in ES:

# Write to ElasticSearch

collector.sinks.elasticsearch.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
collector.sinks.elasticsearch.channel = mc2
collector.sinks.elasticsearch.batchSize=100
collector.sinks.elasticsearch.hostNames = localhost:9300
collector.sinks.elasticsearch.indexName = apachelog
collector.sinks.elasticsearch.clusterName = logsearch
collector.sinks.elasticsearch.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer

When we start the Flume agent, a new index named apachelog-2015-09-09 is created in Elasticsearch, and in that index the httpresponse field has type string. Flume/ES adds documents to this newly created index, while the apachelog index that we created explicitly stays dormant.

Any idea why this is happening and how we can get Flume/ES to use our index as opposed to creating its own?

1 Answer

The Flume Elasticsearch sink is behaving as designed: by default it writes to an index whose name is the configured indexName followed by the UTC date, which is why you see apachelog-2015-09-09. This is the naming scheme the Kibana tool goes looking for.

You can change the index name format to, for example, write to monthly indexes instead of daily. I don't remember if we left in a way to not have a time component.

I would suggest that you probably don't want a single index for all of your events. ES will start having problems when the index gets massive, and you won't be able to change the number of shards on it without reindexing. This was the experience Graylog had when they used a single index.

As you appear to want to control the field types, I suggest you look into index templates and add one matching apachelog* that does what you want. Then delete the old indexes and let ES create each new index from that template.
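As a rough sketch of that suggestion, assuming an Elasticsearch 1.x cluster like the one in the question: the template name apachelog_template is made up for illustration, the "log" type is copied from the question's mapping, and only the httpresponse field is shown. The "dynamic": "strict" setting is left out here on the assumption that the serializer may send fields not listed in the mapping; add it back, along with the remaining fields, if you want the same strictness. Newer Elasticsearch versions use "index_patterns" instead of "template", so check the documentation for your version.

```shell
# Register a template applied to every new index whose name matches apachelog*
# (the template name apachelog_template is a hypothetical choice).
curl -X PUT 'http://localhost:9200/_template/apachelog_template' -d \
'{
  "template": "apachelog*",
  "mappings": {
    "log": {
      "properties": {
        "@fields": {
          "properties": {
            "httpresponse": {"type": "integer"}
          }
        }
      }
    }
  }
}'

# Delete the daily index that was created with the wrong mapping, so the
# next write from Flume recreates it from the template.
curl -X DELETE 'http://localhost:9200/apachelog-2015-09-09'
```

Templates only take effect when an index is created, so indexes that already exist keep their current mappings until they are deleted and recreated.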

Sarge