
We have an index in Elasticsearch that receives logs from both FluentD and Jaeger. The date-time field gets messed up because the two apps apparently use different formats: FluentD uses ISO8601, whereas Jaeger uses epoch millis. As a consequence, we have no logging in Kibana.

In the Helm values file that was used by my colleagues to install the EFK stack, there is a stanza for FluentD, but nothing for Jaeger, which makes sense as the creator of this chart only had FluentD in mind.

We use dynamic mapping, and the index gets recreated at midnight every 24 hours. If the first log entry happens to be from FluentD, all is fine; but if the first entry is from Jaeger, we get no logging at all.

My questions are:

  1. Is it supported to have an index with two different sources?
  2. If yes, how can we ensure that ES receives and parses the two date-time formats properly?

Thanks for any clues or pointers.

GID

1 Answer


Q.1:

Is it supported to have an index with two different sources?

Absolutely, you can send data coming from source_a and source_b to the same index index_1.

Whatever sources you are using, just configure each of them to send its data to the target index.
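
A minimal illustration in the Dev Tools console; the source and message fields are just placeholders, not anything your stack actually sends:

# Two documents from different sources landing in the same index
POST index_1/_doc
{
  "source": "source_a",
  "message": "hello from source_a"
}

POST index_1/_doc
{
  "source": "source_b",
  "message": "hello from source_b"
}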

I would also recommend using the Elastic Common Schema, a.k.a. ECS, and mapping the fields from all your sources to the fields in the ECS.
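
A sketch of that idea: an ingest pipeline that renames source-specific fields to their ECS equivalents. The source field names (severity, app) are made-up placeholders here; log.level and service.name are real ECS fields.

# Sketch: normalize source-specific fields to ECS field names
PUT /_ingest/pipeline/ecs-normalize
{
  "processors": [
    {
      "rename": {
        "field": "severity",
        "target_field": "log.level",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "app",
        "target_field": "service.name",
        "ignore_missing": true
      }
    }
  ]
}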

Q.2:

How can we ensure that ES receives and parses the two date-time formats properly?

There are many ways to achieve this:

  • Ingest Pipeline
  • Beats
  • Logstash

As you mention neither Beats nor Logstash, I expect you will find it most convenient to use an ingest pipeline.

  1. Create an ingest pipeline [doc]
  2. Test the ingest pipeline [doc]
  3. Use the ingest pipeline on indexing [doc]

Toy project

It is actually quite easy to set up:

# Create a pipeline
# Note: the "UNIX" format parses epoch seconds (as in the sample below);
# for Jaeger's epoch millis, use "UNIX_MS" instead
PUT /_ingest/pipeline/71189349
{
  "processors": [
    {
      "date": {
        "field": "datems",
        "formats": ["UNIX"]
      }
    }
  ]
}

# Test the pipeline with some data of yours
POST /_ingest/pipeline/71189349/_simulate
{
  "docs": [{
    "_source":{
      "datems": "1645367903"
    }
  }]
}

For my sample I get this result. Update your pipeline until you get the proper parsing.

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "@timestamp" : "2022-02-20T14:38:23.000Z",
          "datems" : "1645367903"
        },
        "_ingest" : {
          "timestamp" : "2022-02-20T15:07:16.361424083Z"
        }
      }
    }
  ]
}
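
Since your index receives both formats in the same field, you can list both in a single date processor; they are tried in order. A sketch, assuming FluentD writes ISO8601 and Jaeger writes epoch millis into the same field:

# Update the pipeline so it accepts both date formats
PUT /_ingest/pipeline/71189349
{
  "processors": [
    {
      "date": {
        "field": "datems",
        "formats": ["ISO8601", "UNIX_MS"]
      }
    }
  ]
}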

Once you are satisfied, just use this pipeline at index time.

POST <Your index>/_doc?pipeline=71189349
{
   ...
   <Your data>
   ...
}
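
Since FluentD and Jaeger will not add the ?pipeline parameter to their own requests, and your index is recreated every 24 hours, you could instead attach the pipeline as the index default through an index template. A sketch; the template name and index pattern are assumptions (composable templates need Elasticsearch 7.8+; older versions use the legacy _template API):

# Attach the pipeline to every newly created daily index
PUT /_index_template/daily-logs
{
  "index_patterns": ["<Your index pattern>-*"],
  "template": {
    "settings": {
      "index.default_pipeline": "71189349"
    }
  }
}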
Paulo
  • Many thanks for taking the time to respond in a detailed way. What you are suggesting seems to require several days of reading up, developing, and testing. I was hoping for a quicker fix, such as enabling a different, more flexible mapping on the Elasticsearch side that would correctly parse the two disparate date formats from the two sources. – GID Feb 20 '22 at 13:42
  • @GID, look at the update, it is actually very simple to set up. – Paulo Feb 20 '22 at 15:12
  • Thanks again. In your example you are defining a single format. How would you define the two different formats that I need for the date field? – GID Feb 20 '22 at 19:58
  • Well, you only need one, as the other is already in the `ISO8601` format, but you can either create another pipeline or use this one and add "ISO8601" to the formats array. It is all written in the documentation though; I think it is worth a read. – Paulo Feb 20 '22 at 20:05