
The documents I have stored in elasticsearch have been given a common id (cid) if they relate to the same event.

Is there a way within kibana to treat these multiple documents as a single one?

For example I want to find the cardinality of a field. Each set of documents with the same 'cid' should only count once.

{
   "f": "foo",
   "cid": 1,
   ...
}

{
   "f": "foo",
   "cid": 1,
   ...
}

{
   "f": "foo",
   "cid": 2,
   ...
}

This should give the cardinality of the term foo to be 2.
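To make the desired count concrete, this is roughly what I'm after, expressed as an Elasticsearch query (a sketch only; `my-index` is a placeholder name): filter to `f: foo` and take the cardinality of `cid`, which for the three documents above returns 2.

```json
POST /my-index/_search
{
  "size": 0,
  "query": { "term": { "f": "foo" } },
  "aggs": {
    "unique_events": { "cardinality": { "field": "cid" } }
  }
}
```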

When I try to create a visualization using cid as a bucket and the field I want to visualize as a sub-bucket, I just get visualizations of the cids, with the other field nested within each.

I am not sure whether kibana is appropriate for this, or whether I would be better off passing the index back through a script to merge these documents into one (which seems a bit messy).

Any ideas appreciated.

Brett


2 Answers


Keep in mind that Elasticsearch (ES) assigns each document its own _id, so even if you treat cid as the unique identifier of a document, ES has no idea about that and will index 3 different documents for the example you gave in your question. You can change the way ES generates the _id for a document and make it use the value of the cid field. Had ES been using the cid value as the _id, you would have had only 2 documents indexed. See this question to figure out how to have ES use your cid field as an identifier.
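As a hedged sketch of that idea (the index and type names events/event here are placeholders, using ES 2.x-style URLs): if you index with an explicit _id taken from cid, the second request below overwrites the first, leaving only two documents in total.

```
PUT /events/event/1
{ "f": "foo", "cid": 1 }

PUT /events/event/1
{ "f": "foo", "cid": 1 }

PUT /events/event/2
{ "f": "foo", "cid": 2 }
```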

Another option is to have Kibana count unique values of the cid field (this will be your metric) while you split into buckets by f terms. If you play with the Kibana UI, you should be able to achieve this.
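Under the hood, that Kibana visualization corresponds to roughly this aggregation (a sketch; `my-index` is a placeholder): a terms bucket on f with a cardinality metric on cid, so each f value is counted once per distinct cid.

```json
POST /my-index/_search
{
  "size": 0,
  "aggs": {
    "by_f": {
      "terms": { "field": "f" },
      "aggs": {
        "unique_events": { "cardinality": { "field": "cid" } }
      }
    }
  }
}
```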

  • Thanks - I think yourself and Alain have solved the above example for me. I'm just trying to think where this doesn't work with more complex queries (trying to figure out whether this is really the best way to do it, or whether merging them is more sensible). Otherwise I will mark your answer as accepted. – Brett Feb 15 '16 at 09:34
  • To add to the above - is it safe to have multiple documents use the same _id or _uid? I can't find anything one way or another regarding them having to be unique. It doesn't "feel" like a good thing to do. – Brett Feb 15 '16 at 09:47
  • I don't think you will end up having two different documents with the same ID. Last write will serve as an update to an already existing document. I would test the above assumption with your ES distribution though :) – oldbam Feb 16 '16 at 10:28

oldbam's answer sort of led me down the correct path, but the vagueness of my question didn't help with precision.

In the end, the answer I used was to upsert via logstash instead of inserting, while also using my cid as the _id.

So in the elasticsearch output you have to do :

doc_as_upsert => true    # make sure we use the doc as the values to upsert
action => "update"       # update if possible instead of overwriting
document_id => "%{cid}"  # set the _id to cid
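For context, a minimal sketch of the full elasticsearch output section those settings sit in (the hosts and index values are placeholders for your own setup):

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    index         => "events"
    document_id   => "%{cid}"   # set the _id to cid
    action        => "update"   # update if possible instead of overwriting
    doc_as_upsert => true       # create the doc if it doesn't exist yet
  }
}
```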

Hope that helps anyone else solving this.
