
I'm using KairosDB as my primary database. Now I want to add Elasticsearch's search functionality on top of the data stored in KairosDB. As I understand the docs, I would have to duplicate every entry of my primary database inside Elasticsearch.

Update

What I mean is that, if I want to index something in Elasticsearch, I have to do the following, for example:

Retrieve data from KairosDB, for example this JSON: {"name": "hi","value": "6","tags"}

and then put it inside Elasticsearch:

curl -XPUT 'http://localhost:9200/firstindex/test/1' -d '{"name": "hi","value": "6","tags"}'

If I want to search I have to do this:

curl 'http://localhost:9200/_search?q=name:hi&pretty=true'

I'm wondering whether it is possible to avoid duplicating my data inside Elasticsearch, so that I can:

  • get data from KairosDB
  • index it using Elasticsearch without duplicating the data.

How can I go about that?

halfer
OiRc
  • Who said anything about duplicating data in ES? Are you mixing up the notions maybe? – Andrei Stefan Nov 02 '15 at 08:13
  • I read a lot of papers online and built lots of samples, and this is what I came up with. If you have another idea, you can post an answer. – OiRc Nov 02 '15 at 08:42
  • If KairosDB has no native integration with Elasticsearch, you'd need to handle this manually in your client application, meaning you build "something" that gets data from KairosDB and indexes it in ES. So the integration, and any changes to it, are entirely up to you. As for duplication, if you have any links to where you read that, post them. – Andrei Stefan Nov 02 '15 at 08:46
  • What underlying datastore are you using? H2 or Cassandra? – Val Nov 02 '15 at 09:22
  • @Val I'm using Cassandra – OiRc Nov 02 '15 at 09:27
  • As requested by @AndreiStefan, can you clarify what you mean by "not duplicate my data inside Elasticsearch"? What is the use case you're targeting, i.e. can you explain in more detail what you want to use ES for? – Val Nov 02 '15 at 09:45
  • @Val , question updated. – OiRc Nov 03 '15 at 09:16
  • @OiRc where is the duplication because I don't see it? You indexed **once** the data and then you are using URI query to retrieve it... – Andrei Stefan Nov 04 '15 at 05:36

1 Answer


It sounds like you're hoping to use Elasticsearch as a secondary (and external) full-text index for your primary datastore (KairosDB).

Since KairosDB is remaining your primary datastore, each record you load into Elasticsearch needs two pieces of information (at minimum):

  1. The primary key field(s) for locating the corresponding KairosDB record(s). In the mapping, make sure to set "store": true, "index": "not_analyzed" on these fields.
  2. Any fields which you wish to be searchable (in your example, only name is searched). Map these with "store": false, "index": "analyzed".

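If you want to reduce your index size further, consider disabling the _source field. With _source disabled, a search can only return fields mapped with "store": true, which is why the key field(s) need to be stored.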
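
As a rough sketch of what such a mapping could look like, the Python snippet below builds an index-creation body with one stored key field and one searchable field. The field name kairos_metric is purely hypothetical (use whatever identifies your KairosDB records), the type name test is taken from the question, and the "string" / "not_analyzed" syntax assumes the Elasticsearch versions current when this was written; later versions use different type names.

```python
import json


def build_index_body(doc_type="test", key_field="kairos_metric", search_field="name"):
    """Build an index-creation body holding only a lookup key and one searchable field.

    key_field is a hypothetical name for whatever identifies the KairosDB record.
    """
    return {
        "mappings": {
            doc_type: {
                # Optional: do not keep the original JSON document in the index.
                "_source": {"enabled": False},
                "properties": {
                    # Stored verbatim so it can be returned and used to query KairosDB.
                    key_field: {
                        "type": "string",
                        "store": True,
                        "index": "not_analyzed",
                    },
                    # Analyzed for full-text search, but not stored.
                    search_field: {
                        "type": "string",
                        "store": False,
                        "index": "analyzed",
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

The printed JSON could then be sent as the body of the PUT request that creates the index.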


Then your search workflow becomes a two-step process:

  • Query Elasticsearch for name:hi and retrieve the KairosDB primary key field(s) for each of the matching record(s).
  • Query/return KairosDB time-series data using key fields returned from Elasticsearch.
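
The two steps above can be sketched in Python as plain request bodies, with no HTTP client involved. This assumes the key is kept in a stored field with the hypothetical name kairos_metric and that it holds a KairosDB metric name. It also assumes the "fields" search parameter of the Elasticsearch versions current when this was written (later versions renamed it) and a one-hour relative time range for the KairosDB query, chosen only for illustration.

```python
def build_search_body(term, search_field="name", key_field="kairos_metric"):
    """Step 1 request: match on the searchable field, return only the stored key."""
    return {
        "query": {"match": {search_field: term}},
        "fields": [key_field],
    }


def extract_keys(es_response, key_field="kairos_metric"):
    """Collect the distinct stored key values from a search response, in hit order."""
    keys = []
    for hit in es_response.get("hits", {}).get("hits", []):
        # Stored field values come back as lists, one list per field.
        for value in hit.get("fields", {}).get(key_field, []):
            if value not in keys:
                keys.append(value)
    return keys


def build_kairos_query(metric_names, hours=1):
    """Step 2 request: a KairosDB datapoints query for the metrics found in step 1."""
    return {
        "start_relative": {"value": hours, "unit": "hours"},
        "metrics": [{"name": name} for name in metric_names],
    }
```

The first body would be posted to the index's _search endpoint, and the second to KairosDB's datapoints query endpoint, using whichever HTTP client you prefer.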

To be clear, you don't need an exact duplicate of each KairosDB record loaded into Elasticsearch. You only need the searchable fields, along with a means to locate the original record in KairosDB.

Peter Dixon-Moses