I have documents with a "status" field that can take one of three values: "Draft", "In Progress", or "Approved". I am passing these documents through an ingest pipeline. By default, every document should be indexed into index A regardless of its status; if the status is "Approved", it should also be indexed into index B. For example:

{
"id":"123",
"status":"Draft"
}
{
"id":"1234",
"status":"InProgress"
}
{
"id":"12345",
"status":"Approved"
}

Documents 1, 2, and 3 should all go to index A, and only document 3 should also go to index B. Is it possible to do this with an ingest pipeline?

Ashish Mishra

1 Answer

In your ingest pipeline, you can change the _index field very easily like this:

{
  "set": {
    "if": "ctx.status == 'Approved'",
    "field": "_index",
    "value": "index-b"
  }
},
{
  "set": {
    "if": "ctx.status != 'Approved'",
    "field": "_index",
    "value": "index-a"
  }
}
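
One way to check this routing before indexing real data is the simulate pipeline API (POST _ingest/pipeline/_simulate), which accepts an inline pipeline definition together with sample documents. The body below is a minimal sketch: it wraps the two processors above in a pipeline and feeds in the three documents from the question. The description text and the initial _index value on each sample document are assumptions made for illustration.

```json
{
  "pipeline": {
    "description": "Sketch: route Approved documents to index-b, all others to index-a",
    "processors": [
      {
        "set": {
          "if": "ctx.status == 'Approved'",
          "field": "_index",
          "value": "index-b"
        }
      },
      {
        "set": {
          "if": "ctx.status != 'Approved'",
          "field": "_index",
          "value": "index-a"
        }
      }
    ]
  },
  "docs": [
    { "_index": "index-a", "_source": { "id": "123", "status": "Draft" } },
    { "_index": "index-a", "_source": { "id": "1234", "status": "InProgress" } },
    { "_index": "index-a", "_source": { "id": "12345", "status": "Approved" } }
  ]
}
```

The _index reported for each document in the response should show where that document would be routed.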

It is worth noting, though, that a pipeline cannot send a single document to two different indexes: each document ends up in either index-a or index-b, but not both.

However, this can easily be solved by querying both indexes through an alias that spans both index-a and index-b.
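
As a sketch of such an alias, the body below could be sent to the aliases API (POST _aliases). The alias name all-docs is hypothetical; index-a and index-b are the index names used in the processors above.

```json
{
  "actions": [
    { "add": { "index": "index-a", "alias": "all-docs" } },
    { "add": { "index": "index-b", "alias": "all-docs" } }
  ]
}
```

A search sent to all-docs then covers both indexes in a single request.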

Val
  • Hi @Val, thanks, this really helps. But in my case a single document goes through all three stages, so once it is approved it needs to be updated in both indexes; otherwise index-a will be left with a stale "In Progress" copy while index-b holds the approved one. So an alias won't work, as these would be duplicate documents – Ashish Mishra May 05 '22 at 11:07
  • Can you explain how the process that updates the status works? Are you indexing the full document each time, or do you just do partial updates? – Val May 05 '22 at 11:12
  • Any time a document is updated in Couchbase, the Couchbase connector app pushes the entire document to Elasticsearch with the same id, so we are effectively replacing the whole document each time. – Ashish Mishra May 05 '22 at 11:29
  • Ok, my next question is why do you need to store the document in two indexes, i.e. all documents in one index and only Approved ones in a second index? – Val May 05 '22 at 11:35
  • We are thinking that if we put all approved data in a separate index, which will hold fewer documents, response time will be faster for our end consumers, whereas documents in the Draft or In Progress state will only be searched internally, so we can tolerate some lag there. Also, approved instances will be about 10 percent of all instances (Draft and In Progress combined). – Ashish Mishra May 05 '22 at 11:56
  • What kind of volume (i.e. how many documents/GB) are we talking about? – Val May 05 '22 at 11:56
  • It will be roughly 2-3 GB and 0.5 million docs – Ashish Mishra May 05 '22 at 11:59
  • OK, very small volume. What about the size of your cluster (hardware specs)? – Val May 05 '22 at 12:00
  • 8 GB RAM and 30 GB storage, but our use case requires handling very high traffic, around 400 TPS, so we want to keep the document count low and segregate the data for the end consumer. – Ashish Mishra May 05 '22 at 12:02
  • I'd say that neither your specs nor your data volume require you to over-engineer your solution. You'd be perfectly fine with a single index and the proper filters in your queries (or even a [filtered alias](https://www.elastic.co/guide/en/elasticsearch/reference/current/aliases.html#filter-alias) for your customers to only see Approved docs). Remember what Knuth said: "Premature optimization is the root of all evil" ;-) – Val May 05 '22 at 12:04
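
The filtered alias suggested in the last comment could be created with the aliases API (POST _aliases). The body below is a minimal sketch under a few assumptions: all documents live in a single index, here reusing the name index-a from the answer; the alias name approved-docs is hypothetical; and status is mapped as a keyword field, so that the term filter matches the exact value "Approved".

```json
{
  "actions": [
    {
      "add": {
        "index": "index-a",
        "alias": "approved-docs",
        "filter": { "term": { "status": "Approved" } }
      }
    }
  ]
}
```

End consumers would then search approved-docs and only see Approved documents, while internal users query the index directly. Because each document is stored once under one id, a status change never leaves a stale copy behind.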