There are some examples available on the internet for customizing the _id field of an Elasticsearch document, but is there a way to generate a composite _id from multiple fields?
Sample Data
{
"first_name": "john",
"last_name": "doe",
"dob": "1987-12-21",
"phone": "7894456123",
"so": "on"...
}
How can I configure the ingest pipeline to generate the _id by joining the first 4 fields, which for this use case are considered the composite primary key?
Things to take care of:
- There is a character limit on _id, and the join of the 4 fields can exceed it at any time.
- Some kind of separator is needed so that there can't be 2 docs with different field values but the same joined value.
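To make the separator concern concrete, here is a minimal Python sketch of how joins can collide and one possible way to avoid it. The helper names `naive_join` and `escaped_join` are hypothetical, and escaping the separator is just one option I am assuming would work for this data; it is not something Elasticsearch does for you.

```python
def naive_join(fields):
    # Plain concatenation: the boundaries between fields are lost.
    return "".join(fields)


def escaped_join(fields, sep="|"):
    # Escape the backslash first, then the separator, so that a separator
    # appearing inside a value cannot be mistaken for a field boundary.
    escaped = [f.replace("\\", "\\\\").replace(sep, "\\" + sep) for f in fields]
    return sep.join(escaped)


# Without a separator, different field values give the same joined value.
print(naive_join(["ab", "c"]) == naive_join(["a", "bc"]))  # True (collision)

# A bare separator still collides when a value contains the separator itself.
print("|".join(["a|b", "c"]) == "|".join(["a", "b|c"]))  # True (collision)

# With escaping, the two inputs stay distinct.
print(escaped_join(["a|b", "c"]) == escaped_join(["a", "b|c"]))  # False
```

If the 4 fields can never contain the separator character, a plain `"|".join(...)` would already be enough.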
I considered using a hashing algorithm like MD5 or SHA256, which can generate fixed-length _ids from "|".join(first,last,dob,phone), but I am not able to implement it in the ingest pipeline.
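For reference, this is roughly what I mean, sketched in Python on the client side rather than inside an ingest pipeline. The function name `composite_id` and the choice of SHA-256 are only illustrative; what I am looking for is the equivalent of this at ingest time.

```python
import hashlib

# The 4 fields that form the composite primary key in the sample data.
KEY_FIELDS = ("first_name", "last_name", "dob", "phone")


def composite_id(doc, keys=KEY_FIELDS, sep="|"):
    # Join the key fields with a separator, then hash the result so the
    # _id has a fixed length no matter how long the field values are.
    joined = sep.join(str(doc[k]) for k in keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


doc = {
    "first_name": "john",
    "last_name": "doe",
    "dob": "1987-12-21",
    "phone": "7894456123",
    "so": "on",
}

doc_id = composite_id(doc)
print(len(doc_id))  # 64 hex characters for SHA-256
```

The same input always yields the same `doc_id`, so re-indexing a document with the same 4 field values would overwrite the existing one instead of creating a duplicate.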
This is not a security concern, as we are only trying to define a primary key, and the indexes roll over on a monthly basis.
So if we can find a storage-efficient _id value, that is preferred.
If there are other ways to achieve this use case, please suggest them.
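On the storage-efficiency point, one idea I have in mind (again sketched in Python, with the hypothetical helper names `hex_id` and `compact_id`) is to encode the raw digest as URL-safe base64 instead of hex, which gives a shorter string for the same hash. I am assuming a shorter _id string also means less storage, which I have not measured.

```python
import base64
import hashlib


def hex_id(joined):
    # Hex encoding: 2 characters per digest byte.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def compact_id(joined):
    # URL-safe base64 of the raw 16-byte MD5 digest, with padding removed.
    digest = hashlib.md5(joined.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


joined = "|".join(["john", "doe", "1987-12-21", "7894456123"])
print(len(hex_id(joined)))      # 32
print(len(compact_id(joined)))  # 22
```

Both forms carry the same 128 bits, so the chance of two different keys colliding is unchanged; only the encoded length differs.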