I am building a service that will hold millions of rows of data, and we want good search over it, e.g. searching by the values of certain fields. Each row will be structured as follows:

{
   "field1" : "value1",
   "field2" : "value2",
   "field3" : {
       "field4": "value4",
       "field5": "value5"
   }
}

Also, the structure of field3 can vary: field4 is sometimes present and sometimes not.

We want to filter on the following fields: field1, field2 and field4. We can create indexes in DynamoDB to do that, but I am not sure whether we can easily create an index on field4 in DynamoDB without flattening the JSON.

Now, my question is: should we use Elasticsearch as the data store for this? As far as I know, it creates an index on every field in the document, so one can search on any field. Is that right? Or should we use DynamoDB, or some other data store entirely?

Please provide some suggestions.

hatellla
  • You cannot index on field4: "The index key attributes can consist of any top-level String, Number, or Binary attributes from the base table." Of course, you could duplicate field4 at the top-level, if needed, and maintain it. – jarmod Dec 20 '19 at 20:18
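The duplication suggested in the comment above could look roughly like the following before each write. This is only a sketch: the attribute name `field4_top` and the helper name are made up for illustration, and the code only reshapes the item without talking to DynamoDB.

```python
from typing import Any, Dict


def with_top_level_field4(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `item` with field3.field4 duplicated at the top level.

    `field4_top` is a hypothetical attribute name chosen for this example.
    """
    flattened = dict(item)  # shallow copy; the caller's dict is left untouched
    nested = item.get("field3")
    if isinstance(nested, dict) and "field4" in nested:
        flattened["field4_top"] = nested["field4"]
    else:
        # field4 is optional, so make sure no stale copy survives an update
        flattened.pop("field4_top", None)
    return flattened


row = {
    "field1": "value1",
    "field2": "value2",
    "field3": {"field4": "value4", "field5": "value5"},
}
item_to_write = with_top_level_field4(row)
```

Rows where field4 is absent simply carry no `field4_top` attribute, so an index keyed on it would only contain the rows that have the value.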

2 Answers

If search is a key requirement for your application, then use a search product, not a database. DynamoDB is great for a lot of things, but ad hoc search is not one of them: you are going to end up running lots of very expensive (slow) scans if you go with DynamoDB. This is what ES was built for.

E.J. Brennan
  • Agreed. But at the same time, estimate the cost of a persistent ES cluster so you understand what that will cost, and also look at UltraWarm. – jarmod Dec 20 '19 at 22:31
  • You could move the index to S3 (on a daily basis) and save on costs. – Aakash Gupta Jul 31 '20 at 13:02

I have decent working experience with DynamoDB and extensive working experience with Elasticsearch (ES).

Let's first understand the key difference between the two.

DynamoDB is described as:

Amazon DynamoDB is a key-value and document database

while Elasticsearch is described as:

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data.

Now, coming to the question, let's discuss how these systems work internally and how that affects performance.

DynamoDB is great for fetching documents by key, but not great for filtering and searching. Just as you create indexes on columns in a relational database to speed up these operations, you have to create an index in DynamoDB too, because it is a database, not a search engine. Creating indexes on fields on the fly is a pain, and they are not cached in DynamoDB.
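To make the "you have to create an index" point concrete, here is a rough sketch of the arguments one might pass to boto3's `update_table` to add a global secondary index on a top-level attribute. The table name, the index naming scheme and the helper itself are assumptions for illustration. It also assumes a string attribute and an on-demand table, so no provisioned throughput is specified.

```python
from typing import Any, Dict


def gsi_create_request(table_name: str, attribute: str) -> Dict[str, Any]:
    """Build keyword arguments for adding a GSI keyed on one top-level attribute."""
    index_name = f"{attribute}-index"  # hypothetical naming convention
    return {
        "TableName": table_name,
        # Key attributes of the new index must be declared with their type.
        "AttributeDefinitions": [
            {"AttributeName": attribute, "AttributeType": "S"},  # assumes a string
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": attribute, "KeyType": "HASH"},
                    ],
                    # Copy every attribute into the index; narrower projections
                    # are possible and cheaper.
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


request = gsi_create_request("rows", "field1")  # "rows" is a made-up table name
```

The dict would then be passed as `boto3.client("dynamodb").update_table(**request)`. One such index is needed per field you want to filter on, which is the maintenance burden described above.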

Elasticsearch stores data differently: it creates an inverted index for all indexed fields (the default, as the OP mentioned), and filtering on these fields is very fast if you use the filter context, which matches the use case here. More info, with an example, is in the official ES docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html#filter-context. Also, because these filters are not used for score calculation and are cached by Elasticsearch, their performance (both read and write) is very fast compared to DynamoDB, and you can benchmark that as well.
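As a minimal sketch of what a filter-context query for the OP's fields might look like, the helper below builds a `bool` query whose `filter` list holds one `term` clause per field. The helper name is made up, and it assumes the fields are mapped as `keyword` (exact-match) fields. The nested field4 is addressed with the dotted path `field3.field4`, so no flattening is needed.

```python
from typing import Any, Dict


def build_filter_query(filters: Dict[str, str]) -> Dict[str, Any]:
    """Build an Elasticsearch search body that filters on exact field values.

    Clauses under bool.filter run in filter context: they do not contribute
    to the relevance score.
    """
    term_clauses = [{"term": {field: value}} for field, value in filters.items()]
    return {"query": {"bool": {"filter": term_clauses}}}


search_body = build_filter_query(
    {
        "field1": "value1",
        "field2": "value2",
        "field3.field4": "value4",  # dotted path into the nested object
    }
)
```

The resulting dict can be sent as the JSON body of a `_search` request. Documents that lack field4 are simply excluded by the third clause.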

Amit