41

We're storing organization names in a DynamoDB table on AWS, and would like to maintain official capitalization in those business names, for example in "TNT" and "FedEx".

Our use case is that users of the application can search for organizations by name, but we'd like that their queries are interpreted case-insensitively. So, queries for "FedEx", "Fedex" or "fedex" should all return the correct item in the table.

Other databases have ways to perform queries ignoring case (for example by the ILIKE key word in PostgreSQL), by expressing queries via regular expressions, or by applying functions in the condition (for example the LOWER() function).

How can this be done in DynamoDB? The documentation on Amazon DynamoDB's Query does not provide an answer.

(The best work-around seems to be storing the name twice: once with the official capitalization in effect, and once in another field with the name converted to lowercase. Searching should then be done on the latter field, with the query search term also converted to lowercase. Yes, I know it adds redundancy to the table. It's a work-around, not an optimal solution.)

Jochem Schulenklopper
  • 6,452
  • 4
  • 44
  • 62
  • 1
    Another possible solution (workaround) might be to create a secondary index with all the keys cast to lowercase. That secondary index can be queried or scanned, as described at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html. – Jochem Schulenklopper Jun 01 '17 at 08:10
  • 1
    Duplicate questions: https://stackoverflow.com/questions/34921224/case-insensitive-query-in-dynamo-db and https://stackoverflow.com/questions/19657815/case-insensitive-queries-with-amazon-dynamo-db?rq=1 – Sarah Messer Aug 07 '17 at 20:12
  • @JochemSchulenklopper do you have any information on how to manipulate an index value, such as casting to lowercase, when creating a secondary index? – Pappa Oct 04 '17 at 18:30
  • 1
    @Pappa, we're adding a search field to the table with `locationSearch: locationStr.toLowerCase()` in addition to the regular `location: locationStr` field. Then, when querying, we do `ScanFilter: { "locationSearch": { ComparisonOperator: "CONTAINS", AttributeValueList: [locationString.toLowerCase()] } }` – Jochem Schulenklopper Oct 04 '17 at 20:42

2 Answers2

43

yes, exactly, when you add the new item/row, add also a new field searchName, that is the lowercase (even more, maybe only letters/numbers/spaces) of the your name field. and then search by that searchName field

UXDart
  • 2,500
  • 14
  • 12
  • 25
    Yeah, DynamoDB sucks, lots of hacks and workaround for just simple things – Vedmant Jun 15 '20 at 08:11
  • 21
    DynamoDB's whole purpose is to make it impossible to write a query that isn't scalable. That's why there are certain limitations and "workarounds" and why the data needs to be structured this way. – Andrew Nov 28 '20 at 04:05
  • 4
    To @Andrew point, this is the age old software dilemma that everything is a tradeoff. DynamoDB is optimized for scalability at the cost of flexibility. If you think it sucks, then more likely it is a poor fit for your use case. – Ryan Jun 15 '22 at 18:00
  • In addition to @Andrew 's comment, maybe it is also the time to think about having another bounded context that will take care of search functionalities and save the search name in the context to not pollute main context. – pointer Mar 28 '23 at 23:10
-13

Writing duplicate data in dynamodb is not a good design. The best solution would be to add ' elastic search ' to dynamodb. You can connect this component ' out of the box' using the aws console. Then use custom anayzer in elastic search to get case insensitive data.

Srini Sydney
  • 564
  • 8
  • 17
  • 8
    Instead of setting up Elasticsearch and making the integration work, wouldn't it be easier to create a secondary index (with all characters cast to lowercase) on the table, and query/scan that index as decribed on http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html ? – Jochem Schulenklopper Jun 01 '17 at 08:08
  • 3
    A secondary index is probably an easier initial investment than setting up ElasticSearch, but in the long run it's less flexible and harder to maintain. The issue with a secondary index is that any time you update data in the search field, you _also_ need to update the secondary index. If your data and/or schema change frequently, or if you have multiple pieces of code trying to write to the same table, _all_ of them need to handle the secondary index consistently. ElasticSearch makes such indexing transparent. – Sarah Messer Aug 07 '17 at 20:28
  • 35
    "Writing duplicate data in dynamodb is not a good design." For what it's worth I disagree with this statement as duplicate data is often a very good practice in a NoSQL database. It is often necessary to support your access patterns, or at least to monkey patch a database that didn't consider the access patterns enough from the start. – dillon.harless Jan 02 '20 at 17:20
  • 4
    Seconding the above responses. Setting up an entirely new storage type for such a simple operation (especially one as complex to manage as Elasticsearch) isn't justified. If the intent is to avoid duplicating data, it doesn't accomplish that either, as you *still* have the data in 2 places, except now it's easy for them to get out of sync. Adding an extra dynamo column is comparably simple and doesn't cause any issue. – Chris Williamson Apr 29 '20 at 18:30
  • 1
    This answer is flat out wrong, DynamoDB often has duplicate data, such as in attributes which you need for your data access patterns, such as a sort key as `sortKey : /${region}/${category}/${productName}`, where all these attributes would also live in their own columns. – Derrops Feb 13 '21 at 08:03
  • For performance gains, we need to sometimes summarise data or make more searchable, indexable data. I wouldn't say like it's not a good practice since architecturally you have to use the best way to deliver quality attributes that the system needs. – sandaru.ny Jun 16 '21 at 07:49
  • Global secondary indexes are created by duplicating the data. This means that performance by duplication is by design a technique with Dynamo DB – anand Dec 11 '21 at 20:07