
I'm trying to set up AWS CloudSearch with a DynamoDB table. My data structure looks something like this:

{
  "name": "John Smith",
  "phone": "0123 456 789",
  "business": {
    "name": "Johnny's Cool Co",
    "id": "12345",
    "type": "contractor",
    "suburb": "Sydney"
  },
  "profession": {
    "name": "Plumber",
    "id": "20"
  },
  "email": "johnsmith@gmail.com",
  "id": "354684354-4b32-53e3-8949846-211384"
}

Importing this data from DynamoDB into CloudSearch is a breeze; however, I want to be able to index on some of the nested object properties (like business.name, profession.name, etc.).

CloudSearch is pulling in some of the nested fields, like suburb, but it seems unable to differentiate between the name in the root of the object and the name within the business and profession objects.

Questions:

  1. How do I make these nested parameters searchable? Can I index on business.name or something?
  2. If #1 is not possible, can I somehow send my data through a transforming function before it gets to CloudSearch? That way I could flatten all of my objects and give the fields unique names like businessName and professionName.

EDIT:

My solution at the moment is to have a separate DynamoDB table which replicates our users table, but stores it in a CloudSearch-friendly format. However, I don't like this solution at all so any other ideas are totally welcome!

JVG

1 Answer


You can use DynamoDB Streams and write a Lambda function that captures changes and adds documents to CloudSearch, flattening them at that point, instead of keeping an additional DynamoDB table.
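Stream records deliver each item in DynamoDB's typed attribute format (for example `{"S": "..."}` for strings and `{"M": {...}}` for maps), so the Lambda function has to unwrap those type tags before flattening. The following is a minimal sketch of that step in plain Python; the `unwrap` helper and the sample `new_image` are hypothetical names made up for illustration, and only a handful of type tags are handled. If boto3 is available, its `TypeDeserializer` covers the full set of types.

```python
def unwrap(attr):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, value), = attr.items()  # each attribute carries exactly one type tag
    if type_tag == "S":
        return value
    if type_tag == "N":
        # numbers arrive as strings; try int first, then fall back to float
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: unwrap(v) for k, v in value.items()}
    if type_tag == "L":
        return [unwrap(v) for v in value]
    raise ValueError(f"unhandled DynamoDB type tag: {type_tag}")


# Hypothetical NewImage fragment, shaped like the question's data
new_image = {
    "name": {"S": "John Smith"},
    "business": {"M": {"name": {"S": "Johnny's Cool Co"}, "suburb": {"S": "Sydney"}}},
}
plain = {key: unwrap(value) for key, value in new_image.items()}
```

After this step `plain` is an ordinary nested dict, which is easier to flatten than the typed form.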

For example, my Lambda function keeps a list of the nested fields to index (within a "body" parent in this case) and flattens them under their own field names. Where sub-field names collide, you can prepend the parent name to create a new key such as "body_name" (CloudSearch field names allow only lowercase letters, digits, and underscores).

# ... misc. setup (must define url and awsauth) ...
import requests

headers = {"Content-Type": "application/json"}
indexed_fields = ['app', 'name', 'activity']  # fields to flatten

def handler(event, context):  # Lambda handler, called on each stream update
    batch = []  # CloudSearch expects a JSON array of document operations

    for record in event['Records']:
        # nested attributes live under the "body" map of the new item image
        body = record['dynamodb']['NewImage']['body']['M']

        document = {}  # document to be uploaded to CloudSearch
        document['id'] = ...  # your uid, likely from the DynamoDB update record
        document['type'] = 'add'
        # flatten/pull out the info you want indexed
        document['fields'] = {key: body[key]['S'] for key in indexed_fields}
        batch.append(document)

    # post the batch to the CloudSearch document endpoint
    r = requests.post(url, auth=awsauth, json=batch, headers=headers)
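If you would rather not maintain a hand-written list of fields, the same idea can be generalized. The sketch below is an assumption-laden illustration, not the code from my function: `flatten` is a hypothetical helper that walks a plain (already unwrapped) dict and always prefixes nested keys with their parent name, joined by an underscore. The sample `user` is a trimmed copy of the question's document.

```python
def flatten(item, parent=""):
    """Flatten nested dicts, prefixing nested keys with their parent name."""
    flat = {}
    for key, value in item.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            # recurse so deeper levels get the full parent path as prefix
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Trimmed version of the question's document
user = {
    "name": "John Smith",
    "business": {"name": "Johnny's Cool Co", "suburb": "Sydney"},
    "profession": {"name": "Plumber", "id": "20"},
}
fields = flatten(user)
# fields has distinct keys such as "name", "business_name" and "profession_name"
```

Always prefixing is a design choice made here for predictability: a field's name does not change when a sibling object later gains a key with the same name.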
Eric Aya
dbish