60

In Python, when an item is retrieved from DynamoDB using boto3, a structure like the following is returned.

{
  "ACTIVE": {
    "BOOL": true
  },
  "CRC": {
    "N": "-1600155180"
  },
  "ID": {
    "S": "bewfv43843b"
  },
  "params": {
    "M": {
      "customer": {
        "S": "TEST"
      },
      "index": {
        "N": "1"
      }
    }
  },
  "THIS_STATUS": {
    "N": "10"
  },
  "TYPE": {
    "N": "22"
  }
}

Also when inserting or scanning, dictionaries have to be converted in this fashion. I haven't been able to find a wrapper that takes care of such conversion. Since apparently boto3 does not support this, are there better alternatives than implementing code for it?

manelmc
  • The response syntax is already a `dict`. No conversion required. The boto3 documentation shows it is a `dict` object http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#DynamoDB.Client.get_item – mootmoot May 03 '17 at 12:28
  • 9
    Yes, it is a dict object, but the types have to be made explicit. That's why I refer to it as a conversion. – manelmc May 03 '17 at 12:51
  • 1
    Maybe your question title doesn't tally with your need. Please change the title and clarify what you need. Bear in mind that there is no "standard" parser for NoSQL results. You need to deal with each element's data definition. – mootmoot May 03 '17 at 14:05
  • 3
    @manelmc The boto3 `Table` resource will do this for you. [docs](http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#table) – Jordon Phillips May 03 '17 at 23:39
  • Thanks @JordonPhillips that's exactly what I was looking for, sorry if my question is misleading. – manelmc May 04 '17 at 08:41
  • Perhaps the phrase should be "a straightforward dictionary " ... – MikeW Jun 01 '23 at 11:31

4 Answers

114

To understand how to solve this, it's important to recognize that boto3 has two basic modes of operation: one that uses the low-level Client API, and one that uses higher-level abstractions like Table. The data structure shown in the question is an example of what is consumed and produced by the low-level API, which is also used by the AWS CLI and the DynamoDB web service.

To answer your question: if you can work exclusively with the high-level abstractions like Table when using boto3, things will be quite a bit easier for you, as the comments suggest. You can then sidestep the whole problem, because Python types are marshaled to and from the low-level data format for you.
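A minimal sketch of that high-level route, assuming a table whose partition key is `ID`. The table name `my-table`, the helper names and the `ratio` attribute are hypothetical; `Table.put_item` and `Table.get_item` are the real resource methods, and they take and return plain Python values:

```python
from decimal import Decimal


def open_table(table_name):
    """Return a boto3 Table resource (needs AWS credentials and a region)."""
    import boto3  # imported here so the helpers below can be tried without AWS access

    return boto3.resource("dynamodb").Table(table_name)


def save_item(table, item):
    """Write a plain Python dict; the Table resource adds the type descriptors."""
    table.put_item(Item=item)


def load_item(table, key):
    """Fetch one item as a plain dict, or None when the key does not exist."""
    return table.get_item(Key=key).get("Item")


# Numbers must be int or Decimal; the Table resource rejects Python floats.
example_item = {
    "ID": "bewfv43843b",
    "ACTIVE": True,
    "CRC": -1600155180,
    "params": {"customer": "TEST", "index": 1},
    "ratio": Decimal("0.25"),  # hypothetical attribute, shown only for the Decimal case
}

# Hypothetical usage against a real table:
#   table = open_table("my-table")
#   save_item(table, example_item)
#   item = load_item(table, {"ID": "bewfv43843b"})  # numbers come back as Decimal
```

Passing the table in as an argument keeps the two helpers easy to try with a stand-in object before pointing them at a real table.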

However, there are times when it's not possible to use those high-level constructs exclusively. I specifically ran into this problem when dealing with DynamoDB streams attached to Lambda functions. The inputs to the Lambda function are always in the low-level format, and that format is harder to work with IMO.
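For the stream case, a handler sketch might look like the following. The `Records` / `dynamodb` / `NewImage` layout is that of DynamoDB stream events; the function names and the optional `deserialize_value` parameter are hypothetical, added so the logic can be exercised without boto3 installed:

```python
def new_images(event, deserialize_value=None):
    """Return the NewImage of every stream record as a plain Python dict."""
    if deserialize_value is None:
        # Default to boto3's own converter for a single typed value.
        from boto3.dynamodb.types import TypeDeserializer

        deserialize_value = TypeDeserializer().deserialize

    images = []
    for record in event.get("Records", []):
        image = record.get("dynamodb", {}).get("NewImage")
        if image is None:  # e.g. REMOVE events, or streams that only carry keys
            continue
        images.append({name: deserialize_value(value) for name, value in image.items()})
    return images


def handler(event, context):
    """Lambda entry point: print each changed item in plain form."""
    for item in new_images(event):
        print(item)
```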

After some digging I found that boto3 itself has some nifty features tucked away for doing conversions. These features are used implicitly in all of the internal conversions mentioned previously. To use them directly, import the TypeDeserializer/TypeSerializer classes and combine them with dict comprehensions like so:

import boto3

low_level_data = {
  "ACTIVE": {
    "BOOL": True
  },
  "CRC": {
    "N": "-1600155180"
  },
  "ID": {
    "S": "bewfv43843b"
  },
  "params": {
    "M": {
      "customer": {
        "S": "TEST"
      },
      "index": {
        "N": "1"
      }
    }
  },
  "THIS_STATUS": {
    "N": "10"
  },
  "TYPE": {
    "N": "22"
  }
}

# Creating the resource makes boto3 load its boto3.dynamodb submodule as a
# side effect, which is what makes boto3.dynamodb.types reachable below
boto3.resource('dynamodb')

# To go from low-level format to python
deserializer = boto3.dynamodb.types.TypeDeserializer()
python_data = {k: deserializer.deserialize(v) for k,v in low_level_data.items()}

# To go from python to low-level format
serializer = boto3.dynamodb.types.TypeSerializer()
low_level_copy = {k: serializer.serialize(v) for k,v in python_data.items()}

assert low_level_data == low_level_copy
killthrush
  • 7
    Or much better, and Python2 compatible: python_data = deserializer.deserialize({'M':low_level_data}) – aaa90210 Dec 03 '18 at 03:00
  • 13
    Note with `boto3==1.9.79`, I had to import the deserializer a different way: `from boto3.dynamodb.types import TypeDeserializer`. The module source code shows the deserializer is not exposed (anymore?) as @killthrush originally explained. – Eric Platon Jan 20 '19 at 00:50
  • Hmm... I tried a clean virtualenv with both `1.9.79` and `1.9.82` in the REPL and was not able to reproduce @Eric Platon. The original code seemed to work for me both times. Are you doing something different? – killthrush Jan 21 '19 at 12:13
  • 1
    This works like a charm! I was trying to copy a DynamoDB table to another one and I had to use the low-level API + the high-level one to do the batch writing. This saved me. Thanks! – amaurs Sep 06 '19 at 01:13
  • 1
    Note that this will not support 'B' (Binary type) if your "low_level" comes from json.loads, due to the data being a utf-8 string when it needs to be base64 bytes. I had to either pre-process and look for 'B', or simply monkey-patch `deserializer._deserialize_b` to b64decode for this case only. – Pierre-Francoys Brousseau Apr 24 '20 at 06:07
  • I think this is still not supported for string set :( – Haktan Suren Jul 06 '20 at 22:45
  • 1
    `{'Message': 'New item!', 'Id': Decimal('101')}` DataType `Decimal` is added to the Value. How to avoid? – Thirumal Oct 11 '20 at 08:08
  • 1
    @Thirumal - Decimals are used automatically to avoid loss of precision between DynamoDB's `Number` type and python floats. There's been a feature request open for years in boto3: https://github.com/boto/boto3/issues/369. Maybe one of those workarounds might help you? If you're storing integers, then I agree you really don't need Decimal here. – killthrush Oct 14 '20 at 13:06
  • 1
    This doesn't work for 'L' types. Needs to recursively deserialize those – Joe Aug 26 '22 at 21:16
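One way to deal with the `Decimal` values raised in the comments is to post-process the deserialized data. The sketch below is plain Python, not part of boto3; `replace_decimals` is a hypothetical helper and the `scores` attribute is made up for illustration. It turns whole-number Decimals into `int` and the rest into `float`, which gives up the precision guarantee that Decimal is there to provide:

```python
from decimal import Decimal


def replace_decimals(value):
    """Recursively convert Decimal values to int or float."""
    if isinstance(value, Decimal):
        # Whole numbers become int; anything with a fractional part becomes float.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {key: replace_decimals(item) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_decimals(item) for item in value]
    if isinstance(value, set):
        return {replace_decimals(item) for item in value}
    return value


python_data = {"Message": "New item!", "Id": Decimal("101"), "scores": [Decimal("1.5")]}
print(replace_decimals(python_data))
# {'Message': 'New item!', 'Id': 101, 'scores': [1.5]}
```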
25

You can use the TypeDeserializer class:

from boto3.dynamodb.types import TypeDeserializer
deserializer = TypeDeserializer()

document = { "ACTIVE": { "BOOL": True }, "CRC": { "N": "-1600155180" }, "ID": { "S": "bewfv43843b" }, "params": { "M": { "customer": { "S": "TEST" }, "index": { "N": "1" } } }, "THIS_STATUS": { "N": "10" }, "TYPE": { "N": "22" } }
deserialized_document = {k: deserializer.deserialize(v) for k, v in document.items()}
print(deserialized_document)
Fellipe
3

There is a Python package called "dynamodb-json" that can help you achieve this. Its utilities work like the json module's loads and dumps functions. I prefer using it because it takes care of converting Decimal objects for you.

You can find examples and how to install it by following this link - https://pypi.org/project/dynamodb-json/

aamir23
0

I went down the route of writing a custom solution.

It doesn't cover all types, but it covers the ones I use. It's a good starting point for anyone who wants to develop it further.

from re import compile as re_compile


class Serializer:
    re_number = re_compile(r"^-?\d+?\.?\d*$")

    def serialize(self, data: object) -> dict:
        if isinstance(data, bool):  # booleans are a subtype of integers so place above int
            return {'BOOL': data}
        if isinstance(data, (int, float)):
            return {'N': str(data)}
        if isinstance(data, type(None)) or not data:  # place below int (0) and bool (False)
            # returns NULL for empty list, tuple, dict, set or string
            return {'NULL': True}
        if isinstance(data, (list, tuple)):
            return {'L': [self.serialize(v) for v in data]}
        if isinstance(data, set):
            if all([isinstance(v, str) for v in data]):
                return {'SS': list(data)}  # the low-level API expects a list, not a set
            if all([self.re_number.match(str(v)) for v in data]):
                return {'NS': [str(v) for v in data]}
        if isinstance(data, dict):
            return {'M': {k: self.serialize(v) for k, v in data.items()}}
        return {'S': str(data)}  # safety net to catch all others

    def deserialize(self, data: dict) -> object:
        _out = {}
        if not data:
            return _out
        for k, v in data.items():
            if k in ('S', 'SS', 'BOOL'):
                return v
            if k == 'N':
                return float(v) if '.' in v else int(v)
            if k == 'NS':
                return [float(_v) if '.' in _v else int(_v) for _v in v]
            if k == 'M':
                return {_k: self.deserialize(_v) for _k, _v in v.items()}
            if k == 'L':
                return [self.deserialize(_v) for _v in v]
            if k == 'NULL':
                return None
            _out[k] = self.deserialize(v)
        return _out

Usage

serialized = Serializer().serialize(input_dict)
print(serialized)

deserialized = Serializer().deserialize(serialized)
print(deserialized)

DynamoDB (python)

import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.put_item(
    TableName=table_name,
    Item={
        'id': {'S': id},
        'data': Serializer().serialize(data)
    }
)

response = dynamodb.get_item(
    TableName=table_name,
    Key={
        'id': {'S': id}
    }
)
data = Serializer().deserialize(response['Item'])
Christian