
I'm trying to compute the size of an item in DynamoDB, and I don't fully understand the definition.

The definition I found: An item size is the sum of lengths of its attribute names and values (binary and UTF-8 lengths). So it helps if you keep attribute names short.

Does it mean that if I put a number in the database, for example 1, it'll take the size of an int? A long? A double? Will it take the same amount of space as 100 or 1000000, or will it only take the size of the corresponding binary representation?

And how is the size computed for strings?

Does anyone know how to compute it?

Ruben Bartelink
Mike

8 Answers


That's a non-trivial topic indeed - you already quoted the somewhat sloppy definition from the Amazon DynamoDB Data Model:

An item size is the sum of lengths of its attribute names and values (binary and UTF-8 lengths).

This is detailed further down the page within Amazon DynamoDB Data Types a bit:

  • String - Strings are Unicode with UTF8 binary encoding.
  • Number - Numbers are positive or negative exact-value decimals and integers. A number can have up to 38 digits of precision after the decimal point, and can be between 10^-128 to 10^+126. The representation in Amazon DynamoDB is of variable length. Leading and trailing zeroes are trimmed.

A similar question to yours has been asked in the Amazon DynamoDB forum as well (see Curious nature of the "Number" type), and the answer from Stefano@AWS sheds more light on the issue:

  • The "Number" type has 38 digits of precision These are actual decimal digits. So it can represent pretty large numbers, and there is no precision loss.
  • How much space does a Number value take up? Not too much. Our internal representation is variable length, so the size is correlated to the actual (vs. maximum) number of digits in the value. Leading and trailing zeroes are trimmed btw. [emphasis mine]
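
To see what "leading and trailing zeroes are trimmed" means for the stored digit count, here is a small illustrative Python sketch (my own, not AWS code - it only demonstrates the trimming described in the quote):

```python
from decimal import Decimal

def significant_digits(value: str) -> int:
    """Count the digits that remain after trimming leading and
    trailing zeroes, per the quoted forum answer."""
    # Decimal.normalize() strips trailing zeroes; leading zeroes
    # are never part of a Decimal's digit tuple.
    normalized = Decimal(value).normalize()
    return len(normalized.as_tuple().digits)

print(significant_digits("0100"))     # 1  (stored as 1E+2)
print(significant_digits("1.2300"))   # 3  (stored as 1.23)
print(significant_digits("1000000"))  # 1  (stored as 1E+6)
```

So 1000000 takes no more digit storage than 1, consistent with "the size is correlated to the actual (vs. maximum) number of digits in the value".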

Christopher Smith's follow-up post presents more insight into the resulting ramifications for storage consumption and its calculation; he concludes:

The existing API provides very little insight into storage consumption, even though that is part (admittedly not that significant) of the billing. The only information is the aggregate table size, and even that data is potentially hours out of sync.

While Amazon does not expose its billing data via an API yet, they'll hopefully add an option to retrieve some information regarding item size to the DynamoDB API at some point, as suggested by Christopher.

Steffen Opel
  • Unfortunate that this is vague. Postgres also stores decimals with "variable" length - except that the binary format it uses takes up 8 bytes just for overhead (at least in the binary copy export format). I'm hoping AWS does better! – Καrτhικ Jan 29 '14 at 10:50
  • Note: In DynamoDB (DDB) Numbers are persisted as strings, thus accounting for variable length (123 vs. 1.23456 vs. 123,456,789.1). It is on retrieval of the values that they are converted to their proper data type. The size of any item is the size of the attribute name plus the value, as stated. Also, 'items' are not required to have the same attributes for each in a table (accomplished by omitting the attribute on the 'put' operation). As such, each item may have a different size. DDB on CAP: 'availability' and 'partition tolerant' over 'consistency'. – Zack Jannsen Nov 24 '15 at 11:39
  • One more point - each Global Secondary Index added to a table will increase the data usage by the amount of data in that table. You can think of this as creating a separate table that is automatically kept in sync with the parent. You can decide if some, all or only the index table keys are persisted so you have control over storage vs. accessibility (throughput trade off on recursive queries). Goal: give users the flexibility to define their indexes vs. the overhead RDBMS's place on the data structures automatically. – Zack Jannsen Nov 24 '15 at 11:51
  • Will DynamoDB take any memory overhead if I am including a list in the items? When I insert data in the form of a list, DynamoDB shows each value in the list in the form of a map, to know which datatype that particular item belongs to. **"S":"Pur123112"** is a value in the list, and it specifies that the value belongs to the string datatype: `{ "Customer ID":"123412341234", "Year":"2016" //stored as a number "Purchase ID":["S":"Pur123112","S":"Pur12317",...100 values], "Date of Purchase":["S":"2016-04-20","S":"2016-05-01",...100 values] }` – unknownerror May 20 '16 at 05:56

I found this answer in the Amazon developer forum, answered by Clarence@AWS:

e.g.:

"Item": {
    "time": {"N": "300"},
    "feeling": {"S": "not surprised"},
    "user": {"S": "Riley"}
}

In order to calculate the size of the above object:

The item size is the sum of lengths of the attribute names and values, interpreted as UTF-8 characters. In the example, the number of bytes of the item is therefore the sum of

time : 4 + 3
feeling : 7 + 13
user : 4 + 5

which is 36.

For the formal definition, refer to: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/WorkingWithDDItems.html
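
Clarence's arithmetic can be reproduced with a small Python sketch (illustrative only - it sums UTF-8 byte lengths of names and values exactly as described, and ignores any per-attribute overhead):

```python
def item_size_bytes(item: dict) -> int:
    """Sum of the UTF-8 byte lengths of each attribute name and its
    string-encoded value, per Clarence@AWS's description."""
    total = 0
    for name, typed_value in item.items():
        value = next(iter(typed_value.values()))  # e.g. {"N": "300"} -> "300"
        total += len(name.encode("utf-8")) + len(value.encode("utf-8"))
    return total

item = {
    "time": {"N": "300"},
    "feeling": {"S": "not surprised"},
    "user": {"S": "Riley"},
}
print(item_size_bytes(item))  # 36
```

Note that `encode("utf-8")` counts bytes, not characters, so multi-byte characters are handled correctly; for the ASCII strings in this example the two counts coincide.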

Asanga Dewaguru
  • This answer seems suspect to me, since you're counting characters, not bytes. If the strings are UTF-8 (as another answer reports), you can't represent a character with one byte. – L. Blanc Apr 30 '16 at 23:10
  • It's mentioned in this answer as well: "The item size is the sum of lengths of the attribute names and values, interpreted as UTF-8 characters". And a UTF-8 character can take 1-4 bytes. Each character given in "this" sample calculation occupies only 1 byte. (Just check the length here: https://mothereff.in/byte-counter) – Asanga Dewaguru May 01 '16 at 01:40
  • This seems to be quite accurate. I have a very large item (over 16,000 values in it) and the above formula matches perfectly the consumed capacity that get-item returns (6) multiplied by 4K. Whereas the raw JSON output itself (from aws-cli) with whitespace removed was over 100K. – Damon Maria Apr 03 '18 at 03:17
  • so Dynamo uses 3 bytes to represent the number `300`, that is so strange. @AsangaDewaguru Do you still have the link to the answer on AWS forum? thanks – kkkkkkk Mar 08 '19 at 03:16
  • @Khang : https://forums.aws.amazon.com/message.jspa?messageID=327242#327242 – Asanga Dewaguru Mar 12 '19 at 10:48
  • You also need to add one byte of overhead for each attribute in that example. – Amit Naidu Jun 19 '19 at 12:29
  • Looks like DynamoDB changed its way of number size calculation: *The size of a number is approximately (length of attribute name) + (1 byte per two significant digits) + (1 byte).* https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/CapacityUnitCalculations.html – guijob Aug 03 '19 at 18:50
  • So, this piece of code `"time":{"N":"300"}` has 4 + 1 + 1 = 6 bytes – guijob Aug 03 '19 at 19:12

An item’s size is the sum of all its attributes’ sizes, including the hash and range key attributes. Attributes themselves have a name and a value. Both the name and value contribute to an attribute’s size. Names are sized the same way as string values. All values are sized differently based on their data type.

If you're interested in the nitty-gritty details, have a read of this blog post.

Otherwise, I've also created a DynamoDB Item Size and Consumed Capacity Calculator that accurately determines item sizes.

Numbers are easily DynamoDB's most complicated type. AWS does not publicly document how to determine how many bytes are in a number. They say this is so they can change the internal implementation without anyone being tied to it. What they do say, however, sounds simple but is more complicated in practice.

Very roughly, though, the formula is something like 1 byte for every 2 significant digits, plus 1 extra byte for positive numbers or 2 for negative numbers. Therefore, 27 is 2 bytes and -27 is 3 bytes. DynamoDB will round up if there's an odd number of digits, so 461 will use 3 bytes (including the extra byte). Leading and trailing zeros are trimmed before calculating the size.
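
That rough formula can be sketched in Python (illustrative only - as stated above, AWS does not document the real internal encoding, so treat this as an approximation):

```python
def number_size_bytes(value: str) -> int:
    """Rough number size: 1 byte per two significant digits (rounded
    up), plus 1 extra byte for positive numbers or 2 for negative.
    Leading and trailing zeros are trimmed first."""
    digits = value.lstrip("-").replace(".", "").strip("0")
    n = max(len(digits), 1)                     # zero still stores something
    extra = 2 if value.startswith("-") else 1
    return (n + 1) // 2 + extra

print(number_size_bytes("27"))   # 2  (1 byte of digits + 1)
print(number_size_bytes("-27"))  # 3  (1 byte of digits + 2)
print(number_size_bytes("461"))  # 3  (2 bytes of digits, rounded up, + 1)
```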

Zac Charles

You can use the algorithm for computing DynamoDB item size in the DynamoDBDelegate class of the DynamoDB Storage Backend for Titan.

Alexander Patrikalakis

All the above answers skip the issue of storing the length of each attribute, as well as the length of attribute names and the type of each attribute.

The DynamoDB Naming Guide says names can be 1 to 255 characters long, which implies a 1-byte name-length overhead.

We can work back from the 400 KB maximum item limit to know there's an upper limit on the length field required for binary or string items - they don't need to store more than a 19-bit number for the length.

Using a bit of adaptive coding, I would expect:

  • Numbers have a 1-byte leading type and length value, but could also be coded into a single byte (e.g. a special code for a zero-value number, with no value bytes following)
  • Strings and binary have 1-3 bytes of leading type and length
  • Null is just a type byte without a value
  • Bool is a pair of type bytes without any other value
  • Collection types have 1-3 bytes of leading type and length.

Oh, and DynamoDB is not schemaless. It is schema-per-item, because it stores the types, names and lengths of all these variable-length items.

Andy Dent

An approximation of how much space an item occupies in your DynamoDB table can be obtained by doing a get request with the boto3 library.

This is not an exact way to determine the size of an item, but it will give you an idea. When performing a batch_get_item(**kwargs), you get a response that includes the ConsumedCapacity, in the following form:

....
'ConsumedCapacity': [
    {
        'TableName': 'string',
        'CapacityUnits': 123.0,
        'ReadCapacityUnits': 123.0,
        'WriteCapacityUnits': 123.0,
        'Table': {
            'ReadCapacityUnits': 123.0,
            'WriteCapacityUnits': 123.0,
            'CapacityUnits': 123.0
        },
        'LocalSecondaryIndexes': {
            'string': {
                'ReadCapacityUnits': 123.0,
                'WriteCapacityUnits': 123.0,
                'CapacityUnits': 123.0
            }
        },
        'GlobalSecondaryIndexes': {
            'string': {
                'ReadCapacityUnits': 123.0,
                'WriteCapacityUnits': 123.0,
                'CapacityUnits': 123.0
            }
        }
    },
]
...

From there you can see how many capacity units it took, and you can extract an approximate size of the item. Obviously this depends on the configuration of your system, due to the fact that:

One read request unit represents one strongly consistent read request, or two eventually consistent read requests, for an item up to 4 KB in size. Transactional read requests require 2 read request units to perform one read for items up to 4 KB. If you need to read an item that is larger than 4 KB, DynamoDB needs additional read request units. The total number of read request units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
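
As a sketch of this approach (the table name and key below are hypothetical, and the first function assumes boto3 is installed and AWS credentials are configured), you can ask DynamoDB to report the consumed capacity on a single read and convert it to an upper bound on the item size:

```python
def fetch_consumed_capacity(table_name: str, key: dict) -> float:
    """Read one item, asking DynamoDB to report the capacity spent.
    Hypothetical table/key; requires boto3 and AWS credentials."""
    import boto3
    client = boto3.client("dynamodb")
    resp = client.get_item(
        TableName=table_name,
        Key=key,
        ConsistentRead=True,             # strongly consistent read
        ReturnConsumedCapacity="TOTAL",
    )
    return resp["ConsumedCapacity"]["CapacityUnits"]

def approx_max_item_size_kb(capacity_units: float, consistent: bool = True) -> float:
    """Upper bound implied by the quoted rule: each strongly consistent
    read unit covers up to 4 KB (8 KB if eventually consistent)."""
    return capacity_units * (4.0 if consistent else 8.0)

# e.g. a get-item that consumed 6 units read an item of at most ~24 KB
print(approx_max_item_size_kb(6))                      # 24.0
print(approx_max_item_size_kb(0.5, consistent=False))  # 4.0
```

Because capacity is billed in 4 KB steps, this only brackets the item size; it cannot distinguish, say, a 9 KB item from an 11 KB one.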

Drubio

Quite an old question. Since then AWS has clarified how DynamoDB computes the storage size per item type.

Quoting the DynamoDB Item sizes and formats page:

  • Strings are Unicode with UTF-8 binary encoding. The size of a string is (length of attribute name) + (number of UTF-8-encoded bytes).
  • Numbers are variable length, with up to 38 significant digits. Leading and trailing zeroes are trimmed. The size of a number is approximately (length of attribute name) + (1 byte per two significant digits) + (1 byte).
  • A binary value must be encoded in base64 format before it can be sent to DynamoDB, but the value's raw byte length is used for calculating size. The size of a binary attribute is (length of attribute name) + (number of raw bytes).
  • The size of a null attribute or a Boolean attribute is (length of attribute name) + (1 byte).
  • An attribute of type List or Map requires 3 bytes of overhead, regardless of its contents. The size of a List or Map is (length of attribute name) + sum (size of nested elements) + (3 bytes) . The size of an empty List or Map is (length of attribute name) + (3 bytes).
  • Each List or Map element also requires 1 byte of overhead.

So in the following example:

{
  "Temperature":{"N":"12.3456"}
}

the storage size is 11 + 6/2 + 1 = 11 + 3 + 1 = 15 bytes
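
The documented rules above can be turned into a rough Python estimator (my own sketch, not an AWS tool; the number rule is stated as approximate, and for simplicity I ignore any extra cost for the sign of negative numbers):

```python
import base64
from decimal import Decimal

def attr_size(name: str, typed: dict) -> int:
    """Estimate an attribute's size per the documented rules above."""
    return len(name.encode("utf-8")) + value_size(typed)

def value_size(typed: dict) -> int:
    (dtype, value), = typed.items()
    if dtype == "S":                              # UTF-8 byte length
        return len(value.encode("utf-8"))
    if dtype == "N":                              # 1 byte per 2 digits, + 1
        digits = Decimal(value).normalize().as_tuple().digits
        return (len(digits) + 1) // 2 + 1
    if dtype == "B":                              # raw bytes, not base64 length
        return len(base64.b64decode(value))
    if dtype in ("BOOL", "NULL"):
        return 1
    if dtype == "L":                              # 3 bytes overhead + 1 per element
        return 3 + sum(1 + value_size(v) for v in value)
    if dtype == "M":                              # as L, plus nested names
        return 3 + sum(1 + len(k.encode("utf-8")) + value_size(v)
                       for k, v in value.items())
    raise ValueError(f"unhandled type {dtype}")

print(attr_size("Temperature", {"N": "12.3456"}))  # 11 + 6/2 + 1 = 15
```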

Giorgio Ruffa

The simplest approach is to create an item in the table and export it to a CSV file, which is an option available in DynamoDB. The size of the CSV file will give you the item size approximately.

  • That's not true. There is also metadata in this CSV file - e.g. information about the type of each attribute, parentheses, whitespace, etc. – rbrisuda May 18 '18 at 07:51