This question has been already posted on AWS forums, but yet remains unanswered https://forums.aws.amazon.com/thread.jspa?threadID=94589
I'm trying to to perform an initial upload of a long list of short items (about 120 millions of them), to retrieve them later by unique key, and it seems like a perfect case for DynamoDb.
However, my current write speed is very slow (roughly 8-9 seconds per 100 writes) which makes initial upload almost impossible (it'd take about 3 months with current pace).
I have read AWS forums looking for an answer and already tried the following things:
I switched from single "put_item" calls to batch writes of 25 items (recommended max batch write size), and each of my items is smaller than 1Kb (which is also recommended). It is very typical even for 25 of my items to be under 1Kb as well, but it is not guaranteed (and shouldn't matter anyway as I understand as only single item size is important for DynamoDB).
I use the recently introduced EU region (I'm in the UK) specifying its entry point directly by calling set_region('dynamodb.eu-west-1.amazonaws.com') as there is apparently no other way to do that in PHP API. AWS console shows that the table in a proper region, so that works.
I have disabled SSL by calling disable_ssl() (gaining 1 second per 100 records).
Still, a test set of 100 items (4 batch write calls for 25 items) never takes less than 8 seconds to index. Every batch write request takes about 2 seconds, so it's not like the first one is instant and consequent requests are then slow.
My table provisioned throughput is 100 write and 100 read units which should be enough so far (tried higher limits as well just in case, no effect).
I also know that there are some expenses on request serialisation so I can probably use the queue to "accumulate" my requests, but does that really matter that much for batch_writes? And I don't think that is the problem because even a single request takes too long.
I found that some people modify the cURL headers ("Expect:" particularly) in the API to speed the requests up, but I don't think that is a proper way, and also the API has been updated since that advice was posted.
The server my application is running on is fine as well - I've read that sometimes the CPU load goes through the roof, but in my case everything is fine, it's just the network request that takes too long.
I'm stuck now - is there anything else I can try? Please feel free to ask for more information if I haven't provided enough.
There are other recent threads, apparently on the same problem, here (no answer so far though).
This service is supposed to be ultra-fast, so I'm really puzzled by that problem in the very beginning.