I am using the aiodynamo library to query a DynamoDB table, but my current code is slow. Can someone suggest modifications or point out errors?

The table's partition key is symbol and its sort key is EffectiveTime, which is stored as Unix time. Each query selects one symbol and an EffectiveTime range covering a single asof day. The results for each asof are written to a separate JSON file using orjson, one item per line. AWS credentials are stored in the standard AWS credentials file in the .aws folder. I write to the file one line (one JSON item) at a time to avoid memory problems.
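Writing one item per line keeps memory bounded as long as items are serialized and handed to the file as they arrive, rather than collected into a list first. Below is a minimal sketch of that streaming pattern. It assumes the items arrive as an async iterator, the way aiodynamo's table.query(...) yields them. The names write_jsonl, dumps_line and demo_items are hypothetical. The plain json fallback is there only so the sketch runs where orjson is not installed. orjson's OPT_APPEND_NEWLINE option adds the trailing newline, so each item takes a single write() call.

```python
import asyncio
import io
from typing import AsyncIterator, BinaryIO

try:
    import orjson

    def dumps_line(item: dict) -> bytes:
        # orjson returns bytes; OPT_APPEND_NEWLINE appends b"\n" for us
        return orjson.dumps(item, option=orjson.OPT_APPEND_NEWLINE)
except ImportError:  # fallback so the sketch also runs without orjson
    import json

    def dumps_line(item: dict) -> bytes:
        return json.dumps(item).encode("utf-8") + b"\n"


async def write_jsonl(items: AsyncIterator[dict], out: BinaryIO) -> int:
    """Stream items to `out` as JSON Lines; only one item is held at a time."""
    count = 0
    async for item in items:
        out.write(dumps_line(item))  # one buffered write per item, no flush()
        count += 1
    return count


async def demo_items() -> AsyncIterator[dict]:
    # Hypothetical stand-in for table.query(...):
    # 2023-06-05 00:00:00 and 00:01:00 UTC as Unix seconds
    for t in (1685923200, 1685923260):
        yield {"symbol": "symbol1", "EffectiveTime": t}


if __name__ == "__main__":
    buf = io.BytesIO()
    n = asyncio.run(write_jsonl(demo_items(), buf))
    print(n, buf.getvalue())
```

In real code, `out` would be the file opened with open(..., 'wb'). Python's buffered file object already batches small writes into larger disk writes.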

Please see current code below:

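```python
import asyncio
import os
import pathlib
from datetime import date, datetime, time, timezone

import orjson as json
from httpx import AsyncClient
from aiodynamo.client import Client
from aiodynamo.credentials import FileCredentials
from aiodynamo.expressions import HashKey, RangeKey
from aiodynamo.http.httpx import HTTPX

# Fall back to the standard ~/.aws/credentials file when the variable is unset
# (pathlib.Path(None) would raise a TypeError).
_creds_env = os.getenv('AWS_CREDENTIALS_FILE')
aws_creds_path = pathlib.Path(_creds_env) if _creds_env else pathlib.Path.home() / '.aws' / 'credentials'


def beginning_day_unix(asof: str) -> int:
    # asof at 00:00:00; assumes EffectiveTime is Unix seconds in UTC
    return int(datetime.combine(date.fromisoformat(asof), time(0, 0, 0), tzinfo=timezone.utc).timestamp())


def end_day_unix(asof: str) -> int:
    # asof at 23:59:59; same UTC / Unix-seconds assumption
    return int(datetime.combine(date.fromisoformat(asof), time(23, 59, 59), tzinfo=timezone.utc).timestamp())


async def get_table_items():
    async with AsyncClient() as h:
        client = Client(HTTPX(h), FileCredentials(path=aws_creds_path, profile_name='prod_profile'), region='eu-west-2')
        table = client.table('stock-prices')
        for asof in ['2023-06-05', '2023-06-06', '2023-06-07', '2023-06-08', '2023-06-09']:
            min_key = beginning_day_unix(asof)
            max_key = end_day_unix(asof)
            # "with" closes the file even if a query raises
            with open(f'stock_prices_{asof}.json', 'wb') as json_file:
                for symbol in ['symbol1', 'symbol2', 'symbol3']:  # in practice, 1,000s of symbols
                    # each query is awaited before the next starts, so run time grows with len(symbols)
                    async for item in table.query(key_condition=HashKey('symbol', symbol) & RangeKey('EffectiveTime').between(min_key, max_key)):
                        # one line per item; no per-line flush(), the file buffer already bounds memory use
                        json_file.write(json.dumps(item) + b'\n')


async def main():
    await get_table_items()


if __name__ == '__main__':
    asyncio.run(main())
```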
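The biggest cost is probably that the queries run strictly one after another. Each table.query(...) call is awaited to completion before the next symbol starts, so with thousands of symbols the total run time is roughly the sum of every request's latency. A common fix is to run the per-symbol queries concurrently while capping how many are in flight. The sketch below assumes a query function with the shape (symbol, min_key, max_key) -> async iterator of items. The names dump_day and fake_query, and the default max_concurrency=32, are hypothetical choices, not aiodynamo API. fake_query only simulates network latency so the sketch runs without AWS.

```python
import asyncio
import io
from typing import AsyncIterator, BinaryIO, Callable, Iterable

try:
    import orjson

    def dumps_line(item: dict) -> bytes:
        return orjson.dumps(item, option=orjson.OPT_APPEND_NEWLINE)
except ImportError:  # fallback so the sketch runs without orjson
    import json

    def dumps_line(item: dict) -> bytes:
        return json.dumps(item).encode("utf-8") + b"\n"

# (symbol, min_key, max_key) -> async iterator of items
QueryFn = Callable[[str, int, int], AsyncIterator[dict]]


async def dump_day(query: QueryFn, symbols: Iterable[str], min_key: int,
                   max_key: int, out: BinaryIO, max_concurrency: int = 32) -> int:
    """Run one query per symbol concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)
    written = 0

    async def one_symbol(symbol: str) -> None:
        nonlocal written
        async with semaphore:  # caps the number of in-flight queries
            async for item in query(symbol, min_key, max_key):
                # a single synchronous write() per item, so lines never split mid-line
                out.write(dumps_line(item))
                written += 1

    await asyncio.gather(*(one_symbol(s) for s in symbols))
    return written


async def fake_query(symbol: str, min_key: int, max_key: int) -> AsyncIterator[dict]:
    """Hypothetical stand-in for table.query(...): one item every 8 hours."""
    for t in range(min_key, max_key + 1, 8 * 3600):
        await asyncio.sleep(0.01)  # simulate network latency
        yield {"symbol": symbol, "EffectiveTime": t}


if __name__ == "__main__":
    buf = io.BytesIO()
    n = asyncio.run(dump_day(fake_query, ["symbol1", "symbol2", "symbol3"],
                             0, 86399, buf, max_concurrency=2))
    print(n, "lines written")
```

With the real table, `query` could be `lambda s, lo, hi: table.query(key_condition=HashKey('symbol', s) & RangeKey('EffectiveTime').between(lo, hi))`, called once per asof with that day's bounds and the open file as `out`. Lines from different symbols will interleave in the output. That is harmless for line-separated JSON, but the file is no longer grouped by symbol. A higher cap is not automatically faster, because on a table with provisioned capacity too many parallel queries can be throttled. Separately, querying each symbol once over the whole five-day range and routing each item to its day's file by EffectiveTime would cut the number of query calls from five per symbol to one.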
Alex