
I have set up an AWS Lambda function in Python to ingest requests from a CSV and then query an AWS Aurora Serverless PostgreSQL database based on this request. The function works when the requests are fewer than 1K, but beyond that I get errors due to a hard limit in the Data API. I am trying to figure out a way to break the query up into smaller queries once this limit is hit, but I am not sure how to do this in Python. Can someone suggest a way to break a request up into smaller chunks so I do not hit the Data API limit?

Snippet of Code Used:

# awswrangler provides the Data API and S3 helpers used below
import awswrangler as wr

# Set database connection params
engine = wr.data_api.rds.connect(
    resource_arn=resource_arn,
    database=database_name,
    secret_arn=secret_arn,
)

# Read the CSV from S3 and select the requested ids
read_df = wr.s3.read_csv(path=s3_path_in)
requested_ids = read_df["ids"]
in_ID = requested_ids.tolist()
in_query = str(tuple(in_ID))

# Query Postgres via the Data API
query_id = """
select c.*
from table1 b
INNER JOIN table2 c on b.id = c.id
where b.id in %s
""" % in_query

out_ids = wr.data_api.rds.read_sql_query(query_id, engine)

1 Answer


One way I can think of is to use the PostgreSQL LIMIT <row_count> clause and dynamically pass the row_count to your query.

select c.*
from table1 b
INNER JOIN table2 c on b.id = c.id
where b.id in <id_list>
order by <primary_key>
limit 999

PostgreSQL LIMIT
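
As a rough sketch (not tested against the Data API), the same idea could be driven from Python with awswrangler: take the count of all matching rows first, then page through the result with LIMIT/OFFSET until everything has been fetched. Here engine and in_query are the variables from the question; PAGE_SIZE, count_sql, page_sql, and pages are illustrative names, and whether 999 rows per page actually stays under the limit depends on the size of the rows.

import awswrangler as wr
import pandas as pd

PAGE_SIZE = 999  # rows per page; reduce if a single page still exceeds the Data API limit

# in_query is the "(id1, id2, ...)" string and engine the connection from the question
count_sql = """
select count(*) as total
from table1 b
INNER JOIN table2 c on b.id = c.id
where b.id in %s
""" % in_query

total_rows = int(wr.data_api.rds.read_sql_query(count_sql, engine)["total"].iloc[0])

# Fetch the rows one page at a time and stitch the pages back together
pages = []
for offset in range(0, total_rows, PAGE_SIZE):
    page_sql = """
    select c.*
    from table1 b
    INNER JOIN table2 c on b.id = c.id
    where b.id in %s
    order by c.id
    limit %s offset %s
    """ % (in_query, PAGE_SIZE, offset)
    pages.append(wr.data_api.rds.read_sql_query(page_sql, engine))

out_ids = pd.concat(pages, ignore_index=True)

The order by on the primary key matters here: without a stable ordering, LIMIT/OFFSET pages are not guaranteed to be disjoint, so rows could be duplicated or skipped across pages.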

  • I am confused about how this will ensure all of the data is returned. It will return 999 of the entries, but what about anything above 999? – Prof. Falken Nov 03 '21 at 18:52
  • Take the `count of all the rows` first, and programmatically derive the `row_count` until the total number of rows has been returned. I know it sounds naive, but it's worth a try. – Ruben Nov 03 '21 at 18:56
  • Does it help, @Prof.Falken? – Ruben Nov 05 '21 at 16:17