I need to select more than 1.9 billion rows. I am querying a table in a database through the AWS Athena console. The table reads Parquet files from an S3 bucket.
When I run this query:
SELECT * FROM ids WHERE org = 'abcd' AND idkey = 'email-md5';
The query seems to time out, most likely because of the result size: a COUNT with the same filter returns 1.9 billion rows.
I tried OFFSET along with LIMIT, but it doesn't seem to work in AWS Athena.
I also tried something along these lines:
SELECT * FROM ids WHERE org = 'abcd' AND idkey = 'email-md5' LIMIT 0,500;
This doesn't work either.
How can I chunk such a large dataset using SELECT?
The aim here is to be able to query the entire dataset without having the query time out.
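To illustrate the kind of chunking I have in mind, here is a minimal sketch that only builds the query strings; it does not talk to Athena. It assumes the table has a column holding the MD5 value as a hex string, which I call `idvalue` here (a hypothetical name, the real column may differ), and slices the result set by the first hex character of that column. Whether each slice is small enough to avoid the timeout is something I have not verified.

```python
# Sketch only: split one huge SELECT into 16 smaller ones, one per leading
# hex digit of an assumed MD5 column. Nothing here is sent to Athena.
HEX_DIGITS = "0123456789abcdef"


def chunk_queries(table="ids", org="abcd", idkey="email-md5",
                  value_column="idvalue"):
    """Return one SELECT statement per leading hex digit of `value_column`.

    `value_column` is a hypothetical column name, not taken from the real
    table. The filter assumes the column stores a hex string.
    """
    base = (
        f"SELECT * FROM {table} "
        f"WHERE org = '{org}' AND idkey = '{idkey}'"
    )
    # lower() guards against upper-case hex digits in the stored values.
    return [
        f"{base} AND lower(substr({value_column}, 1, 1)) = '{digit}';"
        for digit in HEX_DIGITS
    ]


queries = chunk_queries()
print(len(queries))   # number of chunked queries
print(queries[0])     # first chunk, for the leading digit '0'
```

Each generated statement could then be run as a separate query, so no single result set has to contain all 1.9 billion rows. A two-character prefix would give 256 smaller slices if 16 are still too large.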
I ran a COUNT:
SELECT COUNT(*) FROM ids WHERE org = 'abcd' AND idkey = 'email-md5';
The COUNT returned is 1.9 billion, as mentioned above. I need to pull all 1.9 billion rows so that I can download them and do further analysis.