
My aim is to use S3 in AWS to store csv files and API Gateway to query those objects and ideally select rows and columns from within the csv files and return them in my web app.

In AWS, there is a method for selecting content from S3 objects (S3 Select). It acts as a filter on a CSV file, for example to return only certain columns, and the filter is written in SQL; see here: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectSELECTContent.html

There is also a way to use API Gateway as a proxy for S3 to create an API into the bucket, see here: https://docs.aws.amazon.com/apigateway/latest/developerguide/integrating-api-with-aws-services-s3.html

Can these methods be combined so that API Gateway requests map directly to an S3 Select (SQL) query on an object, or do I need a Lambda function in the middle, or some other technique?

Graham Hesketh
  • Configuring API Gateway as a proxy to S3 only lets you GET/PUT/DELETE/HEAD an object, so you will not be able to extract a specific part of a file. If you want to use S3 Select you still need somewhere to run the requests (typically a Lambda). By the way, be aware of the costs; S3 Select is definitely not a universal solution. Otherwise have a look at Athena: it avoids writing any code and is pretty efficient. I hope this helps – Cinn Dec 18 '18 at 10:40
  • When using Athena, I'd still need to query it via a Lambda function, right? If so, why not put the S3 Select query in the Lambda and avoid Athena? I don't want to go to Athena directly from JavaScript because I would like everything to run through API Gateway if possible – Graham Hesketh Dec 18 '18 at 13:14
  • Yes, you still need Lambda to query it; I posted a more detailed response below – Cinn Dec 18 '18 at 14:03
  • You can most definitely use a service integration to an S3 Select operation (you can do this with any AWS API operation). The challenge with S3 Select in particular is that it returns a stream of messages rather than a simple textual response. If this is what you want, all done; if you're after a "plain json" type response, I currently don't know of a way to do this using the velocity response mapping templates. – kadrach May 07 '19 at 06:16

3 Answers


To request a specific part of a file you can either do it yourself or use one of the AWS managed services, S3 Select or Athena. The difference between the two is simple: S3 Select queries a single object, while Athena can run a query over a whole bucket.
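As a rough illustration of the single-object case, here is a minimal Python sketch of an S3 Select call through boto3's `select_object_content`. The helper name `run_s3_select` is made up for this example, and it assumes the CSV has a header row. The result comes back as an event stream, so the helper collects the `Records` events and ignores the rest.

```python
def run_s3_select(client, bucket, key, expression):
    """Run one S3 Select query against a single CSV object and return the rows as text.

    `client` is expected to be a boto3 S3 client (or anything with the same method).
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the first CSV line is a header, so columns can be named in the SQL.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream: Records events carry data,
    # other events (for example Stats and End) are metadata and are skipped.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join before decoding, since a chunk boundary may fall inside a multi-byte character.
    return b"".join(chunks).decode("utf-8")
```

It could be called as `run_s3_select(boto3.client("s3"), "my-example-bucket", "data/example.csv", "SELECT s.name, s.price FROM S3Object s WHERE s.category = 'books'")`, where the bucket, key and column names are placeholders.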

Which one to use depends on your situation; you will have to weigh the performance you need against the costs you can accept.

In any case you cannot just plug API Gateway directly into one of these services; you need a middleware layer that processes the requests.

Still, I should mention that it is possible to use S3 Select or Athena directly, bypassing API Gateway. If you do so, you will have to be really careful about the permissions attached to the access keys used. You can create a specific (very narrow) access to S3 in IAM and then use an SDK to run your queries directly from the client side. You have more security issues to handle, but you avoid both API Gateway and Lambda.
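To give an idea of what "very narrow" could look like for the S3 side, here is a small Python sketch that builds a read-only IAM policy document limited to one prefix. The function name, bucket and prefix are hypothetical; the only action granted is `s3:GetObject`, which is the permission S3 Select requires on the queried object.

```python
import json


def read_only_policy(bucket, prefix):
    """Build an IAM policy document that only allows reading objects under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # S3 Select is authorized through s3:GetObject on the queried object.
                "Action": ["s3:GetObject"],
                # Only objects whose key starts with the given prefix are readable.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


# Hypothetical bucket and prefix, printed as the JSON you would attach in IAM.
print(json.dumps(read_only_policy("my-example-bucket", "public-csv/"), indent=2))
```

Anything the client must not read should live outside that prefix, since the credentials end up in the browser.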

Cinn

The S3 proxy only lets you access the files as documented.

For your purpose you need an entity in the middle that executes the additional business logic for you.

I would recommend Lambda.

So you do:

api-gateway->lambda->s3
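A minimal sketch of that chain in Python, assuming an API Gateway Lambda proxy integration and a CSV with a header row. The bucket name and the `key` and `columns` query parameters are made up for the example; column names are validated before they go into the SQL so that user input cannot change the query.

```python
import json
import re

# Hypothetical bucket name; the object key comes from the request.
BUCKET = "my-example-bucket"
# Only plain identifiers are accepted as column names.
COLUMN_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_expression(columns):
    """Turn a list of column names into an S3 Select SQL statement, rejecting odd names."""
    for name in columns:
        if not COLUMN_RE.match(name):
            raise ValueError(f"invalid column name: {name!r}")
    projection = ", ".join(f's."{name}"' for name in columns) if columns else "*"
    return f"SELECT {projection} FROM S3Object s"


def handler(event, context, client=None):
    """Lambda entry point: e.g. GET /?key=data/example.csv&columns=name,price"""
    params = event.get("queryStringParameters") or {}
    key = params.get("key")
    if not key:
        return {"statusCode": 400, "body": json.dumps({"error": "missing object key"})}
    columns = [c for c in params.get("columns", "").split(",") if c]
    try:
        expression = build_expression(columns)
    except ValueError as err:
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}

    if client is None:
        import boto3  # AWS SDK; the client argument exists only to ease testing
        client = boto3.client("s3")

    response = client.select_object_content(
        Bucket=BUCKET,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"JSON": {"RecordDelimiter": "\n"}},
    )
    # Collect the Records events from the event stream, then parse one JSON object per line.
    raw = b"".join(e["Records"]["Payload"] for e in response["Payload"] if "Records" in e)
    rows = [json.loads(line) for line in raw.decode("utf-8").splitlines() if line]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(rows),
    }
```

Because the Lambda flattens the event stream into a plain JSON array, the web app only ever sees an ordinary JSON response from API Gateway.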

DominikHelps

I was able to run an S3 Select query on 100GB of data in under 30 seconds in a serverless architecture using API Gateway and Lambda. Here is the solution in case you are still interested: https://github.com/sandyghai/Query-100GB-Data-With-AWS-S3-Select-Under-30-Seconds