Need to make an AWS deployment decision. A lot of this tech (docker, beanstalk) is pretty new so I don't know best practices (and I'm also foggier than I'd like to be on networking and security).
Tech details: I have a Docker application from a client (Python w/ FastAPI) that takes POST requests and spits out machine learning model results. I can build and run it locally, but I need to deploy it on AWS in a scalable fashion. I managed to deploy it to Elastic Beanstalk (similar to this tutorial), which gives me scalability but also a public URL, http://myapp.eba-ri5rfu4f.us-central-1.elasticbeanstalk.com, which is already getting random bots sending GET .env requests. It doesn't need to interact with my IoT network, just other cloud apps, probably a Lambda function.
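For context, the consumer side is trivial: a Lambda caller would just POST JSON at the endpoint. A minimal sketch (the URL, payload shape, and function name here are all made up for illustration):

```python
import json
import urllib.request

def build_predict_request(url: str, payload: dict) -> urllib.request.Request:
    # Hypothetical: assemble the POST a Lambda consumer would send to
    # the model endpoint; urllib.request.urlopen(req) would send it.
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```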
What's the simplest way of deploying this? Here's my understanding of the options (you don't have to address all of these bad ideas; I just want to show I've given some thought to solutions):
1. Add token: The POST input JSON could also expect a security token, and the app could return 404 otherwise.
Problems: Requires modifying the Docker application's source code (don't want to have to do this!). The app would also still be open on the internet, serving malicious GET requests all day.
2. Build VPC: Could make a VPC that all our cloud apps use. Problems: I don't know how to do this, or whether it'll even work. Maybe I'll need one anyway? But it feels like I'm adding a whole layer of architecture to maintain just so one piece gets some security.
3. Security groups: Maybe I just need to add my Elastic Beanstalk environment to a security group, allow only approved IP addresses through the firewall, and that solves everything.
Problems: I don't think this works; it's probably not that simple.
4. Deploy as lambda function: It would only interface with whatever resource triggers it, so no need for a public URL.
Problems: Requires modifying the Docker source code to work with a Lambda handler instead of an API. Plus it feels like putting a hat on a hat: running a server in a Docker container, then deploying it in a "serverless" environment. Does it have to spin up a server every time the function is invoked? (I already tried this with a two-Dockerfile solution I found and gave up after it didn't work.)
5. Do nothing: Our data model is meaningless to everyone else; stop wasting time on this.
Problems: A malicious actor could still figure out how to make proper requests and run up thousands of dollars in AWS fees. I don't know why someone would do this, but it looks and feels bad to leave intellectual property open to the public.
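For reference, the token check in option 1 is only a few lines. Here's a minimal stdlib sketch (the `X-Api-Token` header name and the helper itself are things I made up); it could be wired into the FastAPI app as a dependency or middleware, though it still doesn't solve the exposed-URL problem:

```python
import hmac

def is_authorized(headers: dict, expected_token: str) -> bool:
    # Hypothetical helper: reject the request unless it carries the
    # shared token. An empty expected token rejects everything.
    supplied = headers.get("X-Api-Token", "")
    # compare_digest avoids leaking the token via timing differences
    return bool(expected_token) and hmac.compare_digest(supplied, expected_token)
```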
Appreciate any advice or feedback on this problem. I know it's an open ended question, I just need to brainstorm and confirm I'm not missing an obvious solution.
UPDATE
Using a VPC: Turns out AWS has a default VPC, so I think the best solution is to put my Beanstalk environment in one. I created a new Beanstalk environment, this time selecting a VPC subnet under Configuration -> Network. It still has a public URL though, with these settings:
Instance subnets: subnet-42ttr89
Public IP address: enabled
VPC: vpc-5910921
Visibility: public
I think I'm closer, but I'm still stuck: I don't see a way to change these settings and make the environment private.
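One thing I haven't tried yet: Elastic Beanstalk's `aws:ec2:vpc` option namespace looks like it controls exactly these settings, so a `.ebextensions/vpc.config` along these lines (reusing my IDs as placeholders; the private subnets would need to exist first) might do it. I gather some of these can only be set when the environment is created, so it may mean yet another rebuild:

```yaml
# .ebextensions/vpc.config -- sketch only; IDs are placeholders
option_settings:
  aws:ec2:vpc:
    VPCId: vpc-5910921
    Subnets: subnet-42ttr89          # private subnet(s) for the instances
    ELBSubnets: subnet-42ttr89       # private subnet(s) for the load balancer
    ELBScheme: internal              # no public DNS name
    AssociatePublicIpAddress: false  # instances get no public IPs
```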