I am looking for ways to optimize the architecture of an AWS solution that collects and analyzes information from different data providers on behalf of our clients. The providers grant access to their content only by IP authentication (a strict requirement), so for every client we have a dedicated host with a corresponding Elastic IP attached to it. The number of clients has increased dramatically (over 200), and we now have to maintain more than 200 instances, each of which runs Squid (for browser access) and one of our custom REST proxies (for the application server) to fetch data on the client's behalf.
I learned that the AWS internet gateway effectively works as a NAT by translating the private IP of an instance into the corresponding Elastic IP. I also read that AWS allows traffic inside a VPC to be intercepted for security and audit purposes via a middlebox appliance and a Gateway Load Balancer (GWLB). That left me wondering whether it is feasible to get rid of these 200+ instances with the following approach:
- create 200 ENIs with Elastic IPs but do not attach them to anything
- create a single instance which would send the requests to the GWLB/middlebox. Each request would somehow indicate which client it belongs to (e.g. using information carried in the payload)
- The middlebox replaces the source IP in the packet with the client's private IP (or rather, the private IP of the ENI which has the client's Elastic IP attached) and sends it back to the GWLB
- The GWLB forwards the packet to the internet gateway (IGW)
- The IGW substitutes the private IP with the client's Elastic IP and sends the request to a data provider
- Once the IGW receives the response from the data provider, it replaces the public IP with the private one and sends the packet to the GWLB/middlebox for processing.
- The middlebox sees that private IP and rewrites the packet so that it reaches its final destination: the single instance where the request originated.
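To make the intended flow concrete, here is a minimal Python sketch that only simulates the address rewriting described in the steps above. It does not call AWS and says nothing about whether a VPC would actually permit this. All names, the `Packet` type, and the addresses (taken from documentation ranges) are made up for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical lookup tables; every address here is invented for illustration.
CLIENT_TO_ENI_IP = {"client-a": "10.0.1.10", "client-b": "10.0.1.11"}
ENI_IP_TO_EIP = {"10.0.1.10": "203.0.113.10", "10.0.1.11": "203.0.113.11"}
ENI_IP_TO_CLIENT = {ip: client for client, ip in CLIENT_TO_ENI_IP.items()}
EIP_TO_ENI_IP = {eip: ip for ip, eip in ENI_IP_TO_EIP.items()}
SHARED_INSTANCE_IP = "10.0.0.5"  # the single instance that originates requests


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    client: str = ""  # client tag, assumed to be derivable from the payload


def middlebox_outbound(pkt: Packet) -> Packet:
    # Step 3: replace the source with the private IP of the client's ENI.
    return replace(pkt, src=CLIENT_TO_ENI_IP[pkt.client])


def igw_outbound(pkt: Packet) -> Packet:
    # Step 5: 1:1 NAT from the ENI's private IP to its Elastic IP (assumed to work).
    return replace(pkt, src=ENI_IP_TO_EIP[pkt.src])


def igw_inbound(pkt: Packet) -> Packet:
    # Step 6: reverse NAT from the Elastic IP back to the private IP.
    return replace(pkt, dst=EIP_TO_ENI_IP[pkt.dst])


def middlebox_inbound(pkt: Packet) -> Packet:
    # Step 7: recover the client from the private IP and deliver to the shared instance.
    return replace(pkt, dst=SHARED_INSTANCE_IP, client=ENI_IP_TO_CLIENT[pkt.dst])


def send_request(client: str, provider_ip: str) -> Packet:
    """Return the packet as the data provider would see it."""
    pkt = Packet(src=SHARED_INSTANCE_IP, dst=provider_ip, client=client)
    return igw_outbound(middlebox_outbound(pkt))


def deliver_response(pkt: Packet) -> Packet:
    """Return the response packet as the shared instance would receive it."""
    return middlebox_inbound(igw_inbound(pkt))


if __name__ == "__main__":
    seen_by_provider = send_request("client-a", "198.51.100.7")
    response = Packet(src="198.51.100.7", dst=seen_by_provider.src)
    print(seen_by_provider)
    print(deliver_response(response))
```

In this sketch the provider sees the request coming from the client's Elastic IP, and the response ends up back at the shared instance with the client recovered from the private IP. A real middlebox would presumably need connection tracking rather than a static table.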
The items that concern me most are #3 and #5 because:
- I don't know if there are any constraints on source IP address spoofing in a VPC.
- I don't know if the IGW would properly map the private IPs to the public Elastic IPs if the ENIs exist but are not attached to anything
I am not looking for a complete recipe; I am just trying to get an opinion on whether there are any fundamental flaws that make the idea impossible.
Also here are the diagrams for the current and target architectures: