I am working on a distributed implementation of Samplesort using AWS Lambda functions. So far, I have been using S3 to exchange data between Lambdas, but this is relatively slow. I would like to use WebSockets instead. Is that considered an anti-pattern? If so, why? If not, what is the best way to go about it? The Lambda-to-Lambda payload is 5 GB, and the two Lambdas are invoked by a third one (they cannot invoke each other). The dataset consists of 10 billion 64-bit integers.
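For readers unfamiliar with the algorithm, here is a minimal single-process sketch of Samplesort, assuming an in-memory list of integers. It picks splitters from a random sample, routes every key to its bucket with a binary search, then sorts each bucket. The function names and the oversampling factor of 8 are illustrative choices, not part of the original setup. In the distributed version, each Lambda would partition its own chunk with the shared splitters and ship bucket i to worker i; that shuffle is the Lambda-to-Lambda exchange this question is about.

```python
import bisect
import random
from typing import List


def choose_splitters(data: List[int], num_buckets: int,
                     oversample: int = 8, seed: int = 0) -> List[int]:
    """Pick num_buckets - 1 splitters from a random sample of the input."""
    rng = random.Random(seed)
    sample_size = min(len(data), num_buckets * oversample)
    sample = sorted(rng.sample(data, sample_size))
    # Evenly spaced elements of the sorted sample become bucket boundaries.
    step = len(sample) / num_buckets
    return [sample[int(i * step)] for i in range(1, num_buckets)]


def partition(data: List[int], splitters: List[int]) -> List[List[int]]:
    """Route each key to the bucket whose key range contains it."""
    buckets: List[List[int]] = [[] for _ in range(len(splitters) + 1)]
    for x in data:
        # bisect_right counts the splitters <= x, which is the bucket index.
        buckets[bisect.bisect_right(splitters, x)].append(x)
    return buckets


def samplesort(data: List[int], num_buckets: int) -> List[int]:
    if not data:
        return []
    splitters = choose_splitters(data, num_buckets)
    buckets = partition(data, splitters)
    # Distributed version: bucket i would be sent to worker i at this point.
    # Here every bucket is sorted locally and concatenated in bucket order.
    return [x for bucket in buckets for x in sorted(bucket)]
```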
Lambda does not support WebSockets on its own, but they can be implemented using API Gateway. That said, rather than passing data from Lambda to Lambda via S3, you could simply pass the data directly when Lambda #1 invokes Lambda #2, provided you keep it below the 6 MB payload limit for synchronous invocations, or you could pass data indirectly and asynchronously via SQS. – jarmod Jan 11 '22 at 15:07
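Passing data directly in the invocation payload means splitting it into pieces that fit the limit. The sketch below assumes the 6 MB limit applies to the JSON request payload of a synchronous invocation and that binary data is base64-encoded, which inflates it by 4/3. `chunk_payloads`, `ENVELOPE_RESERVE`, and the `chunk`/`data` keys are hypothetical names. At roughly 4.7 MB of raw data per call, a 5 GB payload would need well over a thousand invocations, which is consistent with the follow-up comment ruling this out.

```python
import base64
import json
from typing import Iterator

# Assumed limit: 6 MB request payload for synchronous (RequestResponse)
# invocations, taken here as 6 * 1024 * 1024 bytes. Check current Lambda quotas.
PAYLOAD_LIMIT = 6 * 1024 * 1024
# Hypothetical headroom for the JSON envelope (keys, quotes, chunk index).
ENVELOPE_RESERVE = 1024


def chunk_payloads(blob: bytes, limit: int = PAYLOAD_LIMIT) -> Iterator[bytes]:
    """Yield JSON payloads, each carrying one base64-encoded slice of blob."""
    # base64 turns every 3 raw bytes into 4 characters, so size the raw slice
    # so that its encoded form fits in the space left after the envelope.
    raw_per_chunk = (limit - ENVELOPE_RESERVE) // 4 * 3
    if raw_per_chunk <= 0:
        raise ValueError("limit too small for the envelope reserve")
    for index, start in enumerate(range(0, len(blob), raw_per_chunk)):
        piece = blob[start:start + raw_per_chunk]
        yield json.dumps({
            "chunk": index,
            "data": base64.b64encode(piece).decode("ascii"),
        }).encode("utf-8")
```

Each payload could then be passed as the `Payload` argument of boto3's `lambda_client.invoke(FunctionName=..., InvocationType="RequestResponse", Payload=...)`; the receiving function would have to reassemble the chunks by index.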
@jarmod Thank you for this. Unfortunately, the payload is 5GB. Also, the two Lambdas are invoked by a separate Lambda. – Ismael Ghalimi Jan 11 '22 at 15:15
Is EFS out of scope here? It's a little more complex and costly but would improve the read/write times vs S3. You could store the entire payload plus the persisted results in EFS and orchestrate concurrent Lambdas using Step Functions. – jarmod Jan 11 '22 at 15:26
Yes, it's out of the question because its throughput cannot be aggregated. We are trying to sort 10 billion 64-bit integers in under 2 seconds using 400 Lambdas. – Ismael Ghalimi Jan 11 '22 at 15:30
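A quick back-of-the-envelope check of this target, assuming the keys are split evenly across the 400 Lambdas and that nearly every key has to move once during the shuffle (with 400 buckets, only about 1/400 of each Lambda's keys stay local). This even-split figure of 200 MB per Lambda is a different quantity from the 5 GB payload mentioned in the question. The estimate also ignores loading the input, sorting locally, and writing the output, so the real per-Lambda bandwidth needed within the 2-second budget would be higher.

```python
# Back-of-the-envelope numbers for the stated target:
# 10 billion 64-bit keys, 400 Lambdas, 2-second budget.
NUM_KEYS = 10_000_000_000
BYTES_PER_KEY = 8          # 64-bit integers
NUM_WORKERS = 400
BUDGET_SECONDS = 2

total_bytes = NUM_KEYS * BYTES_PER_KEY           # raw size of the dataset
bytes_per_worker = total_bytes // NUM_WORKERS    # assumes an even split
# Assumption: almost every key leaves its worker during the shuffle, so each
# Lambda sends roughly bytes_per_worker (and receives about as much).
per_worker_rate = bytes_per_worker / BUDGET_SECONDS   # bytes/s, each direction
aggregate_rate = total_bytes / BUDGET_SECONDS         # bytes/s across the fleet

print(f"total data: {total_bytes / 1e9:.0f} GB")               # → total data: 80 GB
print(f"per Lambda: {bytes_per_worker / 1e6:.0f} MB")          # → per Lambda: 200 MB
print(f"per-Lambda rate: {per_worker_rate / 1e6:.0f} MB/s")    # → per-Lambda rate: 100 MB/s
print(f"aggregate rate: {aggregate_rate / 1e9:.0f} GB/s")      # → aggregate rate: 40 GB/s
```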
Having multiple AWS Lambda functions communicate with each other is definitely an anti-pattern. AWS Lambda is designed to perform quick processing in response to an event -- it is not designed as a distributed processing platform. You might want to re-think your architecture. – John Rotenstein Jan 11 '22 at 21:01
@JohnRotenstein How about using Kinesis in between the two Lambdas? – Ismael Ghalimi Jan 11 '22 at 21:54
1 Answer
The short answer is no: one Lambda cannot open a WebSocket connection directly to another, because "inbound network connections are blocked by AWS Lambda" (cf. AWS Lambda FAQs). An alternative would be to use Kinesis. A Lambda can push data to a Kinesis stream. It is unclear whether Lambdas that are already running could consume messages from a Kinesis stream, but the distributed sort could be restructured so that the reducing Lambdas are invoked by the Kinesis stream.
Unfortunately, Kinesis pricing is two orders of magnitude too high.
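If the reducing Lambdas were triggered by a Kinesis stream, each mapping Lambda would push its buckets with `PutRecords`. The sketch below batches records to respect the per-request limits as I understand them (500 records, 1 MiB per record, 5 MiB per request, counting data plus partition key); treat those constants as assumptions and check the current Kinesis quotas. `client` is expected to be a boto3 Kinesis client (`boto3.client("kinesis")`). Using the destination bucket id as the partition key is a hypothetical choice, and retrying partially failed batches is deliberately left out.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Assumed PutRecords limits; verify against the current Kinesis quotas.
MAX_RECORDS_PER_REQUEST = 500
MAX_RECORD_BYTES = 1024 * 1024        # data + partition key, one record
MAX_REQUEST_BYTES = 5 * 1024 * 1024   # data + partition keys, whole request

Record = Dict[str, Any]  # {"Data": bytes, "PartitionKey": str}


def batch_records(records: Iterable[Record]) -> Iterator[List[Record]]:
    """Group records into batches that fit the assumed PutRecords limits."""
    batch: List[Record] = []
    size = 0
    for rec in records:
        rec_size = len(rec["Data"]) + len(rec["PartitionKey"].encode("utf-8"))
        if rec_size > MAX_RECORD_BYTES:
            raise ValueError("record exceeds the per-record limit")
        if batch and (len(batch) == MAX_RECORDS_PER_REQUEST
                      or size + rec_size > MAX_REQUEST_BYTES):
            yield batch
            batch, size = [], 0
        batch.append(rec)
        size += rec_size
    if batch:
        yield batch


def put_all(client: Any, stream_name: str, records: Iterable[Record]) -> int:
    """Send records via client.put_records; return the number that failed."""
    failed = 0
    for batch in batch_records(records):
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords is not all-or-nothing: a non-zero FailedRecordCount means
        # some entries (e.g. throttled ones) should be retried; omitted here.
        failed += response["FailedRecordCount"]
    return failed
```

A mapping Lambda could build its records as `{"Data": bucket_bytes, "PartitionKey": str(bucket_id)}` for each destination bucket and call `put_all(boto3.client("kinesis"), "sort-shuffle", records)`, where `"sort-shuffle"` is a hypothetical stream name.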

Ismael Ghalimi
If you want to consume from a queue, then Amazon SQS would be better than Kinesis (unless it is important to preserve order). It is difficult to offer alternatives without knowing your actual goal. For example, if this is a continual process, using an Amazon EC2 instance could be better than using AWS Lambda. Using threads on a single computer might be better than the overhead of invoking multiple Lambda functions. – John Rotenstein Jan 11 '22 at 22:58
I really want to be serverless to facilitate provisioning. I do not need messages to be in order. SQS would have too much latency, but SNS would work. It would require every Lambda to send 17,600 SNS messages, though. – Ismael Ghalimi Jan 12 '22 at 00:03
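For the SNS variant, here is a sketch of how one Lambda might pack its outgoing keys into `PublishBatch` calls. It assumes SNS limits of 10 messages per batch and 256 KB (262,144 bytes) of aggregate payload per batch, signed 64-bit keys, and base64 text encoding because SNS message bodies are strings. `encode_keys`, `decode_keys`, `keys_per_message`, and `publish_keys` are hypothetical helpers, and `client` is expected to be a boto3 SNS client (`boto3.client("sns")`). Under these assumptions each message carries about 2,457 keys, so the number of messages grows with the number of keys each Lambda has to send.

```python
import base64
import struct
from typing import Any, List

# Assumed PublishBatch limits; verify against the current SNS quotas.
MAX_ENTRIES_PER_BATCH = 10
MAX_BATCH_BYTES = 256 * 1024


def encode_keys(keys: List[int]) -> str:
    """Pack signed 64-bit keys little-endian, then base64 them (SNS is text)."""
    return base64.b64encode(struct.pack(f"<{len(keys)}q", *keys)).decode("ascii")


def decode_keys(message: str) -> List[int]:
    raw = base64.b64decode(message)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


def keys_per_message() -> int:
    # Give each entry an equal share of the batch budget; base64 needs
    # 4 characters per 3 raw bytes, and each key takes 8 raw bytes.
    chars = MAX_BATCH_BYTES // MAX_ENTRIES_PER_BATCH
    return chars // 4 * 3 // 8


def publish_keys(client: Any, topic_arn: str, keys: List[int]) -> int:
    """Publish keys in full batches; return how many entries SNS rejected."""
    per_msg = keys_per_message()
    messages = [encode_keys(keys[i:i + per_msg])
                for i in range(0, len(keys), per_msg)]
    failed = 0
    for start in range(0, len(messages), MAX_ENTRIES_PER_BATCH):
        chunk = messages[start:start + MAX_ENTRIES_PER_BATCH]
        # Ids only need to be unique within a single batch.
        entries = [{"Id": str(j), "Message": m} for j, m in enumerate(chunk)]
        response = client.publish_batch(TopicArn=topic_arn,
                                        PublishBatchRequestEntries=entries)
        failed += len(response.get("Failed", []))
    return failed
```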