I have started exploring Knative recently and I am trying to understand how concurrency and autoscaling work. I understand that (target) concurrency refers to the number of requests that can be scheduled to a single Pod for a given revision at the same time.
However, I am not sure I understand which is the impact of having a value of concurrency greater than 1. What happens when N requests are scheduled to the same Pod? Will they be processed one at a time in a FIFO order? Will multiple threads be spawned to serve them in parallel (possibly competing for CPU resources)?
I am tempted to set concurrency=1 and rely on autoscaling to handle multiple requests through multiple Pods, but I guess this is not the best thing to do.
Thanks in advance