If you want to be absolutely certain that a given user never experiences a delay because their R process is busy serving someone else, then yes, you would need one thread per user: 400 threads for 400 concurrent users.
In practice, if (as you say) the app has few computationally intensive functions, you can get away with significantly fewer threads. Say a function takes 200 ms to compute: a single-threaded process serves requests one after another, so it takes roughly five users triggering that function at the same moment before the last one in the queue notices a significant (e.g., 1 s) delay. With no heavy computation involved, a single thread can more likely handle dozens of concurrent connections before the user experience is negatively affected.
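To make the arithmetic above concrete, here is a back-of-envelope sketch (the function name and numbers are illustrative, not measured; real queueing behaviour also depends on how requests interleave):

```python
# A single-threaded R process services requests one at a time, so if
# k users trigger a function taking `compute_time` seconds at the same
# moment, the last user in the queue waits roughly k * compute_time.

def worst_case_delay(concurrent_requests: int, compute_time: float) -> float:
    """Seconds the last queued user waits on one single-threaded process."""
    return concurrent_requests * compute_time

# With 200 ms per call, five simultaneous requests are enough for the
# last user to see a ~1 s delay.
print(worst_case_delay(5, 0.2))  # → 1.0
```

The same estimate inverted gives a sizing rule of thumb: the number of threads you need is roughly (simultaneous requests × per-request time) divided by the delay you are willing to tolerate.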
Having said all this, my understanding of ShinyProxy is that it spins up a new instance (container) for every new connection, so the number of concurrent users may be limited by the number of cores; I'm not entirely clear on this point. If that's the case, and assuming a workload of several hundred connections with little processing per connection, I think a better approach might be a handful of Shiny Server instances behind a load balancer. The servers would not need to be particularly powerful (each is only using one thread anyway), and each server can handle well over 100 connections (I don't know whether there is an upper limit).
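A minimal sketch of what that load balancer could look like, assuming nginx and placeholder hostnames (`shiny1.internal` etc. are made up). Two details matter for Shiny specifically: session state lives in the R process, so a user must stick to the same backend (`ip_hash` is the simplest way to get that), and Shiny uses WebSockets, so the upgrade headers must be forwarded:

```nginx
upstream shiny_servers {
    ip_hash;                        # sticky routing by client IP
    server shiny1.internal:3838;    # placeholder backends running
    server shiny2.internal:3838;    # free Shiny Server on its
    server shiny3.internal:3838;    # default port 3838
}

server {
    listen 80;

    location / {
        proxy_pass http://shiny_servers;
        # Forward the WebSocket upgrade handshake to the backend.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

`ip_hash` is crude (everyone behind one corporate NAT lands on the same backend); a cookie-based sticky method would spread load more evenly if your proxy supports it.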
To summarise: in my view, ShinyProxy is the better fit when you have fewer connections running more computationally intensive apps, while Shiny Server (even the free version) is better when you have many connections to a computationally trivial app.
In terms of specifying infrastructure requirements, a good place to start would be shinyloadtest: you record a typical user session, replay it against the deployed app with many simulated concurrent users, and see how latency degrades as the load grows. That gives you actual numbers to size against rather than guesses.