I don't know of a way to use BitTorrent or multicast unless you can switch to deploying an image rather than performing installations. If you can't, here is one way to approach the problem.
Let's think more closely about the bottleneck. CPU isn't the bottleneck; NFS doesn't require much processing power. Disk isn't the bottleneck; the files needed to install RHEL amount to no more than a few gigabytes, so they should easily fit in your NFS server's RAM. Network throughput is definitely a bottleneck: assuming one system being installed requests on average 50 megabits per second, you'd need at least 25 gigabits per second of bandwidth to feed 500 simultaneous installs. That's a lot of NICs, or a few very expensive ones.
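As a quick sanity check on that arithmetic (the 50 Mb/s figure is the assumption above, not a measured number):

```python
# Back-of-the-envelope check of the aggregate bandwidth estimate.
# The 50 Mb/s average per install is an assumed figure; measure your own.
per_install_mbps = 50
concurrent_installs = 500
required_gbps = per_install_mbps * concurrent_installs / 1000
print(required_gbps)  # 25.0 Gb/s of aggregate bandwidth
```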
This doesn't mean you shouldn't try to improve performance by throwing more hardware at it, within reason. Put as many NICs in the NFS server as is feasible and bond them. If you can justify the time and cost, set up additional NFS servers. And of course, make sure your NFS servers are well tuned.
Regardless of whether you add hardware, see whether you can improve performance by avoiding network congestion and smoothing out the peaks and troughs in throughput. To do this, break the installs into batches. Perform a single install and graph its throughput over time. From that graph, determine how many installs you can start concurrently and when the optimal times to start subsequent batches are.
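For the measurement step, here's one rough sketch of sampling outbound throughput on a Linux NFS server by reading the byte counters in `/proc/net/dev`; the interface name `bond0` is an assumption, so substitute your own. Redirect the output to a file and plot it with whatever tool you prefer.

```python
#!/usr/bin/env python3
"""Print the NFS server's outbound throughput in Mb/s once per second.

Reads transmit-byte counters from /proc/net/dev (Linux-specific).
The interface name below is an assumption; replace it with yours.
"""
import time

IFACE = "bond0"  # assumed interface name

def parse_tx_bytes(stats_text, iface):
    """Extract the transmitted-bytes counter for iface from /proc/net/dev text."""
    for line in stats_text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep and name.strip() == iface:
            # After the colon: 8 receive fields, then transmitted bytes.
            return int(rest.split()[8])
    raise ValueError(f"interface {iface} not found")

def tx_bytes(iface=IFACE):
    with open("/proc/net/dev") as f:
        return parse_tx_bytes(f.read(), iface)

if __name__ == "__main__":
    prev = tx_bytes()
    while True:
        time.sleep(1)
        cur = tx_bytes()
        print(f"{(cur - prev) * 8 / 1e6:.1f} Mb/s")  # bytes/s -> megabits/s
        prev = cur
```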
For example, let's say you can transfer 4 Gb/s from the NFS server(s). Maybe you'll find that an install copies 100 Mb/s for the first minute while the installer is downloaded, then copies no data for a minute while the installer does local work like partitioning, then copies 50 Mb/s for three minutes while it downloads and extracts packages. Knowing this, you could calculate that you can start 40 installs (40 × 100 Mb/s saturates the 4 Gb/s link during the download minute), wait one minute, start another 40 installs, wait five minutes, then repeat the process.
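Before kicking off 500 real installs, you can sanity-check a schedule like that with a quick simulation. This sketch uses the hypothetical per-install profile and 4 Gb/s budget from the example above; plug in your own measured profile:

```python
# Simulate the staggered batch schedule and verify the aggregate demand
# never exceeds the link budget. All numbers are the assumed example
# figures, not measurements.
PROFILE = [100, 0, 50, 50, 50]  # Mb/s per install, minute by minute
BATCH_SIZE = 40                 # installs started together
CAPACITY_MBPS = 4000            # 4 Gb/s link budget

def demand_per_minute(batch_starts, horizon):
    """Total Mb/s demanded in each minute, given batch start times (minutes)."""
    demand = [0] * horizon
    for start in batch_starts:
        for offset, rate in enumerate(PROFILE):
            if start + offset < horizon:
                demand[start + offset] += rate * BATCH_SIZE
    return demand

# Start at t=0 and t=1, then wait five minutes and repeat the pair.
starts = [0, 1, 6, 7, 12, 13]
demand = demand_per_minute(starts, horizon=18)
print(demand)
assert max(demand) <= CAPACITY_MBPS  # schedule stays within budget
```

Starting the third batch any earlier than t=6 would overlap its 4 Gb/s download minute with the second batch's 2 Gb/s package-copy phase and blow the budget, which is where the "wait 5 minutes" comes from.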