9

If you are trying to install 500 Linux systems through network installation at the same time, the bottleneck will be the NFS/HTTP/FTP or whatever server holds the files you need for installation.

IMO, this can only be solved by adding more installation servers and round-robining requests among them.

Is there any better solution to this problem? Something like "P2P Linux installation"?

UPDATE: I need to describe my situation more specifically. Currently I'm deploying RHEL using kickstart+NFS. When I try to deploy 500 RHEL systems concurrently, the NFS server gets huge traffic and every install becomes slow. Setting up more NFS servers is a solution, but I don't think it's a good one.

yegle
  • What do you want to do? Install the base system, manage the system afterwards, deploy configurations or applications? – shakalandy Apr 13 '11 at 15:19
  • @shakalandy I'm deploying RHEL using kickstart+NFS. When I try to deploy 500 RHEL systems concurrently, the NFS server gets huge traffic and every install becomes slow. Setting up more NFS servers is a solution, but I don't think it's a good one. – yegle Apr 13 '11 at 15:36

6 Answers

7

This is usually where multicast imaging comes in. Something like Clonezilla or Ghost supports sending the data via multicast, which would let you push the image out to all 500 systems at once at basically the same speed as pushing it out to one system.

dgibbons
3

The Avalanche installer of the Rocks cluster distro is BitTorrent-based and scales nicely. It also takes you from PXE boot to a running system. However, you're tied to using Rocks (CentOS-based) and doing things the Rocks way.

Kjetil Joergensen
2

SystemImager can also use BitTorrent for faster mass deployment.

Not Now
1

I would not use multicast, because it makes things more complicated. First, try to minimise NFS traffic; that means fetching the packages you need to install via HTTP. If the web server for your package repository gets overloaded, use two of them and distribute the load by assigning a different server to each client (for example, by IP address modulo 2).
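A minimal sketch of that per-client assignment, assuming two hypothetical mirror hostnames (repo1.example.com and repo2.example.com); the same modulo logic could be used wherever you decide which repository a client talks to:

    # Deterministically pick a package mirror from the client's IP address.
    # The mirror hostnames are placeholders, not real servers.
    import ipaddress

    MIRRORS = ["repo1.example.com", "repo2.example.com"]

    def pick_mirror(client_ip: str) -> str:
        # IP address as an integer, modulo the number of mirrors, so each
        # client always gets the same server and the load splits evenly.
        index = int(ipaddress.ip_address(client_ip)) % len(MIRRORS)
        return MIRRORS[index]

    for ip in ("10.0.0.11", "10.0.0.12"):
        print(ip, "->", pick_mirror(ip))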

Your NFS server may also be faster if more nfsd daemons are started; often only 8 of them are started by default.

I just measured the traffic of a Debian installation (via PXE, NFS and HTTP) using FAI. When installing 4.2 GB of software, 1.3 GB of HTTP traffic (all the packages) and 100 MB of NFS traffic (the nfsroot during installation) were sent over the network. This was for one install client. So I guess reducing the NFS traffic and distributing the HTTP traffic will help a lot.

A 10 Gb NIC in your server or bonding several NICs would also help. Also, I don't think you need to install all the machines at exactly the same time; installing them within a short time frame should be enough.

But anyway, first you have to analyse what your bottleneck will be, so run some tests with, for example, 20 machines.

0

Concerning mass deployment of files, there is already a BitTorrent-based solution from Twitter: Murder.

If you are talking about installing the OS itself on your servers, it obviously won't work with this solution.

Shadok
0

I don't know of a way to use BitTorrent or multicast unless you're able to switch to deploying an image rather than performing installations. In case you aren't, here is one way to approach the problem.

Let's think more closely about the bottleneck. CPU isn't the bottleneck; NFS doesn't require much processing power. Disk isn't the bottleneck; the files needed to install RHEL aren't more than a few gigabytes, so they should easily fit in your NFS server's RAM. Network throughput is definitely a bottleneck; assuming one system being installed will request on average 50 megabits a second, you'd need at least 25 gigabits of bandwidth to feed 500 installs. That's a lot of NICs, or a few very expensive ones.

This doesn't mean you shouldn't try to improve performance by throwing more hardware at it, within reason. Get as many NICs as are feasible in the NFS server and bond them. If you can justify the time and cost, set up more NFS servers. Of course, make sure your NFS servers are well tuned.

Regardless of whether you add hardware, see if you get an increase in performance by avoiding network congestion and balancing the peaks and troughs in throughput. To do this, break the installs into batches. Perform a single install and graph the throughput during the install. Look at that graph and determine how many installs you can start concurrently and when the optimal times to start more batches are.

For example, let's say you can transfer 4 Gb/s from the NFS server(s). Maybe you'll find that an install copies 100 Mb/s for the first minute while the installer is being downloaded, then it copies no data for one minute while the installer does work like partitioning, then it copies 50 Mb/s for three minutes while the installer downloads and extracts packages. Knowing this, you could calculate that you can start 40 installs, wait one minute, start another 40 installs, wait 5 minutes, then repeat the process.
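As a back-of-the-envelope check, here is a quick sketch of that sizing; all of the figures are the hypothetical numbers from this answer, not measurements:

    # Rough sizing for batched network installs, using the example figures above.
    LINK_CAPACITY_MBPS = 4000     # total throughput available from the server(s), Mb/s
    PEAK_PER_INSTALL_MBPS = 100   # per-client rate during the initial download phase
    AVG_PER_INSTALL_MBPS = 50     # per-client rate while packages are fetched
    TOTAL_CLIENTS = 500

    # Aggregate demand if every client installed at once (about 25 Gb/s).
    aggregate = TOTAL_CLIENTS * AVG_PER_INSTALL_MBPS
    print(f"Worst-case aggregate demand: {aggregate / 1000:.1f} Gb/s")

    # Largest batch whose peak phase still fits on the link, and how many batches.
    batch_size = LINK_CAPACITY_MBPS // PEAK_PER_INSTALL_MBPS
    batches = -(-TOTAL_CLIENTS // batch_size)  # ceiling division
    print(f"Start installs in batches of {batch_size} ({batches} batches in total)")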

sciurus