I actually found a blog post with the information I was looking for. They mention that the computers in their cluster are synchronized to within 1 ms using NTP.
It looks like PTP, as suggested by Michael Hampton, follows the same strategy: one computer, the grandmaster, serves as the source for time synchronization, as opposed to trying to get the correct absolute time on every computer. As a result, if the grandmaster is off by 10 ms from what the world considers absolute real time, all nodes will be off by 10 ms.
The solution proposed in that document is to:
1) Set up one computer to retrieve absolute time with NTP. If that computer goes down, the clocks may start drifting, but they will not become inaccurate relative to one another; they will only drift relative to absolute real time.
In this case, you use server definitions (on the grandmaster):
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
...
Also set up this computer as an NTP server, say local.ntp.
2) Set up the other computers as peers:
server local.ntp # only on a few other (3 to 5) computers
peer c0 iburst
peer c1 iburst
peer c2 iburst
You do not need to connect all 48 computers to each other. Instead, each computer would have between 3 and 5 peers, with each one using a slightly different set (c1, c2, c3, then c2, c3, c4, etc.). As a result you get a peer-to-peer network in which the computers synchronize with each other as closely as possible, with a few computers (3 to 5) linking to the node defined in (1), i.e. local.ntp, to get the time as close as possible to real time.
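As a rough sketch of the staggered peer sets described above, the per-host files might look like the following. The hostnames c0 to c47 are assumed for illustration, and the wrap-around on the last host is my own assumption, not something taken from the blog post:

```
# /etc/ntp.conf on c0 (one of the few hosts that also follow local.ntp)
server local.ntp
peer c1 iburst
peer c2 iburst
peer c3 iburst

# /etc/ntp.conf on c1 (peers only, window shifted by one)
peer c2 iburst
peer c3 iburst
peer c4 iburst

# /etc/ntp.conf on c47 (assumed wrap-around back to the first hosts)
peer c0 iburst
peer c1 iburst
peer c2 iburst
```

Shifting the window by one host each time means every computer peers with a few others, and the overlapping sets link the whole cluster without a full mesh.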
The local.ntp reference can itself be viewed as a peer (you may even be able to make it a peer?).
P.S. The use of restrict is strongly advised when using peer on a semi-public network, to prevent others from accessing your NTP network.
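A minimal sketch of what such restrict lines could look like, assuming the cluster lives on the (hypothetical) subnet 192.168.0.0/24. Adjust the address and mask to your own network and check the flags against your ntpd version:

```
# By default, serve time but refuse peering, queries and configuration changes
restrict default kod nomodify notrap nopeer noquery

# Allow full access from the local host
restrict 127.0.0.1

# Allow cluster nodes (assumed subnet) to peer, but not to modify the configuration
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
```

The idea is that nopeer in the default rule blocks peer associations from unknown hosts, while the more specific subnet rule leaves it out so the cluster nodes can still peer with each other.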