
Using the Hockney model, the transfer time of an m-byte message is modeled as t(m) = α + βm, where α is the latency per message and β is the transfer time per byte (the reciprocal of the network bandwidth).
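As a minimal sketch in Python (the function name `hockney_time` is chosen purely for illustration), the pure Hockney model with constant α and β is a single affine function of the message size:

```python
def hockney_time(m, alpha, beta):
    """Point-to-point time for an m-byte message under the Hockney model.

    alpha: latency per message (e.g. microseconds), constant for all sizes.
    beta:  transfer time per byte (e.g. microseconds/byte), i.e. 1/bandwidth.
    """
    return alpha + beta * m
```

Because α and β are constants here, the time for any message size, measured or not, follows directly from the two parameters.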

But according to some papers (like this paper), latency and transfer time are functions of message size. Across several message sizes, they are neither constant nor linear!

[Figure: Hockney model parameters]

If the Hockney model parameters are functions of message size, how can we predict collective communication time (e.g. for broadcast, scatter, ...) across several message sizes?

Example: If the broadcast operation is performed by the Flat Tree algorithm, t = (P - 1)(α + βm). Because α and β are functions of message size, its curve is not a straight line, and we cannot predict the operation time without model parameters that correspond to that message size. For instance, we cannot predict the operation time for a 30-byte message if we have not measured the model parameters for sending and receiving 30-byte messages.
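The problem can be illustrated with a short Python sketch. The table `MEASURED`, its numbers, and the function `flat_tree_bcast` are all hypothetical: the table stands in for per-size α(m), β(m) values obtained from a benchmark, and the formula is the Flat Tree expression above with those size-dependent parameters plugged in.

```python
# Hypothetical placeholder values, NOT real measurements:
# message size in bytes -> (alpha in us, beta in us/byte) for that size.
MEASURED = {
    16: (2.0, 0.010),
    32: (2.1, 0.009),
}


def flat_tree_bcast(p, m, params):
    """Flat Tree broadcast time (P - 1) * (alpha(m) + beta(m) * m).

    The root sends the full m-byte message to each of the other P - 1
    processes one after another. alpha(m) and beta(m) are only known for
    sizes present in `params`; any other size raises KeyError.
    """
    alpha, beta = params[m]
    return (p - 1) * (alpha + beta * m)
```

`flat_tree_bcast(4, 16, MEASURED)` gives 3 * (2.0 + 0.16) = 6.48 µs, while `flat_tree_bcast(4, 30, MEASURED)` fails, which is exactly the 30-byte situation described above.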

NoseKnowsAll
voxter

1 Answer


In the Hockney model, α and β are properties of the network, independent of the message size. However, the mentioned paper clearly states:

We altered Hockney model such that α and β are functions of message size.

I agree it is confusing that they always simply refer to their altered model as Hockney. The chart in the paper also looks suspiciously as if "Latency" is actually the message transfer time. You might call this latency as seen from the application, and "Bandwidth" is likewise the bandwidth as seen from the application. Consider 10^6 bytes / 65 MBytes/s = 1.5 * 10^4 us. Since both of these values already reflect the total message transfer time, I don't see any sense in using them as additive, individual network parameters for Hockney. Unfortunately, the paper does not explain how they actually derived the parameters from their point-to-point MPI benchmark.
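Spelled out, assuming 1 MByte = 10^6 bytes, the example division is:

$$
\frac{10^{6}\ \text{B}}{65\ \text{MB/s}}
= \frac{10^{6}\ \text{B}}{65 \times 10^{6}\ \text{B/s}}
\approx 1.54 \times 10^{-2}\ \text{s}
= 1.54 \times 10^{4}\ \mu\text{s}
$$

That is the time needed to move the entire 10^6-byte message at 65 MBytes/s, i.e. a full message transfer time rather than a per-message startup cost.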

It is also noteworthy that the paper almost always uses the full term for message transfer time, α(ms) + ms · β(ms), except in two cases, where I suspect a pair of parentheses is missing. The whole term could then simply be replaced with the p2p message time as a function of message size.

For the model, I would prefer to use either a pure Hockney model with constant α and β, or a model that describes the p2p message time as a function of message size. In the latter case, your question is still relevant:

For instance, we cannot predict the operation time for a message size of 30 bytes if we have not measured model parameters which send and receive 30 byte messages.

Either you have to measure all possible sizes, or you have to apply a fitting model. Incidentally, if you use linear regression, you end up with Hockney again.
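Both options can be sketched in a few lines of Python. The helper names `interpolate_p2p` and `fit_hockney` are hypothetical, and the inputs are assumed to be your own point-to-point measurements (sizes in bytes, sorted ascending; times in microseconds). `interpolate_p2p` is one simple fitting model: piecewise-linear interpolation between the nearest measured sizes, so a 30-byte message is estimated from, for example, the 16- and 32-byte measurements. `fit_hockney` is a closed-form weighted least-squares linear regression of t = α + βm, which returns exactly Hockney's two constants. The optional relative weighting (1/t² per point) is an added assumption, not something the paper describes; it keeps small-message points from being swamped when sizes span many orders of magnitude.

```python
from bisect import bisect_left


def interpolate_p2p(sizes, times, m):
    """Piecewise-linear estimate of the p2p time for an m-byte message.

    sizes must be sorted ascending; m must lie inside [sizes[0], sizes[-1]].
    """
    if not sizes[0] <= m <= sizes[-1]:
        raise ValueError("m is outside the measured range; no extrapolation")
    i = bisect_left(sizes, m)
    if sizes[i] == m:
        return times[i]  # an exact measurement exists for this size
    # m lies strictly between sizes[i - 1] and sizes[i]
    m0, m1 = sizes[i - 1], sizes[i]
    t0, t1 = times[i - 1], times[i]
    return t0 + (t1 - t0) * (m - m0) / (m1 - m0)


def fit_hockney(sizes, times, relative=False):
    """Weighted least-squares fit of t(m) = alpha + beta * m.

    relative=False: ordinary least squares (minimizes squared absolute error).
    relative=True:  weights 1/t^2 (minimizes squared relative error).
    """
    w = [1.0 / (t * t) if relative else 1.0 for t in times]
    sw = sum(w)
    # Weighted means; centering on them avoids large cancelling sums.
    m_bar = sum(wi * m for wi, m in zip(w, sizes)) / sw
    t_bar = sum(wi * t for wi, t in zip(w, times)) / sw
    sxy = sum(wi * (m - m_bar) * (t - t_bar) for wi, m, t in zip(w, sizes, times))
    sxx = sum(wi * (m - m_bar) ** 2 for wi, m in zip(w, sizes))
    beta = sxy / sxx
    alpha = t_bar - beta * m_bar
    return alpha, beta
```

Interpolation never predicts outside the measured range, whereas the regression gives a prediction for any size but is only as good as the straight-line assumption behind Hockney.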

Zulan
  • Question 1: I have used linear regression (by gradient descent) to predict α and β, but it does not seem to work well because the range of message sizes is too large (from 2^1 to 2^30). Where can I find a paper or a report that uses pure Hockney with constant α and β? – voxter Feb 26 '16 at 07:30
  • Question 2: What is a fitting model? I have read about it in the mentioned paper, but I cannot find any papers that explain it. @Zulan – voxter Feb 26 '16 at 07:34
  • In some MPI benchmarks (such as the OSU benchmarks), both latency and bandwidth depend on the message size. What is the difference between the OSU latency benchmark parameters and the Hockney model parameters? – voxter Feb 26 '16 at 07:45
  • Q1: I think there are many papers out there, some of which are referenced by the paper you mentioned. Q2: You can also look in the literature for *regression analysis*. Q3: OSU measures *message* latency and bandwidth. As a simplification, the *message* latency of a 0-byte message would equal α, and the *message* bandwidth for an infinitely large message (or a pipelined chain of messages, as OSU measures) would be 1/β. – Zulan Feb 26 '16 at 20:32
  • I haven't seen any papers that use pure Hockney in the references of the paper I mentioned. (I have checked all the referenced papers.) – voxter Mar 02 '16 at 03:02
  • I just randomly picked [10] *Thakur, R., Gropp, W.: Improving the performance of collective operations in MPICH.*: "We assume that the time taken to send a message between any two nodes can be modeled as α + nβ, where α is the latency (or startup time) per message, **independent of message size**, β is the transfer time per byte, and n is the number of bytes transferred." – Zulan Mar 05 '16 at 08:20
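Following the simplification in the Q3 reply, rough Hockney constants can be read off OSU-style results: α from the latency of the smallest measured message and β as the reciprocal of the best observed bandwidth. This is a minimal Python sketch. The function name and dictionary layout are hypothetical, and it assumes latency in microseconds and bandwidth in MB/s with 1 MB = 10^6 bytes (check your benchmark's unit convention). Under that assumption, 1 MB/s equals 1 byte/µs, so β in µs/byte is simply 1/(MB/s).

```python
def hockney_from_osu(latency_us, bandwidth_mb_s):
    """Rough constant Hockney parameters from OSU-style p2p results.

    latency_us:     {message size in bytes: latency in microseconds}
    bandwidth_mb_s: {message size in bytes: bandwidth in MB/s}

    alpha ~ latency of the smallest measured message (ideally 0 bytes).
    beta  ~ 1 / asymptotic bandwidth, approximated by the best measured
            bandwidth; assumes 1 MB = 10**6 bytes, so 1 MB/s = 1 byte/us.
    """
    alpha = latency_us[min(latency_us)]
    beta = 1.0 / max(bandwidth_mb_s.values())
    return alpha, beta
```

Note that this reads α and β off the two extremes of the size range rather than fitting all points, which is exactly the "0-byte message" and "infinitely large message" simplification from the comment.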