This story is sourced to Tyler Blake, a professor at CSU Northridge.
The story is printed in Advances in Human-Computer Interaction (1995) at page 94, and the reference for the story is given as
Blake, T. (1985). Introduction to Principles and Techniques for Interface Design Tutorial Notes for CHI'85 tutorial.
This corresponds to a tutorial by Tyler Blake from 9AM - 12:30 PM Monday 15 April 1985 as part of the Conference on Human Factors in Computing Systems, Hyatt Regency at Embarcadero Center, San Francisco. Source.
See Introduction to Principles and Techniques for Interface Design.
The story is also in Tog on Interface (1992).
The older version of the story is somewhat different from the one in OP.
The original version does not include "All were roughly the same age and within narrow height and weight limits" nor "By item seven all were gone". Also, the earlier version says "shoes, pants, and shirts" instead of the "underwear" mentioned in the OP.
The older version makes clear that anyone not within one standard deviation of the average was excluded in each round.
Specifically, the original (or at least 1992) version of the story is:
Several years ago, the Air Force carried out a little test to find out how many cadets could fit into what were statistically the average-sized clothes. They assembled 680 cadets in a courtyard and slowly called off the average sizes - plus or minus one standard deviation - of various items such as shoes, pants and shirts. Any cadet that was not in the average range for a given item was asked to leave the courtyard. By the time they finished with the fifth item, there were only 2 cadets left; by the sixth, all but one had been eliminated.
Comparing the story to statistical expectations for normal distributions of independent variables:
About 68.27% of a normally distributed population is within one standard deviation of the mean.
0.6827^5 = 0.148 (101 / 680)
0.6827^6 = 0.101 (69 / 680)
0.6827^7 = 0.069 (47 / 680)
So even after 7 rounds, one would expect about 47 cadets to remain, and even more given that the sizes of various clothing items are correlated. The story doesn't seem credible.
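The arithmetic above can be checked with a few lines of Python. This is only a sketch under the same assumptions as the calculation: measurements are normally distributed and independent of one another, which real body measurements are not. The function names `fraction_within` and `expected_remaining` are made up for this illustration.

```python
import math


def fraction_within(k_sd: float = 1.0) -> float:
    """Fraction of a normal distribution within k_sd standard deviations of the mean."""
    return math.erf(k_sd / math.sqrt(2.0))


def expected_remaining(n_cadets: int, n_items: int, k_sd: float = 1.0) -> float:
    """Expected survivors after n_items rounds, ASSUMING independent normal measurements."""
    return n_cadets * fraction_within(k_sd) ** n_items


# Expected number of the 680 cadets still standing after 5, 6 and 7 items.
for items in (5, 6, 7):
    print(items, round(expected_remaining(680, items)))
```

Because the independence assumption ignores the correlation between clothing sizes, these figures are a lower bound on what one would expect in practice.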
On the other hand, there is a study involving 680 cadets discussed in the 1955 article "Physique and success in military flying", American Journal of Physical Anthropology, vol. 13, pages 217-52. The story could have originated from actual measured data, but morphed over time.
In particular, it seems to be a dramatization of The "Average Man"? (1952) by Gilbert S. Daniels.
The fallacy of the "average man" concept is further illustrated by a
study based on body measurements made on over 4,000 Air Force flying
personnel. From a total of 131 available measurements a smaller group,
all useful in clothing design was selected.
of the original 4063 men,
1055 were of approximately average stature
of the original 1055 men,
302 were of approximately average chest circumference
of the original 302 men,
143 were of approximately average sleeve length
of the original 143 men,
73 were of approximately average crotch height
of the original 73 men,
28 were of approximately average torso circumference
of the original 28 men,
12 were of approximately average hip circumference
of the original 12 men,
6 were of approximately average neck circumference
of the original 6 men,
3 were of approximately average waist circumference
of the original 3 men,
2 were of approximately average thigh circumference
of the original 2 men,
0 were of approximately average crotch length
The article details what was considered average, basically the middle 25-30%.
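To see how quickly the group shrinks at each step, the counts quoted above can be turned into step-by-step retention fractions. This is just an illustrative calculation on the quoted numbers; the variable and function names are made up here and are not from Daniels' article.

```python
# Counts quoted from the passage above: the starting group, then the number
# remaining after each successive "approximately average" filter.
measurements = [
    "stature", "chest circumference", "sleeve length", "crotch height",
    "torso circumference", "hip circumference", "neck circumference",
    "waist circumference", "thigh circumference", "crotch length",
]
counts = [4063, 1055, 302, 143, 73, 28, 12, 6, 3, 2, 0]


def retention_fractions(remaining):
    """Fraction of the previous group that survives each filter."""
    return [after / before for before, after in zip(remaining, remaining[1:])]


ratios = retention_fractions(counts)
for name, ratio in zip(measurements, ratios):
    print(f"{name:22s} {ratio:.2f}")
```

The first fraction is 1055 / 4063, about 0.26, in line with "average" meaning roughly the middle quarter to third of the range for a single measurement.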