I have the following data in Stata:

input drug  halflife    hl_weight
3       2.95        0.0066
2       6.00        0.0004
5       13.60       0.0006
1       2.82        0.0331
4       8.80        0.0001
4       1.24        0.0075
2       6.25        0.1123
4       17.20       0.0002
5       14.50       0.0020
4       5.50        0.0016
5       13.30       0.0003
4       8.26        0.0201
4       16.50       0.0103
4       11.40       0.0016
4       5.90        0.0005
4       3.99        0.0100
4       2.80        0.0073
4       3.00        0.0133
4       3.17        0.0061
4       4.95        0.1404
end

I am trying to create box plots of drug half-lives using the command below:

graph box halflife [aweight=hl_weight], over(drug)

When I include the weight option, some of the resulting box plots consist of multiple dots instead of the typical interquartile range and median:

Here is a picture demonstrating the difference in boxplots of Weighted vs Unweighted data.

Why does this happen and how can I fix it?

  • For some reason, my titles and axes were cut off in the link. Left: weighted data; Right: unweighted data. X axis for both plots reads: 1 2 3 4 5. – H. Meredith Jun 27 '18 at 15:14
  • Ok- thank you. Your point about the weighted values falling outside of the IQR makes sense. – H. Meredith Jun 27 '18 at 15:38
  • Thanks! I am still thinking this through. Shouldn't the IQR be recalculated with the new weighted values (and a new box subsequently drawn)? – H. Meredith Jun 27 '18 at 16:04
  • 2
    It is recalculated, hence the change displayed. But a box cannot be drawn on this graph as the range of values is very narrow or there are not enough observations. –  Jun 27 '18 at 17:17

1 Answer

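This happens because of the weighting. With aweights, the median and quartiles are computed from the weighted distribution, so when a few observations carry most of the weight in a group, the box shrinks around them. The remaining values then fall well outside the interquartile range and are drawn as individual dots (outside values).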
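
To see this in the numbers rather than on the plot, one quick check is to list the weighted quartiles per drug. This is only a sketch using the variable names from the question; it relies on tabstat accepting aweights. For drug 4, for example, the single observation 4.95 carries about 64% of the group's total weight (0.1404 of 0.2190), so I would expect all three quartiles to land on or very near that value, which leaves a box of essentially zero height and the other observations plotted as dots.

```stata
* Weighted quartiles and IQR of halflife within each drug
tabstat halflife [aweight=hl_weight], by(drug) statistics(p25 p50 p75 iqr)

* Same statistics without weights, for comparison
tabstat halflife, by(drug) statistics(p25 p50 p75 iqr)
```

Comparing the two tables side by side should show which drugs have a weighted IQR that collapses relative to the unweighted one.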

I do not think there is anything to fix here. You could use the nooutsides option of the graph box command to hide the dots, but I would not recommend it, since it hides most of the data in the affected groups.
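
For reference, the option goes after the comma, alongside over(). A sketch with the question's variables:

```stata
* Same weighted box plot, but outside values (the dots) are not drawn
graph box halflife [aweight=hl_weight], over(drug) nooutsides
```

Where the weighted quartiles coincide, this may leave little more than a line for that drug, which is part of why hiding the dots can mislead.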