0

I have the following data

  date         price  strike
"01apr2010"    1086.2  50
"01apr2010" 1048.6001 100
"01apr2010"    1060.8 100
"01apr2010"    1014.1 100
"01apr2010"   1037.45 100
"01apr2010"     988.4 150
"01apr2010"    919.05 200
"01apr2010"     949.5 200
"01apr2010"     961.1 200
"01apr2010"     938.9 200
"01apr2010"       967 200
"01apr2010"     972.8 200
"01apr2010"    924.75 225
"01apr2010"     914.4 225
"01apr2010"     911.2 250

I want to generate a variable containing the mean of price by date and (strike == (strike+50))

e.g. for the first row (strike=50), this would be the mean of price for date="01apr2010" and strike=100 (1048+1060+...+1037)/4 . for row 2-5 (strike=100), this would be 988.4 (price in row 6). for row 6 (strike=150) this would be (919.05+...+972.8)/6.

The mean by date and strike is just egen mean(price), by(date strike), but i need a variable containing the mean for observations with strike equal to (strike+50).

This could also, preferably, instead of the mean by strike==strike+50, be strike + next increment in strike.


For those wondering or interested, I need this in order to calculate the empirical probability density function of call option prices on SP500, which can be approximated as

for options with prices c1,c2,c3 
with strike prices K1=K2-d < K2 < K3=K2+d.
the risk-neutral probability density function of the underlying asset being equal to K2 is
g(S_t = K2) = (c1+c3-2*c2)/d^2

See Hull (2018) "Options, Futures and other Derivatives", Appendix A to chp. 17.

jthg
  • 123
  • 7
  • The title wouldn't give away the game, but people interested in this thread may be interested in https://stackoverflow.com/questions/9577808/using-if-qualifier-with-egen-in-stata – Nick Cox Mar 11 '18 at 16:29

1 Answers1

0

Look on Statalist for hundreds of mentions of rangestat (SSC), as for example,

clear 
input str9 date         price  strike
"01apr2010"    1086.2  50
"01apr2010" 1048.6001 100
"01apr2010"    1060.8 100
"01apr2010"    1014.1 100
"01apr2010"   1037.45 100
"01apr2010"     988.4 150
"01apr2010"    919.05 200
"01apr2010"     949.5 200
"01apr2010"     961.1 200
"01apr2010"     938.9 200
"01apr2010"       967 200
"01apr2010"     972.8 200
"01apr2010"    924.75 225
"01apr2010"     914.4 225
"01apr2010"     911.2 250
end 

rangestat mean1=price, interval(strike 50 50) by(date) 
bysort date (strike) : gen group = sum(strike != strike[_n-1])  
rangestat mean2=price, interval(group 1 1) by(date) 

list, sepby(strike) 

     +--------------------------------------------------------------+
     |      date     price   strike       mean1   group       mean2 |
     |--------------------------------------------------------------|
  1. | 01apr2010    1086.2       50   1040.2375       1   1040.2375 |
     |--------------------------------------------------------------|
  2. | 01apr2010    1048.6      100   988.40002       2   988.40002 |
  3. | 01apr2010    1060.8      100   988.40002       2   988.40002 |
  4. | 01apr2010    1014.1      100   988.40002       2   988.40002 |
  5. | 01apr2010   1037.45      100   988.40002       2   988.40002 |
     |--------------------------------------------------------------|
  6. | 01apr2010     988.4      150   951.39166       3   951.39166 |
     |--------------------------------------------------------------|
  7. | 01apr2010    919.05      200   911.20001       4   919.57501 |
  8. | 01apr2010     949.5      200   911.20001       4   919.57501 |
  9. | 01apr2010     961.1      200   911.20001       4   919.57501 |
 10. | 01apr2010     938.9      200   911.20001       4   919.57501 |
 11. | 01apr2010       967      200   911.20001       4   919.57501 |
 12. | 01apr2010     972.8      200   911.20001       4   919.57501 |
     |--------------------------------------------------------------|
 13. | 01apr2010    924.75      225           .       5   911.20001 |
 14. | 01apr2010     914.4      225           .       5   911.20001 |
     |--------------------------------------------------------------|
 15. | 01apr2010     911.2      250           .       6           . |
     +--------------------------------------------------------------+

Trivial discrepancies can be seen here because by default your numeric examples are read in as float while rangestat produces double variables.

Nick Cox
  • 35,529
  • 6
  • 31
  • 47