Why are simulated stock returns re-scaled and re-centered in the “pbo” vignette in the pbo (probability of backtest overfitting) package in R?

Question

Here's the relevant code from the vignette, altered slightly to fit on the page and to make it easy to reproduce. Code for the visualizations is omitted. The comments are the vignette author's.

(Full vignette: https://cran.r-project.org/web/packages/pbo/vignettes/pbo.html)

library(pbo)

#First, we assemble the trials into an NxT matrix where each column 
#represents a trial and each trial has the same length T. This example 
#is random data so the backtest should be overfit.

set.seed(765)
n <- 100
t <- 2400
m <- data.frame(matrix(rnorm(n*t),nrow=t,ncol=n,
                       dimnames=list(1:t,1:n)), check.names=FALSE)

sr_base <- 0
mu_base <- sr_base/(252.0)
sigma_base <- 1.00/(252.0)**0.5

for ( i in 1:n ) {
  m[,i] = m[,i] * sigma_base / sd(m[,i]) # re-scale 
  m[,i] = m[,i] + mu_base - mean(m[,i]) # re-center
}

#We can use any performance evaluation function that can work with the 
#reassembled sub-matrices during the cross validation iterations. 
#Following the original paper we can use the Sharpe ratio as

sharpe <- function(x,rf=0.03/252) {
  sr <- apply(x,2,function(col) {
    er = col - rf
    return(mean(er)/sd(er))
  })
  return(sr)
}

#Now that we have the trials matrix we can pass it to the pbo function  
#for analysis.

my_pbo <- pbo(m,s=8,f=sharpe,threshold=0)

summary(my_pbo)

Here's the portion I'm curious about:

sr_base <- 0
mu_base <- sr_base/(252.0)
sigma_base <- 1.00/(252.0)**0.5

for ( i in 1:n ) {
  m[,i] = m[,i] * sigma_base / sd(m[,i]) # re-scale 
  m[,i] = m[,i] + mu_base - mean(m[,i]) # re-center
}

Why is the data transformed within the for loop, and does this kind of re-scaling and re-centering need to be done with real returns? Or is this just something the author is doing to make his simulated returns look more like the real thing?

Googling and searching Stack Overflow turned up some articles and posts about scaling volatility by the square root of time, but this doesn't look quite like what I've seen. Those usually multiply a short-term (e.g. daily) measure of volatility by the square root of time, whereas this code divides by it. Also, the documentation for the package doesn't include this chunk of re-scaling and re-centering code. Documentation: https://cran.r-project.org/web/packages/pbo/pbo.pdf

So:

1. Why is the data transformed in this way, and what is the result of this transformation?
2. Is it only necessary for this simulated data, or do I need to similarly transform real returns?

score 0 · Accepted Answer · answered Nov 21 '17 at 21:24

I posted this question on the r-help mailing list and got the following answer:

"Hi Joe, The centering and re-scaling is done for the purposes of his example, and also to be consistent with his definition of the sharpe function. In particular, note that the sharpe function has the rf (riskfree) parameter with a default value of .03/252 i.e. an ANNUAL 3% rate converted to a DAILY rate, expressed in decimal. That means that the other argument to this function, x, should be DAILY returns, expressed in decimal.

Suppose he wanted to create random data from a distribution of returns with ANNUAL mean MU_A and ANNUAL std deviation SIGMA_A, both stated in decimal. The equivalent DAILY returns would have mean MU_D = MU_A / 252 and standard deviation SIGMA_D = SIGMA_A/SQRT(252).

He calls MU_D by the name mu_base and SIGMA_D by the name sigma_base.

His loop now converts the random numbers in his matrix so that each column has mean MU_D and std deviation SIGMA_D.

HTH, Eric"
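
To check that description, here is a minimal sketch that repeats the vignette's loop on a smaller matrix (5 columns instead of 100, a plain matrix instead of a data frame; both are my simplifications) and inspects the result. The names `col_means`, `col_sds`, `annual_sd` and `daily_sharpe` are mine, not the vignette's.

```r
set.seed(765)
n <- 5      # fewer trials than the vignette, just to keep the check small
t <- 2400
m <- matrix(rnorm(n * t), nrow = t, ncol = n)

sr_base    <- 0                  # target ANNUAL Sharpe ratio
mu_base    <- sr_base / 252      # target DAILY mean
sigma_base <- 1 / sqrt(252)      # target DAILY sd (ANNUAL sd of 1)

for (i in seq_len(n)) {
  m[, i] <- m[, i] * sigma_base / sd(m[, i])  # re-scale: sd becomes sigma_base
  m[, i] <- m[, i] + mu_base - mean(m[, i])   # re-center: mean becomes mu_base
}

col_means <- apply(m, 2, mean)   # each should equal mu_base
col_sds   <- apply(m, 2, sd)     # each should equal sigma_base
annual_sd <- col_sds * sqrt(252) # the usual square-root-of-time scaling

# Full-sample daily Sharpe ratio with the vignette's default rf = 0.03/252
rf <- 0.03 / 252
daily_sharpe <- apply(m, 2, function(col) mean(col - rf) / sd(col - rf))

print(rbind(col_means, col_sds, annual_sd, daily_sharpe))
```

Every column ends up with exactly the same sample mean and standard deviation, so over the full sample no trial is better than any other. The re-scaling has to come first: multiplying by a constant changes the mean, while adding a constant afterwards leaves the standard deviation alone.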

I followed up with this:

"If I'm understanding correctly, if I’m wanting to use actual returns from backtests rather than simulated returns, I would need to make sure my risk-adjusted return measure, sharpe ratio in this case, matches up in scale with my returns (i.e. daily returns with daily sharpe, monthly with monthly, etc). And I wouldn’t need to transform returns like the simulated returns are in the vignette, as the real returns are going to have whatever properties they have (meaning they will have whatever average and std dev they happen to have). Is that correct?"

I was told this was correct.
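
As a rough illustration of matching the scales, here is a sketch of a Sharpe function that converts an annual risk-free rate to whatever frequency the returns use. `sharpe_scaled`, `rf_annual` and `periods_per_year` are hypothetical names of mine, not part of the pbo package, and the monthly returns are simulated only as a stand-in for real backtest output.

```r
# Hypothetical helper (not from pbo): per-period Sharpe ratio, with the
# annual risk-free rate converted to the frequency of the returns.
sharpe_scaled <- function(x, rf_annual = 0.03, periods_per_year = 252) {
  rf <- rf_annual / periods_per_year
  apply(x, 2, function(col) {
    er <- col - rf               # excess return per period
    mean(er) / sd(er)
  })
}

# Stand-in for real backtest results: 120 monthly returns for 4 strategies.
# No re-scaling or re-centering is applied; the returns keep their own
# mean and standard deviation.
set.seed(1)
monthly <- matrix(rnorm(120 * 4, mean = 0.005, sd = 0.04),
                  nrow = 120, ncol = 4)

sr_monthly <- sharpe_scaled(monthly, periods_per_year = 12)
print(sr_monthly)
```

With monthly returns, something like `f = function(x) sharpe_scaled(x, periods_per_year = 12)` could presumably be passed to `pbo()` in place of the vignette's `sharpe`; I have not run that here.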

I emailed the package author and he confirmed that this explanation is accurate — Joe O, Nov 23 '17 at 04:07
