
I have a set of simulated data that are roughly uniformly distributed. I would like to sample a subset of these data and for that subset to have a log-normal distribution with a (log)mean and (log)standard deviation that I specify.

I can figure out some slow brute-force ways to do this, but I feel like there should be a way to do it in a couple lines using the plnorm function and the sample function with the "prob" variable set. I can't seem to get the behavior I'm looking for though. My first attempt was something like:

probs <- plnorm(orig_data, meanlog = mu, sdlog = sigma)
new_data <- sample(orig_data, replace = FALSE, prob = probs)

I think I'm misinterpreting the way the plnorm function behaves. Thanks in advance.

MDeeps
    Try `qlnorm`, not `plnorm`. When mapping from the uniform to another distribution, you should just use the inverse CDF, i.e. the quantile function. All of the R distribution functions use the same prefixes (q, p, etc.) to identify which is which. – Frank Feb 19 '15 at 15:58

2 Answers


If your orig_data are uniformly distributed between 0 and 1, then

new_data = qlnorm(orig_data, meanlog = mu, sdlog = sigma)

will give log-normally distributed data. If your data lie between a and b rather than between 0 and 1, rescale them first:

orig_data = (orig_data-a)/(b-a)

Generally speaking, a uniform random variable between 0 and 1 can be read as a probability, so to sample from a given distribution with it you have to use the q... function, i.e. take the corresponding quantile.
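As a minimal sketch of the rescale-then-quantile steps above: the bounds a and b, the targets mu and sigma, the sample size and the seed are all made-up values for illustration, not taken from the question.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility

a <- 2; b <- 7                      # hypothetical bounds of the uniform data
mu <- 0.5; sigma <- 0.3             # hypothetical target meanlog and sdlog

orig_data <- runif(10000, min = a, max = b)   # stand-in for the simulated data
u <- (orig_data - a) / (b - a)                # rescale to (0, 1)
new_data <- qlnorm(u, meanlog = mu, sdlog = sigma)

mean(log(new_data))                 # should be close to mu
sd(log(new_data))                   # should be close to sigma
```

Note that this transforms every value rather than selecting a subset, so new_data has the same length as orig_data and its values are no longer the original ones.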

ClementWalter

Thanks guys for the suggestions. While they get me close, I've decided on a slightly different approach for my particular problem, which I'm posting as the solution in case it's useful to others.

One specific I left out of the original question is that I have a whole data set (stored as a data frame), and I want to resample rows from that set such that one of the variables (columns) is log-normally distributed. Here is the function I wrote to accomplish this, which relies on dlnorm to calculate probabilities and sample to resample the data frame:

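resample_lognorm <- function(origdataframe, origvals, meanlog, sdlog, n) {
  # Weight each row by the log-normal density at its value;
  # meanlog and sdlog are given in base 10, hence the log(10) factors.
  prob <- dlnorm(origvals, meanlog = log(10) * meanlog, sdlog = log(10) * sdlog)
  # Draw n distinct row indices using those weights.
  newsamp <- origdataframe[sample(nrow(origdataframe),
                                  size = n, replace = FALSE, prob = prob), ]
  return(newsamp)
}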

In this case origdataframe is the full data frame I want to sample from, and origvals is the column of data I want to resample to a log-normal distribution. Note that the log(10) factors in meanlog and sdlog are there because I want the distribution to be log-normal in base 10, not in natural log.
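For anyone who wants to try it, here is a rough usage sketch. The data frame sim, its columns x and y, the target values and the sizes are all invented for the example, and the function is repeated in condensed form only so the snippet runs on its own.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility

# Same function as above, condensed so this snippet is self-contained.
resample_lognorm <- function(origdataframe, origvals, meanlog, sdlog, n) {
  prob <- dlnorm(origvals, meanlog = log(10) * meanlog, sdlog = log(10) * sdlog)
  origdataframe[sample(nrow(origdataframe), size = n, replace = FALSE, prob = prob), ]
}

# Hypothetical simulated data: x is roughly uniform, y is just along for the ride.
sim <- data.frame(x = runif(100000, min = 0, max = 100),
                  y = rnorm(100000))

# Ask for 500 rows whose x is log-normal in base 10 with mean 1 and sd 0.2.
sub <- resample_lognorm(sim, sim$x, meanlog = 1, sdlog = 0.2, n = 500)

nrow(sub)                            # number of rows drawn
mean(log10(sub$x))                   # should land near 1
sd(log10(sub$x))                     # should land near 0.2
```

This relies on the original values being roughly uniform, so that weighting by the density alone shapes the result. Because rows are drawn without replacement, the match is only approximate and gets worse as n approaches the number of rows available near the peak of the target distribution.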

MDeeps