
I have been working with a dataset (called CWNA_clim_vars) structured so that the variables associated with each datapoint within the set are arranged in columns, like this:

dbsid elevation Tmax04  Tmax10  Tmin04  Tmin10  PPT04   PPT10
 0001      1197    8.1     8.9    -5.2    -3.5     34      95
 0002      1110    7.7       8    -2.9    -0.6    114     375
 0003      1466    5.4     6.4    -4.7    -1.5    199     453
 0004      1267    6.1     7.1    -3.6    -0.7    166     376
  ...       ...    ...     ...     ...     ...    ...     ...
 1000       926    7.2    10.1    -0.8     2.7    245     351

I've been trying to run boxplot stats on each column, retrieve the outlier values in each column, and write them to a new data frame called summary_stats. The code I set up to attempt this is as follows:

summary_stats <- data.frame()
for (i in names(CWNA_clim_vars)){
  temp <- boxplot.stats(CWNA_clim_vars[,i])
  out <- as.list(temp$out)
  for (j in out) {
    summary_stats[i,j] <- out[j]
  }
}

Unfortunately, in running this, the following error message is thrown:

Error in `[<-.data.frame`(`*tmp*`, i, j, value = list(6.65)) : 
  new columns would leave holes after existing columns

I am guessing the error is thrown because the number of outliers varies between columns: if I instead replace temp$out with temp$n, which contains only one number per column, the result is a data frame with these numbers arranged in a single column.

Is there an easy way to remedy this so that I end up with a data frame whose rows are not necessarily the same length? Thanks for considering my question; any help would be greatly appreciated.

T. Zaborniak

1 Answer


You would be better off using a list, because the number of outliers can differ from column to column.

out_lst <- lapply(CWNA_clim_vars, function (x) boxplot.stats(x)$out)

If for some reason you have to present it as a data frame, you need to pad the shorter vectors with NA so that all columns have the same length.

N <- max(lengths(out_lst))
out_df <- data.frame(lapply(out_lst, function (x) c(x, rep(NA, N - length(x)))))
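
If the NA padding feels artificial, a long layout is one alternative worth considering. This goes beyond the approach above, and the `demo` data frame and the `demo_out` / `demo_long` names are made up for illustration. Base R's `stack()` turns a named list of vectors into a two-column data frame with one row per outlier, so no filler is needed:

```r
# Hypothetical data, only for illustration
demo <- data.frame(a = c(rep(1, 9), 10), b = c(10, 11, rep(1, 8)))

# Outliers per column, as a named list of numeric vectors
demo_out <- lapply(demo, function(x) boxplot.stats(x)$out)

# One row per outlier: 'values' holds the outlier, 'ind' the source column
demo_long <- stack(demo_out)
demo_long
```

A column with no outliers simply contributes no rows to the result.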

Try with a tiny example:

CWNA_clim_vars <- data.frame(a = c(rep(1,9), 10), b = c(10,11,rep(1,8)))
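
Put together, the two steps on this tiny example could look like the sketch below (it repeats the definitions so that it runs on its own). Column a has one outlier and column b has two, so a is padded with a single NA.

```r
# Tiny example: 'a' has one outlier (10), 'b' has two (10 and 11)
CWNA_clim_vars <- data.frame(a = c(rep(1, 9), 10), b = c(10, 11, rep(1, 8)))

# Step 1: outliers per column, kept as a list of vectors of unequal length
out_lst <- lapply(CWNA_clim_vars, function(x) boxplot.stats(x)$out)
lengths(out_lst)

# Step 2: pad every vector with NA up to the longest one, then bind as columns
N <- max(lengths(out_lst))
out_df <- data.frame(lapply(out_lst, function(x) c(x, rep(NA, N - length(x)))))
out_df
```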
Zheyuan Li