Mean and Standard deviation of interpolated datasets (R)

Question

I have 8 data sets that I have interpolated so the x axis spacing is the same but they are different lengths ranging from 800-1200 points. What I would like to do is then calculate the mean of each y value and the standard deviation. Ideally there would be an output of 1200 points with the mean and standard deviation in separate columns and from there I can plot a graph of the mean y value and add error bars. I can't give the data itself but this is the setup. Any help or pointers in the right direction would be greatly appreciated!

#s1 k = 0.4 800 data points
S1dat <- data.frame(x = S[[1]][["K"]], y = S[[1]][["ShearStress"]])
S1datapprox <- data.frame(approx(S1dat$x, S1dat$y, n = 800))
#s2 k = 0.6 1200 data points
S2dat <- data.frame(x = S[[2]][["K"]], y = S[[2]][["ShearStress"]])
S2datapprox <- data.frame(approx(S2dat$x, S2dat$y, n = 1200))
#s3 k = 0.34 680 data points
S3dat <- data.frame(x = S[[3]][["K"]], y = S[[3]][["ShearStress"]])
S3datapprox <- data.frame(approx(S3dat$x, S3dat$y, n = 800))
#s4 k = 0.5 1000 data points
S4dat <- data.frame(x = S[[4]][["K"]], y = S[[4]][["ShearStress"]])
S4datapprox <- data.frame(approx(S4dat$x, S4dat$y, n = 1000))
#s5 k = 0.4 800 data points
S5dat <- data.frame(x = S[[5]][["K"]], y = S[[5]][["ShearStress"]])
S5datapprox <- data.frame(approx(S5dat$x, S5dat$y, n = 800))
#s6 k = 0.36 720 data points
S6dat <- data.frame(x = S[[6]][["K"]], y = S[[6]][["ShearStress"]])
S6datapprox <- data.frame(approx(S6dat$x, S6dat$y, n = 720))
#s7 k = 0.4 800 data points
S7dat <- data.frame(x = S[[7]][["K"]], y = S[[7]][["ShearStress"]])
S7datapprox <- data.frame(approx(S7dat$x, S7dat$y, n = 800))
#s8 k = 0.54 1080 data points
S8dat <- data.frame(x = S[[8]][["K"]], y = S[[8]][["ShearStress"]])
S8datapprox <- data.frame(approx(S8dat$x, S8dat$y, n = 1080))

str(S)
List of 8
 $ :'data.frame':   805 obs. of  3 variables:
  ..$ K           : num [1:805] 0 0.000498 0.000996 0.001494 0.001992 ...
  ..$ ShearStress : num [1:805] 0 178 356 578 841 ...
  ..$ NormalStress: num [1:805] 0 -1.77 -5.35 -7.14 -11 -15 -16.7 -20.4 -22 -23.6 ...
 $ :'data.frame':   1500 obs. of  3 variables:
  ..$ K           : num [1:1500] 0 0.0004 0.000801 0.001201 0.001602 ...
  ..$ ShearStress : num [1:1500] 0 23.6 38.3 43.7 68.3 ...
  ..$ NormalStress: num [1:1500] 0.1 -1.34 -2.49 -4.04 -5.7 -7.28 -9.08 -10.7 -12.5 -14.3 ...
 $ :'data.frame':   812 obs. of  3 variables:
  ..$ K           : num [1:812] 0 0.000419 0.000838 0.001257 0.001676 ...
  ..$ ShearStress : num [1:812] 0 243 547 973 1280 ...
  ..$ NormalStress: num [1:812] 0 -0.89 -3.63 -6.05 -8.7 -11.5 -14.1 -16.9 -19.3 -22.1 ...
 $ :'data.frame':   853 obs. of  3 variables:
  ..$ K           : num [1:853] 0 0.000587 0.001174 0.00176 0.002347 ...
  ..$ ShearStress : num [1:853] 0 246 480 756 1246 ...
  ..$ NormalStress: num [1:853] 0 1 3 3 4 ...
 $ :'data.frame':   916 obs. of  3 variables:
  ..$ K           : num [1:916] 0 0.000437 0.000874 0.001312 0.001749 ...
  ..$ ShearStress : num [1:916] 0 44.4 67.1 89.2 119.1 ...
  ..$ NormalStress: num [1:916] 0 -0.01 -2.08 -5.06 -7.06 -9.4 -11.9 -14.6 -17.2 -20.1 ...
 $ :'data.frame':   329 obs. of  3 variables:
  ..$ K           : num [1:329] 0 0.000213 0.000536 0.001105 0.001871 ...
  ..$ ShearStress : num [1:329] 0 52.7 174.7 415.4 740.5 ...
  ..$ NormalStress: num [1:329] 0 -29.1 -30.4 -31.8 -33 -34.2 -35.3 -36.4 -38.3 -39.8 ...
 $ :'data.frame':   790 obs. of  3 variables:
  ..$ K           : num [1:790] 0 0.000237 0.000745 0.001252 0.00176 ...
  ..$ ShearStress : num [1:790] 0 94.8 347.7 633.8 1215.9 ...
  ..$ NormalStress: num [1:790] 0 -6 -12 -17 -28.4 ...
 $ :'data.frame':   1060 obs. of  3 variables:
  ..$ K           : num [1:1060] 0 0.00051 0.00102 0.00153 0.00204 ...
  ..$ ShearStress : num [1:1060] 0 44.2 70.4 100.3 133.3 ...
  ..$ NormalStress: num [1:1060] 0 0.1 0.18 -0.2 -1.2 ...

Do you have a single `list` on which you have to do this? – akrun Mar 22 '22 at 15:36

akrun · Accepted Answer · 2022-03-22T16:27:17.403


Instead of doing this separately for each data set and creating multiple objects, we can loop over the list `S` with `lapply`. This returns one mean and one SD per data set:

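# One row per data set: mean and SD of its interpolated shear stress
do.call(rbind, lapply(S, function(x) {
    # approx() returns a list(x, y); keep only the interpolated y values
    x1 <- approx(x[["K"]], x[["ShearStress"]], n = nrow(x))$y
    data.frame(Mean = mean(x1, na.rm = TRUE), SD = sd(x1, na.rm = TRUE))
}))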

If we need the elementwise mean/SD instead, pad each interpolated vector with `NA` up to the length of the longest one, bind the vectors into a matrix, and then apply `rowMeans`/`rowSds`:

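library(matrixStats)  # provides rowSds()
# Interpolated shear stress for each data set
lst1 <- lapply(S, function(x) {
    approx(x[["K"]], x[["ShearStress"]], n = nrow(x))$y
})
# Pad every vector with NA to the longest length, then bind as columns
mx <- max(lengths(lst1))
m1 <- do.call(cbind, lapply(lst1, `length<-`, mx))
# Row-wise (elementwise) mean and SD, ignoring the NA padding
out <- data.frame(Mean = rowMeans(m1, na.rm = TRUE),
                  SD = rowSds(m1, na.rm = TRUE))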
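
One caveat with padding: because each curve is interpolated with `n = nrow(x)` over its own K range, row i of the padded matrix may correspond to a different K value in each data set. A possible alternative, sketched here and not part of the original answer, is to interpolate every curve onto one shared grid through the `xout` argument of `approx`. The list `S_demo`, the grid spacing of 1, and the names `grid` and `y_mat` are made-up stand-ins for illustration; base R `apply(..., sd)` is used in place of `rowSds` so the sketch needs no extra package.

```r
# Toy stand-in for the list S: two curves with different K ranges (made-up data)
S_demo <- list(
  data.frame(K = c(0, 1, 2, 3, 4), ShearStress = c(0, 10, 20, 30, 40)),
  data.frame(K = c(0, 2, 4, 6),    ShearStress = c(0, 40, 80, 120))
)

# One shared grid spanning the widest K range (spacing of 1 is an assumption)
k_min <- min(vapply(S_demo, function(d) min(d[["K"]]), numeric(1)))
k_max <- max(vapply(S_demo, function(d) max(d[["K"]]), numeric(1)))
grid  <- seq(k_min, k_max, by = 1)

# Interpolate every curve at the same K values. With the default rule = 1,
# approx() returns NA for grid points outside a curve's own K range.
y_mat <- vapply(S_demo, function(d) {
  approx(d[["K"]], d[["ShearStress"]], xout = grid)$y
}, numeric(length(grid)))

# Row i of y_mat now refers to the same K in every column
out <- data.frame(
  K    = grid,
  Mean = rowMeans(y_mat, na.rm = TRUE),
  SD   = apply(y_mat, 1, sd, na.rm = TRUE)  # NA where fewer than two curves contribute
)
print(out)
```

With `out` in this shape, the mean curve can be drawn with `plot(out$K, out$Mean, type = "l")`, and error bars can be added with base R `arrows()` using `angle = 90` and `code = 3`, skipping rows where `SD` is `NA`.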

edited Mar 22 '22 at 16:27

answered Mar 22 '22 at 15:39

akrun


Thank you for the quick reply! I'm getting an error "unexpected token /(x)" but that also might be my setup. What does the backslash do? – James Blackwell Mar 22 '22 at 15:48
do.call(rbind, lapply(S, funcion(x) { Error: unexpected '{' in "do.call(rbind, lapply(S, funcion(x) {" > x1 <- approx(x[["K"]], x[["ShearStress"]], n = nrow(x)) Error in xy.coords(x, y, setLab = FALSE) : object 'x' not found > data.frame(Mean = mean(x1, na.rm = TRUE), SD = sd(x1, na.rm = TRUE)) Error in mean(x1, na.rm = TRUE) : object 'x1' not found > })) Error: unexpected '}' in "}" – James Blackwell Mar 22 '22 at 15:50
it was created when I took the values from Excel. S[[i]] = exvar(sample[[i]]) exvar is the function I used to get the variables. Not sure if it's strictly a list? – James Blackwell Mar 22 '22 at 15:52
I think take my new values now and make a new document/list and I'll try apply your code – James Blackwell Mar 22 '22 at 15:57
have edited now – James Blackwell Mar 22 '22 at 16:00
No errors! thanks a million for your help. Just needs some small tweaking which I'll work on. Will mark as answered (well still need to get the mean and std deviation of each point actually) – James Blackwell Mar 22 '22 at 16:12
Thank you! Getting this Error in rowMeans(m1, na.rm = TRUE) : 'x' must be numeric – James Blackwell Mar 22 '22 at 16:18
It is numeric, not sure where the mixup is > str(m1) List of 16 $ : num [1:805] 0 0.000498 0.000995 0.001493 0.00199 ... $ : num [1:805] 0 178 356 577 840 ... $ : num [1:1500] 0 0.0004 0.000801 0.001201 0.001601 ... $ : num [1:1500] 0 23.6 38.3 43.7 68.3 ... $ : num [1:812] 0 0.000419 0.000838 0.001258 0.001677 ... $ : num [1:812] 0 243 547 974 1280 ... $ : num [1:853] 0 0.000587 0.001174 0.001761 0.002347 ... $ : num [1:853] 0 246 480 756 1246 ... – James Blackwell Mar 22 '22 at 16:22
$ : num [1:916] 0 0.000437 0.000874 0.001311 0.001749 ... $ : num [1:916] 0 44.4 67.1 89.2 119.1 ... $ : num [1:329] 0 0.0011 0.0022 0.00329 0.00439 ... $ : num [1:329] 0 412 670 563 663 ... $ : num [1:790] 0 0.000507 0.001014 0.001521 0.002028 ... $ : num [1:790] 0 229 499 942 1524 ... $ : num [1:1060] 0 0.00051 0.00102 0.00153 0.00204 ... $ : num [1:1060] 0 44.2 70.4 100.3 133.3 ... - attr(*, "dim")= int [1:2] 2 8 - attr(*, "dimnames")=List of 2 ..$ : chr [1:2] "x" "y" ..$ : NULL – James Blackwell Mar 22 '22 at 16:23
@JamesBlackwell sorry, got it. Your `approx` is a `list` of 'x' and 'y'. Which one do you want to extract to return the mean, sd – akrun Mar 22 '22 at 16:26
All good, thanks again. The Y value (shear stress) – James Blackwell Mar 22 '22 at 16:26
@JamesBlackwell i updated with `$y` to extract the 'y' component – akrun Mar 22 '22 at 16:27
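
The `'x' must be numeric` error discussed in the comments above can be reproduced in a small sketch. `approx` returns a list with components `x` and `y`, so binding its results directly gives a list matrix, not a numeric one. The vectors and the names `a`, `b`, `m_bad` and `m_good` below are invented purely for illustration.

```r
# approx() returns a list with components x and y, not a bare numeric vector
a <- approx(c(0, 1, 2), c(0, 10, 20), n = 5)
b <- approx(c(0, 1),    c(0, 4),      n = 3)
str(a)

# Binding the lists themselves yields a 2 x 2 list matrix (rows "x" and "y"),
# which rowMeans() rejects because it is not numeric
m_bad <- cbind(a, b)
is.numeric(m_bad)

# Extracting $y first, then padding the shorter vector with NA,
# gives an ordinary numeric matrix
m_good <- cbind(a$y, `length<-`(b$y, length(a$y)))
is.numeric(m_good)
rowMeans(m_good, na.rm = TRUE)
```

As for the earlier question about the backslash: it presumably referred to `\(x)`, the shorthand for `function(x)` introduced in R 4.1.0, which older R versions report as a syntax error. That reading is an assumption, since the earlier revision of the answer is not shown here.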
