Standard Deviation function in ddply not returning a value on melted dataframe

Question

I have a dataset that is composed of 3 position values (XYZ) and 3 rotation values (Omega, Phi, Kappa).

head(pos.df) looks like this

  Batch  PhotoID         X          Y        Z       Omega         Phi      Kappa
1     1 DSC_7120 -269.6995 -359.33126 2390.522 -2.78643779  0.03288689   49.42041
2     1 DSC_7121 -323.5350 -311.80727 2388.374 -1.43015984 -0.61313717   49.08223
3     1 DSC_7122 -381.0833 -259.52629 2386.173 -0.08466679 -2.05867638   48.67501
4     1 DSC_7123 -434.4999 -212.15629 2384.075 -0.23728698 -1.97925763   49.09743
5     1 DSC_7707 -297.2458  -12.70537 2352.626 -1.17187585  0.70767493 -130.93919
6     1 DSC_7708 -238.0820  -61.07186 2353.831 -1.12715649  0.55772261 -131.25967

I then melt the data

library(reshape2)  # melt() is assumed here to come from reshape2
dfl <- melt(pos.df, id.vars = c("Batch", "PhotoID"))

such that head(dfl)

  Batch  PhotoID variable     value
1     1 DSC_7120        X -269.6995
2     1 DSC_7121        X -323.5350
3     1 DSC_7122        X -381.0833
4     1 DSC_7123        X -434.4999
5     1 DSC_7707        X -297.2458
6     1 DSC_7708        X -238.0820

and tail(dfl)

    Batch  PhotoID variable      value
385     5 DSC_7710    Kappa -131.57589
386     5 DSC_7711    Kappa -131.54491
387     5 DSC_7794    Kappa   51.35246
388     5 DSC_7795    Kappa   51.58456
389     5 DSC_7796    Kappa   51.82275
390     5 DSC_7797    Kappa   51.48262

Now I would like to compute some summary statistics:

library(plyr)  # provides ddply() and summarise()

smry <- ddply(dfl, c("Batch", "PhotoID", "variable"), 
              summarise, 
              mean = mean(value), 
              sd = sd(value),
              se = sd(value)/sqrt(length(value)))

but for some reason the SD and SE values are returning NA.

head(smry)

   Batch  PhotoID variable          mean sd se
1      1 DSC_7120        X -269.69945440 NA NA
2      1 DSC_7120        Y -359.33125720 NA NA
3      1 DSC_7120        Z 2390.52165300 NA NA
4      1 DSC_7120    Omega   -2.78643779 NA NA
5      1 DSC_7120      Phi    0.03288689 NA NA
6      1 DSC_7120    Kappa   49.42040741 NA NA
7      1 DSC_7121        X -323.53499700 NA NA
8      1 DSC_7121        Y -311.80726930 NA NA
9      1 DSC_7121        Z 2388.37389700 NA NA
10     1 DSC_7121    Omega   -1.43015984 NA NA

I have checked the data type...

str(pos.df)

'data.frame':   65 obs. of  8 variables:
 $ Batch  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ PhotoID: Factor w/ 13 levels "DSC_7120","DSC_7121",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ X      : num  -270 -324 -381 -434 -297 ...
 $ Y      : num  -359.3 -311.8 -259.5 -212.2 -12.7 ...
 $ Z      : num  2391 2388 2386 2384 2353 ...
 $ Omega  : num  -2.7864 -1.4302 -0.0847 -0.2373 -1.1719 ...
 $ Phi    : num  0.0329 -0.6131 -2.0587 -1.9793 0.7077 ...
 $ Kappa  : num  49.4 49.1 48.7 49.1 -130.9 ...

Can anyone tell me why my sd() and se functions are not returning values?

As an example, I calculated these numbers for a single photo in Excel:

 stat, X, Y, Z, Omega, Phi, Kappa
Variance, 0.02273259300, 0.13331103000, 0.00000342846, 0.00000214810, 0.00000364895, 0.00000310653
SD, 0.13485575300, 0.32657131600, 0.00165613000, 0.00131090800, 0.00170855500, 0.00157646000

so technically they do exist...

Thank you for your time.

When you `melt`, each `PhotoID` & `variable` group is unique and holds only a single value. Notice that your `mean()` is the same value as the one in the melted data: `-269.69945440` for `PhotoID == DSC_7120` and `variable == X`. `sd()` of a single value is `NA`. I think you should discard `PhotoID` and use the following code: `smry <- dfl %>% group_by(variable) %>% summarise(mean = mean(value), sd = sd(value), se = sd(value)/sqrt(length(value)))` - CPak, Jul 14 '17 at 16:32
@ChiPak, thank you, but I think maybe that is not the case? I updated the question to show that the PhotoID is not unique: there are 5 sets of the 6 variables for each PhotoID, and these are the sets I want the summary stats for, e.g. the variance in X position of this photo across the 5 reconstructions (of position) - c0ba1t, Jul 14 '17 at 16:40
Could you point to a `PhotoID` & `variable` **pair** that has more than 1 `value`? I only see unique **pairs** - CPak, Jul 14 '17 at 16:42
`PhotoID` is certainly not unique, but the combination of grouping variables `c("Batch", "PhotoID", "variable")` is. The standard deviation requires more than one value per group; with a single value, `sd()` returns `NA` - BENY, Jul 14 '17 at 16:46
You are correct. Batch needs to be dropped. Let me try this. Thank you both - c0ba1t, Jul 14 '17 at 16:49
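
The point made in these comments can be checked directly. `sd()` computes the sample standard deviation with an (n - 1) denominator, so a group with one observation has no defined value and R returns `NA`. Below is a minimal sketch in base R; the data frame `toy` and the names `n_full` and `n_reduced` are hypothetical and only mimic the melted layout, they are not the asker's data.

```r
# sd() divides by (n - 1), so a single observation has no defined sample SD.
single <- sd(-269.6995)   # one value: NA
pair   <- sd(c(1, 3))     # two values: sqrt(2)

# Hypothetical toy data in the melted layout: 2 batches x 2 photos, 1 variable.
toy <- data.frame(
  Batch    = c(1, 1, 2, 2),
  PhotoID  = c("A", "B", "A", "B"),
  variable = "X",
  value    = c(10.0, 20.0, 10.2, 20.4)
)

# Count the rows that fall into each group under the two groupings.
n_full    <- aggregate(value ~ Batch + PhotoID + variable, data = toy, FUN = length)
n_reduced <- aggregate(value ~ PhotoID + variable, data = toy, FUN = length)
```

With `Batch` in the grouping, every group in `n_full` holds exactly one row; without it, each group in `n_reduced` holds one row per batch. Counting rows per group like this is a quick way to see whether a grouping is too fine for `sd()` to return anything.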

c0ba1t · Answer 1 · 2017-07-14T18:44:16.547

Thanks to @ChiPak and @Wen

I was over-constraining my summarise function...

'Batch' needed to be removed from the call... like so

smry <- ddply(dfl, c("PhotoID", "variable"), 
              summarise, 
              mean = mean(value), 
              sd = sd(value),
              se = sd(value)/sqrt(length(value)))

now,

head(smry)

    PhotoID variable          mean           sd           se
1  DSC_7120        X -269.69730716 0.1507733086 0.0674278735
2  DSC_7120        Y -359.60802888 0.3651178278 0.1632856566
3  DSC_7120        Z 2390.51990620 0.0018517456 0.0008281258
4  DSC_7120    Omega   -2.78508610 0.0014656399 0.0006554541
5  DSC_7120      Phi    0.03468442 0.0019102228 0.0008542776
6  DSC_7120    Kappa   49.42263779 0.0017625356 0.0007882299
7  DSC_7121        X -323.53707466 0.1508844825 0.0674775919
8  DSC_7121        Y -312.08052414 0.3633875558 0.1625118554
9  DSC_7121        Z 2388.37413460 0.0005815413 0.0002600732
10 DSC_7121    Omega   -1.42917428 0.0016912203 0.0007563367

Thank you both.
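
For anyone who wants to reproduce the behaviour without the original data, here is a small sketch, assuming `plyr` is installed. The data frame `toy` and its values are made up for illustration (two photos, each reconstructed in three batches); only the two `ddply()` groupings mirror the calls above.

```r
library(plyr)  # provides ddply() and summarise()

# Hypothetical data: 2 photos, each reconstructed in 3 batches, one variable.
toy <- data.frame(
  Batch    = rep(1:3, each = 2),
  PhotoID  = rep(c("A", "B"), times = 3),
  variable = "X",
  value    = c(10.0, 20.0, 10.2, 20.4, 9.8, 19.6)
)

# Grouping by Batch as well leaves one row per group, so sd() is NA everywhere.
by_batch <- ddply(toy, c("Batch", "PhotoID", "variable"), summarise,
                  n  = length(value),
                  sd = sd(value))

# Dropping Batch pools the three reconstructions of each photo.
by_photo <- ddply(toy, c("PhotoID", "variable"), summarise,
                  n    = length(value),
                  mean = mean(value),
                  sd   = sd(value),
                  se   = sd(value) / sqrt(length(value)))
```

Including an `n = length(value)` column in the summary makes an over-constrained grouping easy to spot: any group with `n == 1` will have `NA` for both `sd` and `se`.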
