
I have the following data set

mydata <- datasets::volcano

# install.packages('e1071') # run once if the package is not installed
library(e1071)
library(tidyverse) #load required libraries
head(mydata) # quick view of the data.

#Part 1
# Calculate kurtosis and the new measure with base apply() and an anonymous
# function, using type = 2 from the e1071 library
kurtosis <- apply(mydata, 2, function(x) kurtosis(x, type = 2))
new_measure <- apply(mydata, 2, function(x) sd(x) / mad(x))

#create a new dataframe with the calculated kurtosis and new measure
base_mydata <- data.frame(kurtosis = kurtosis, new_measure = new_measure)

This part works fine. I now have to do the same calculations with dplyr or purrr, and I am not sure why the code below does not work: I simply get a vector of NaN values. I was expecting the kurtosis value of each column.

#Part 2
# Calculate kurtosis for each column

kurtosis_value <- mydata %>%
  map_dbl(~ kurtosis(.x))

Any assistance or guidance is appreciated.

2 Answers

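map_dbl() iterates over the elements of a vector or a list. A matrix is an atomic vector with a dim attribute, so passing one to map_dbl() applies the function to individual cells instead of columns, and the kurtosis of a single value is NaN. Convert mydata from a matrix to a data frame first; a data frame is a list of columns, so map_dbl() then applies the function to each column. Note that moments::kurtosis(), used here, returns the non-excess (Pearson) kurtosis, so the values differ from those of e1071::kurtosis() with type = 2: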

library(tidyverse)
library(moments)

mydata <- datasets::volcano
kurtosis_value <- map_dbl(as.data.frame(mydata), kurtosis, na.rm = TRUE)

kurtosis_value

     V1       V2       V3       V4       V5       V6       V7       V8       V9      V10      V11      V12      V13      V14      V15 
2.371050 2.514419 2.699051 2.757678 2.784320 2.735230 2.659157 2.593125 2.475620 2.272475 2.181941 2.147706 2.146325 2.121628 2.077791 
     V16      V17      V18      V19      V20      V21      V22      V23      V24      V25      V26      V27      V28      V29      V30 
2.041687 2.038450 2.068429 2.088117 2.091098 2.087650 2.042588 1.973068 1.918383 1.855893 1.788262 1.788161 1.778543 1.771347 1.833231 
     V31      V32      V33      V34      V35      V36      V37      V38      V39      V40      V41      V42      V43      V44      V45 
1.889760 1.948411 2.016484 2.072357 2.128480 2.114815 2.154601 2.105206 2.038636 1.977894 1.950674 1.914163 1.932104 1.963528 2.004136 
     V46      V47      V48      V49      V50      V51      V52      V53      V54      V55      V56      V57      V58      V59      V60 
2.069453 2.125611 2.148218 2.191073 2.251291 2.180624 2.204499 2.290069 2.369687 2.420440 2.417594 2.270683 2.091416 2.174677 2.169017 
     V61 
2.152479
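
The values above are close to 2, whereas e1071's type = 2 values for the same columns are negative. A minimal sketch of where the gap comes from, assuming moments::kurtosis() computes the plain fourth-moment ratio (written out by hand here so that only e1071 needs to be installed); the variable names are illustrative:

```r
library(e1071)

x <- datasets::volcano[, 1]   # first column, 87 values
n <- length(x)

# Pearson (non-excess) kurtosis: fourth central moment over squared variance.
# For this column it is about 2.37, in line with V1 in the output above.
d <- x - mean(x)
pearson <- n * sum(d^4) / sum(d^2)^2

# e1071 type = 1 is the same quantity minus 3 (excess kurtosis).
excess <- kurtosis(x, type = 1)
all.equal(pearson - 3, excess)

# type = 2 additionally applies a small-sample correction to the excess value.
corrected <- ((n + 1) * (pearson - 3) + 6) * (n - 1) / ((n - 2) * (n - 3))
all.equal(corrected, kurtosis(x, type = 2))
```

So the two sets of numbers describe the same data under different definitions; pick the type that matches what the exercise asks for.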
S-SHAAF
  • This is what I originally did, but I only get NaN values. The dataset has no NA values; it's a commonly available data set – Ian Kelly Feb 24 '23 at 21:59
  • mydata is of class matrix and should be converted to the format map_dbl() expects. You should then get the kurtosis values without NAs. See the code above. – S-SHAAF Feb 26 '23 at 21:43
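
To make the point about the matrix class concrete, here is a small sketch using only base R and e1071 (the library from the question). It does not call purrr at all; it only inspects what a matrix and a data frame look like to any function that iterates over elements. The names one_cell and as_df are illustrative:

```r
library(e1071)

mydata <- datasets::volcano

# A matrix is an atomic vector with a dim attribute, not a list of columns.
is.list(mydata)      # FALSE
dim(mydata)          # 87 rows, 61 columns
length(mydata)       # 5307 individual cells (87 * 61)

# Element-wise indexing returns one cell; its kurtosis is 0/0, i.e. NaN.
one_cell <- mydata[[1]]
kurtosis(one_cell)

# A data frame is a list with one element per column.
as_df <- as.data.frame(mydata)
is.list(as_df)       # TRUE
length(as_df)        # 61
```

This is why NaN appears even though the data contain no missing values: the NaN comes from computing kurtosis on single numbers, not from NA in the input.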
When you pass a matrix directly to map(), it iterates over every single element, whereas you want it to iterate over the matrix columns. A couple of ways to achieve this:

library(e1071)
library(purrr)

mydata <- datasets::volcano

map_dbl(1:ncol(mydata), ~ kurtosis(mydata[,.x], type = 2))
#>  [1] -0.5943826 -0.4424202 -0.2467199 -0.1845791 -0.1563397 -0.2083728
#>  [7] -0.2890058 -0.3589954 -0.4835449 -0.6988673 -0.7948276 -0.8311148
#>  ...
#> [61] -0.8260557

mydata %>% 
  array_branch(margin = 2) %>% 
  map_dbl(\(x) kurtosis(x, type = 2))
#>  [1] -0.5943826 -0.4424202 -0.2467199 -0.1845791 -0.1563397 -0.2083728
#>  [7] -0.2890058 -0.3589954 -0.4835449 -0.6988673 -0.7948276 -0.8311148
#>  ...
#> [61] -0.8260557

Created on 2023-02-25 with reprex v2.0.2
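
Since the question asks for dplyr or purrr and also includes the sd/mad measure, here is a hedged sketch that covers both measures after converting to a data frame. The object names (mydata_df, purrr_result, dplyr_result) are illustrative, and the lambda syntax assumes R 4.1 or later:

```r
library(e1071)
library(dplyr)
library(purrr)
library(tibble)

mydata_df <- as.data.frame(datasets::volcano)   # columns V1 ... V61

# purrr: one map_dbl() pass per measure, collected into one row per column.
purrr_result <- tibble(
  column      = names(mydata_df),
  kurtosis    = map_dbl(mydata_df, \(x) kurtosis(x, type = 2)),
  new_measure = map_dbl(mydata_df, \(x) sd(x) / mad(x))
)

# dplyr: across() applies the function to every column; the result is one wide row.
dplyr_result <- mydata_df %>%
  summarise(across(everything(), \(x) kurtosis(x, type = 2)))

# Both should agree with the base apply() version from the question.
base_kurtosis <- apply(datasets::volcano, 2, function(x) kurtosis(x, type = 2))
all.equal(unname(purrr_result$kurtosis), unname(base_kurtosis))
all.equal(unlist(dplyr_result, use.names = FALSE), unname(base_kurtosis))
```

The purrr version returns a long table that is close in shape to base_mydata from the question, while the dplyr version keeps one column per original column; which is preferable depends on what is done with the result next.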

margusl