Cluster Data in R

Question

I'm very new to R and would like help clustering and analyzing my data. I have a dataset with many columns and data points. The data frame looks something like this:

V1          V2    V3
G. Cole     53.1  .1
C. Kershaw  56.8  .3
G. Cole     53.5  .2
N. Ryan     54.6  .5
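To make the question reproducible, the sample rows above can be entered directly as an R data frame. This is a minimal sketch: the object name `df` matches the answer's code, and the first V3 entry is read as 0.1.

```r
# Sample data from the question; columns V1 (name), V2 and V3 (numeric values)
df <- data.frame(
  V1 = c("G. Cole", "C. Kershaw", "G. Cole", "N. Ryan"),
  V2 = c(53.1, 56.8, 53.5, 54.6),
  V3 = c(0.1, 0.3, 0.2, 0.5),
  stringsAsFactors = FALSE  # keep names as character, not factor
)

str(df)  # check that V2 and V3 are numeric before computing statistics
```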

The analysis I would like to run is to find the standard deviation of V2 for each name in V1. How do I get the standard deviation of each person's own V2 values? For example, what is G. Cole's V2 standard deviation? I have thousands of names in V1, each with corresponding V2 and V3 values, and I would like to find each name's SD of V2 and order them from highest to lowest. What is the simplest code to do this?

Thanks

bird · Answer 1 · 2021-07-17T21:31:06.367


Using dplyr:

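library(dplyr)

df %>%
  group_by(V1) %>%               # one group per name in V1
  summarise(std = sd(V2)) %>%    # standard deviation of V2 within each group
  arrange(desc(std))             # order from highest to lowest SD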

Output:

  V1            std
  <chr>       <dbl>
1 G. Cole     0.283
2 C. Kershaw NA    
3 N. Ryan    NA

Note: every name except G. Cole gets NA because G. Cole is the only name in your example with more than one observation, and sd() of a single value is NA. On your larger data it will work for every name that has multiple observations.
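If you would rather not install a package, base R can do the same thing. A sketch using `tapply()` to compute one SD of V2 per name and `sort()` to order them from highest to lowest. By default `sort()` drops NA values, so `na.last = TRUE` is used here to keep the single-observation names at the end. It recreates the sample data (first V3 value read as 0.1) so it runs on its own.

```r
# Sample data from the question
df <- data.frame(
  V1 = c("G. Cole", "C. Kershaw", "G. Cole", "N. Ryan"),
  V2 = c(53.1, 56.8, 53.5, 54.6),
  V3 = c(0.1, 0.3, 0.2, 0.5)
)

# One standard deviation of V2 for each distinct name in V1
std_by_name <- tapply(df$V2, df$V1, sd)

# Highest to lowest; names with a single observation (NA) are kept at the end
std_by_name <- sort(std_by_name, decreasing = TRUE, na.last = TRUE)
std_by_name
```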

answered Jul 17 '21 at 21:25, edited Jul 17 '21 at 21:31


Hi. Thank you! This works great. Now there are two columns I'd like to add. First, how many times G. Cole, C. Kershaw, .... come up, i.e. the frequency of each V1 value. Next, I'd like to add the mean of V3 per V8. How do I add all of this to the same table? Thanks! – EamonS Jul 18 '21 at 23:07
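Both requested columns can go into the same `summarise()` call, since each named expression becomes one column of the per-name table. A sketch, assuming "per V8" means per name (V1), because V8 does not appear in the sample data; if it really is a different column, group by that column instead. `n()` is dplyr's row count within the current group, and `na.rm = TRUE` on the mean is an added assumption for data that may contain missing values. The sample data is recreated (first V3 value read as 0.1) so the block runs on its own.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  V1 = c("G. Cole", "C. Kershaw", "G. Cole", "N. Ryan"),
  V2 = c(53.1, 56.8, 53.5, 54.6),
  V3 = c(0.1, 0.3, 0.2, 0.5)
)

result <- df %>%
  group_by(V1) %>%
  summarise(
    std = sd(V2),                     # standard deviation of V2 per name
    n = n(),                          # number of rows for each name
    mean_V3 = mean(V3, na.rm = TRUE)  # average V3 per name, ignoring NAs
  ) %>%
  arrange(desc(std))                  # highest SD first; NA values go last

result
```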
