How to get count, mean and sd for this type of data?

Question

Here is a sample containing two of my data sets; my real data has many more columns.

df<-read.table(text=" ID_k2g6   ko_k2g6 jaz_k2g6    ID_k5n100   ko_k5n100   jaz_k5n100
12  60  A   15  10  A
14  40  B   15  40  A
13  100 A   65  60  B
10  20  B   35  40  B
NA  NA  NA  80  20  B
",header=TRUE)

Here is the intended outcome:

last  type  class  noA  MjazA  SDJazA  noB  MjazB  SDJazB
6     k2    g      2    12.5   0.7     2    12     2.84
100   k5    n      2    15     0       3    40     20

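As you can see, there are two data sets. For each one I want the count, mean and SD of each jaz group (A and B). The label columns come from the column-name suffix: last = the trailing number (6 and 100), type = the part right after the underscore (k2 and k5), and class = the letter after that (g and n). The count, mean and SD are taken from ko.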
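The naming rule above can be made mechanical. Here is a minimal base R sketch, assuming every suffix has the form k<digits><letters><digits> (only two examples are shown, so this is an assumption); `parse_suffix` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: split suffixes such as "k2g6" into type, class and last.
# Assumes every suffix looks like "k<digits><letters><digits>".
parse_suffix <- function(suffix) {
  pattern <- "^(k[0-9]+)([A-Za-z]+)([0-9]+)$"
  stopifnot(all(grepl(pattern, suffix)))  # fail loudly on unexpected names
  data.frame(
    suffix = suffix,
    last   = as.integer(sub(pattern, "\\3", suffix)),  # trailing number
    type   = sub(pattern, "\\1", suffix),               # "k" plus its digits
    class  = sub(pattern, "\\2", suffix),               # letters after the k-number
    stringsAsFactors = FALSE
  )
}

cols <- c("ID_k2g6", "ko_k2g6", "jaz_k2g6", "ID_k5n100", "ko_k5n100", "jaz_k5n100")
# The data-set suffix is everything after the first underscore
suffixes <- unique(sub("^[^_]+_", "", cols))
parse_suffix(suffixes)
```

For the two sample suffixes this gives last = 6 and 100, type = k2 and k5, and class = g and n. A suffix that breaks the assumed pattern stops with an error instead of being split silently.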

I have tried the following code, but it does not produce this result:

library(data.table)
df$id<-1:nrow(df)
setDT(df)
dat<-melt(df,id=c("id", "ko_k2g6","ko_k5n100"))
dat[,.(mean1=mean(ko_k2g6),sd1=sd(ko_k2g6),
       mean2=mean(ko_k5n100),sd2=sd(ko_k5n100)),.(variable,value)]

Get counts of what? The last digit of what? What hyphen? Can you post a reproducible example? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Bill O'Brien, Jul 09 '21 at 18:13
It's not clear to me at all how you get from your sample data to your intended outcome — Phil, Jul 09 '21 at 21:19
Thanks, Phil. If you ignore IDs, I want to get count, mean and sd for A and B within each data set and then create column names for them. OK? — Rose, Jul 09 '21 at 21:52
Your intended outcome doesn't make sense. Are you sure the mean and SD come from ko? They look more like they come from `ID`. — Peace Wang, Jul 10 '21 at 04:29
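Peace Wang's doubt can be checked directly by computing the group statistics from both columns. A minimal base R sketch using the sample data from the question; `group_stats` is a hypothetical helper:

```r
df <- read.table(text = " ID_k2g6   ko_k2g6 jaz_k2g6    ID_k5n100   ko_k5n100   jaz_k5n100
12  60  A   15  10  A
14  40  B   15  40  A
13  100 A   65  60  B
10  20  B   35  40  B
NA  NA  NA  80  20  B
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: mean and SD of one value column per jaz group,
# ignoring rows where the group label is missing
group_stats <- function(df, value_col, group_col) {
  keep <- !is.na(df[[group_col]])
  x <- df[[value_col]][keep]
  g <- df[[group_col]][keep]
  data.frame(
    jaz  = sort(unique(g)),                 # same order tapply() uses
    mean = as.vector(tapply(x, g, mean)),
    sd   = as.vector(tapply(x, g, sd)),
    stringsAsFactors = FALSE
  )
}

group_stats(df, "ID_k2g6", "jaz_k2g6")      # statistics from ID
group_stats(df, "ID_k5n100", "jaz_k5n100")  # statistics from ID
group_stats(df, "ko_k5n100", "jaz_k5n100")  # statistics from ko
```

On this sample, ID gives k5n100 group A a mean of 15 and SD of 0, and group B a mean of 60 and SD of about 22.9; ko gives A a mean of 25 and SD of about 21.2, and B a mean of 40 and SD of 20. So the intended k5n100 row takes group A from ID but group B from ko, while the k2g6 row matches ID up to rounding.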

Peace Wang · Accepted Answer · 2021-07-10T04:43:40.887

Here is one possible solution. All means and SDs below are computed from ID; to compute them from ko instead, replace ID with ko in the summary step.

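library(data.table)
setDT(df)  # make sure df is a data.table so that := works
# Add a row id and a label column for each data set
df[,':='(id=1:.N,
         name_k2g6="k2_g_6",
         name_k5n100="k5_n_100")]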

# Stack the two data sets: one ID, ko, jaz and name column each
dfl <- melt(df,
            id.vars = "id",
            measure.vars = patterns("^ID_","^ko_","^jaz_","^name_"),
            value.name = c("ID","ko","jaz","name"))

# Count, mean and SD of ID per data set and jaz group (rows with missing jaz dropped)
dfl[!is.na(jaz),.(number=.N,
                  Mjaz=mean(ID,na.rm=TRUE),
                  SDjaz=sd(ID,na.rm=TRUE)),
    by=.(name,jaz)]
#     name jaz number Mjaz      SDjaz
#1:   k2_g_6   A      2 12.5  0.7071068
#2:   k2_g_6   B      2 12.0  2.8284271
#3: k5_n_100   A      2 15.0  0.0000000
#4: k5_n_100   B      3 60.0 22.9128785
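For readers without data.table, here is a rough base R counterpart of the melt() step, used only to cross-check the numbers above. It is a sketch, not the answer's code: it labels each data set by its column suffix (k2g6, k5n100) rather than the answer's k2_g_6 style, and it skips the name columns.

```r
df <- read.table(text = " ID_k2g6   ko_k2g6 jaz_k2g6    ID_k5n100   ko_k5n100   jaz_k5n100
12  60  A   15  10  A
14  40  B   15  40  A
13  100 A   65  60  B
10  20  B   35  40  B
NA  NA  NA  80  20  B
", header = TRUE, stringsAsFactors = FALSE)

# Data-set suffixes, taken from the column names (everything after the first "_")
suffixes <- unique(sub("^[^_]+_", "", names(df)))

# Stack the ID/ko/jaz columns of each data set under shared names, similar in
# spirit to melt(..., measure.vars = patterns("^ID_", "^ko_", "^jaz_"))
long <- do.call(rbind, lapply(suffixes, function(s) data.frame(
  id   = seq_len(nrow(df)),
  name = s,
  ID   = df[[paste0("ID_", s)]],
  ko   = df[[paste0("ko_", s)]],
  jaz  = df[[paste0("jaz_", s)]],
  stringsAsFactors = FALSE
)))

# Drop padding rows where a data set has no group label (row 5 of k2g6)
long <- long[!is.na(long$jaz), ]

# Group sizes, means and SDs of ID per data set and jaz group
table(long$name, long$jaz)
tapply(long$ID, list(long$name, long$jaz), mean)
tapply(long$ID, list(long$name, long$jaz), sd)
```

The counts should equal the `number` column and the means and SDs should equal Mjaz and SDjaz, for example 60 and about 22.91 for k5n100, group B.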

You can also split the name column of dfl into your type, class and last columns (and then group by them instead of name) with:

dfl[,c("type","class","last"):=tstrsplit(name,"_")]
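The answer stops at a long summary, while the intended outcome has one row per data set with separate A and B columns. Here is a minimal base R sketch of the whole pipeline, assuming statistics from ID (as in the answer), suffixes of the form k<digits><letters><digits>, and the column names noA, MjazA, SDJazA, ... taken from the question's intended outcome. It swaps in `stats::reshape()` for the long-to-wide step; data.table users would more likely reach for dcast().

```r
df <- read.table(text = " ID_k2g6   ko_k2g6 jaz_k2g6    ID_k5n100   ko_k5n100   jaz_k5n100
12  60  A   15  10  A
14  40  B   15  40  A
13  100 A   65  60  B
10  20  B   35  40  B
NA  NA  NA  80  20  B
", header = TRUE, stringsAsFactors = FALSE)

value_col <- "ID"  # as in the answer; set to "ko" to summarise ko instead

# Long format: one row per observation, labelled by data-set suffix and jaz group
suffixes <- unique(sub("^[^_]+_", "", names(df)))
long <- do.call(rbind, lapply(suffixes, function(s) data.frame(
  name  = s,
  value = df[[paste0(value_col, "_", s)]],
  jaz   = df[[paste0("jaz_", s)]],
  stringsAsFactors = FALSE
)))
long <- long[!is.na(long$jaz), ]

# Long summary: count, mean and SD per data set and group
keys <- unique(long[, c("name", "jaz")])
summ <- do.call(rbind, lapply(seq_len(nrow(keys)), function(i) {
  v <- long$value[long$name == keys$name[i] & long$jaz == keys$jaz[i]]
  data.frame(name = keys$name[i], jaz = keys$jaz[i], no = length(v),
             Mjaz = mean(v, na.rm = TRUE), SDJaz = sd(v, na.rm = TRUE),
             stringsAsFactors = FALSE)
}))

# Split the suffix into type, class and last (assumes "k<digits><letters><digits>")
pattern <- "^(k[0-9]+)([A-Za-z]+)([0-9]+)$"
summ$type  <- sub(pattern, "\\1", summ$name)
summ$class <- sub(pattern, "\\2", summ$name)
summ$last  <- as.integer(sub(pattern, "\\3", summ$name))

# Wide format: one row per data set, columns noA, MjazA, SDJazA, noB, MjazB, SDJazB
wide <- reshape(summ[, c("last", "type", "class", "jaz", "no", "Mjaz", "SDJaz")],
                idvar = c("last", "type", "class"), timevar = "jaz",
                v.names = c("no", "Mjaz", "SDJaz"), direction = "wide", sep = "")
rownames(wide) <- NULL
wide
```

Setting value_col to "ko" summarises ko with the same code. Because reshape() builds the wide column names by pasting each v.names entry to the group label with sep = "", the result carries noA, MjazA, SDJazA, noB, MjazB and SDJazB.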
