Here is a sample of my data. The columns come in paired sets (two sets are shown here), and my real data has more columns than these:

df<-read.table(text=" ID_k2g6   ko_k2g6 jaz_k2g6    ID_k5n100   ko_k5n100   jaz_k5n100
12  60  A   15  10  A
14  40  B   15  40  A
13  100 A   65  60  B
10  20  B   35  40  B
NA  NA  NA  80  20  B
",header=TRUE)

Here is the intended outcome

  

  last  type  class  noA  MjazA  SDJazA  noB  MjazB  SDJazB
  6     k2    g      2    12.5   0.7     2    12     2.84
  100   k5    n      2    15     0       3    40     20

As you can see, the data come in paired sets of columns. I want the count, mean and SD for each set. last = the trailing number in the column name (6 and 100), type = the part right after the underscore (k2 and k5), and class = the letter that follows it (g and n). The count, mean and SD are computed from ko.

I tried the following code, but it does not give me what I need:

library(data.table)
df$id <- 1:nrow(df)
setDT(df)
dat <- melt(df, id = c("id", "ko_k2g6", "ko_k5n100"))
dat[, .(mean1 = mean(ko_k2g6), sd1 = sd(ko_k2g6),
        mean2 = mean(ko_k5n100), sd2 = sd(ko_k5n100)), .(variable, value)]
Rose
  • Get counts of what? The last digit of what? What hyphen? Can you post a reproducible example? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Bill O'Brien Jul 09 '21 at 18:13
  • It's not clear to me at all how you get from your sample data to your intended outcome – Phil Jul 09 '21 at 21:19
  • Thanks, Phil. If you ignore IDs, I want to get count, mean and sd for A and B within each data set and then create column names for them. OK? – Rose Jul 09 '21 at 21:52
  • Your intended outcome doesn't make sense. Are you sure that the mean and SD come from ko? They more likely come from `ID`. – Peace Wang Jul 10 '21 at 04:29

1 Answer

Here is one possible solution. All means and SDs here are computed from ID. To compute them from ko instead, replace ID with ko in the mean() and sd() calls.

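library(data.table)
setDT(df)  # df comes from read.table(), so convert it to a data.table first
# Add a row id and one label column per column set (type_class_last)
df[, ':='(id = 1:.N,
          name_k2g6 = "k2_g_6",
          name_k5n100 = "k5_n_100")]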

dfl <- melt(df,
            id="id",
            measure.vars = patterns("^ID_","^ko_","^jaz_","^name_"),
            value.name = c("ID","ko","jaz","name"))

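# Drop the padding row (jaz is NA), then summarise per column set and jaz level
dfl[!is.na(jaz), .(number = .N,
                   Mjaz = mean(ID, na.rm = TRUE),
                   SDjaz = sd(ID, na.rm = TRUE)),
    by = .(name, jaz)]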
#     name jaz number Mjaz      SDjaz
#1:   k2_g_6   A      2 12.5  0.7071068
#2:   k2_g_6   B      2 12.0  2.8284271
#3: k5_n_100   A      2 15.0  0.0000000
#4: k5_n_100   B      3 60.0 22.9128785
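
If the statistics should come from ko, as the question states, the same approach can be sketched as follows. This is only a minimal, self-contained sketch: the names `suffix` and `res_ko` are made up for illustration, and the label for each column set is derived from the column names here rather than typed in by hand.

```r
library(data.table)

# Sample data from the question
df <- read.table(text = "ID_k2g6 ko_k2g6 jaz_k2g6 ID_k5n100 ko_k5n100 jaz_k5n100
12 60 A 15 10 A
14 40 B 15 40 A
13 100 A 65 60 B
10 20 B 35 40 B
NA NA NA 80 20 B", header = TRUE)
setDT(df)
df[, id := .I]

# Suffix of each column set ("k2g6", "k5n100"), in column order
suffix <- sub("^ko_", "", grep("^ko_", names(df), value = TRUE))

# One row per id and column set, with ID, ko and jaz side by side
dfl <- melt(df, id.vars = "id",
            measure.vars = patterns("^ID_", "^ko_", "^jaz_"),
            value.name = c("ID", "ko", "jaz"))

# `variable` holds the index (1, 2, ...) of the column set; map it to its suffix
dfl[, name := suffix[as.integer(as.character(variable))]]

# Count, mean and SD of ko per column set and jaz level
res_ko <- dfl[!is.na(jaz),
              .(number = .N, Mjaz = mean(ko), SDjaz = sd(ko)),
              by = .(name, jaz)]
print(res_ko)
```

With ko, the k5n100 / B group has 3 rows with mean 40 and SD 20, which matches the noB, MjazB and SDJazB values in the second row of the intended outcome.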

You can also split the name column into your type, class and last columns with

dfl[, c("type", "class", "last") := tstrsplit(name, "_")]
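
To get closer to the one-row-per-set layout in the question, the summary can be split and then cast to wide format. The sketch below is an assumption about how the pieces fit together, not part of the original answer: `res` is a hypothetical name for the summary table, re-entered by hand from the output shown above, and the resulting column names (`number_A`, `Mjaz_A`, ...) are what `dcast()` generates, not the exact names from the question.

```r
library(data.table)

# Summary table copied from the output shown above (ID-based statistics)
res <- data.table(
  name   = c("k2_g_6", "k2_g_6", "k5_n_100", "k5_n_100"),
  jaz    = c("A", "B", "A", "B"),
  number = c(2L, 2L, 2L, 3L),
  Mjaz   = c(12.5, 12, 15, 60),
  SDjaz  = c(0.7071068, 2.8284271, 0, 22.9128785)
)

# Split "k2_g_6" into type = "k2", class = "g", last = "6"
res[, c("type", "class", "last") := tstrsplit(name, "_", fixed = TRUE)]

# One row per column set, one group of columns per jaz level
wide <- dcast(res, last + type + class ~ jaz,
              value.var = c("number", "Mjaz", "SDjaz"))
print(wide)
```

The columns can then be renamed with `setnames()` if the exact headers noA, MjazA, SDJazA and so on are required.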
Peace Wang