
First of all, I'm new to R and still learning the fundamentals. I'm currently working on a large data frame (called "ppl") from which I need to filter some rows. Each row belongs to a group (grp) and is characterized by an intensity (into) value and a sample value.

       mz  rt      into   sample  tracker     sn   grp
 100.0153 126  2.762664      3    11908 7.522655   0
 100.0171 127  2.972048      2    5308  7.718521   0
 100.0788 272 30.217969      2    5309 19.024807   1
 100.0796 272 17.277916      3   11910  7.297716   1
 101.0042 128 37.557324      3   11916 27.991320   2
 101.0043 128 39.676014      2    5316 28.234918   2

Well, the first question is: "How can I select from each group the sample with the highest intensity?" I tried a for loop:

for (i in ppl$grp) {
temp<-ppl[ppl$grp == i,]
sel<-rbind(sel,temp[max(temp$into),])
}

It works for ppl$grp == 0, but the following iterations return rows of NAs. In addition, the filtered data frame (called "sel") should also store the sample values of the removed rows. It should look as follows:

      mz  rt      into   sample  tracker     sn   grp
100.0171 127  2.972048   c(2,3)    5308  7.718521   0
100.0788 272 30.217969   c(2,3)    5309 19.024807   1
101.0043 128 39.676014   c(2,3)    5316 28.234918   2

In order to get this I would use this approach:

lev<-factor(ppl$grp)
samp<-ppl$sample
samp2<-split(samp,lev)
sel$sample<-samp2

Any hints? I cannot test this yet because I have not solved the previous problem.

Thanks a lot.

AeonRed

4 Answers


Not sure if I follow your question. But maybe this will get you started.

library(dplyr)
ppl %>% group_by(grp) %>% filter(into == max(into)) 
user51855

A base R option using ave is

ppl[with(ppl, ave(into, grp, FUN = max)==into),]

If the 'sample' column in the expected output should hold the unique elements of each 'grp', then after grouping by 'grp', update 'sample' to the pasted unique elements of 'sample', arrange by 'into' in descending order, and slice the first row.

library(dplyr)
ppl %>%
    group_by(grp) %>% 
    mutate(sample = toString(sort(unique(sample)))) %>% 
    arrange(desc(into)) %>%
    slice(1L)
#       mz    rt      into sample tracker        sn   grp
#     <dbl> <int>     <dbl>  <chr>   <int>     <dbl> <int>
#1 100.0171   127  2.972048   2, 3    5308  7.718521     0
#2 100.0788   272 30.217969   2, 3    5309 19.024807     1
#3 101.0043   128 39.676014   2, 3    5316 28.234918     2
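If 'sample' should stay a real vector per row (the c(2,3) shown in the question) rather than a pasted string, a list column is one option. The following is only a base R sketch under that assumption; it uses a cut-down copy of the question's data, and samples_by_grp is a name made up for the illustration.

```r
# Sample data copied from the question (only the columns needed here)
ppl <- data.frame(
  into    = c(2.762664, 2.972048, 30.217969, 17.277916, 37.557324, 39.676014),
  sample  = c(3L, 2L, 2L, 3L, 3L, 2L),
  tracker = c(11908L, 5308L, 5309L, 11910L, 11916L, 5316L),
  grp     = c(0L, 0L, 1L, 1L, 2L, 2L)
)

# Sorted unique sample values per group, as a named list of integer vectors
samples_by_grp <- lapply(split(ppl$sample, ppl$grp), function(s) sort(unique(s)))

# Keep the row(s) holding the group maximum of 'into'
sel <- ppl[ppl$into == ave(ppl$into, ppl$grp, FUN = max), ]

# Replace 'sample' with a list column, looked up by group label
sel$sample <- unname(samples_by_grp[as.character(sel$grp)])
```

Each element of sel$sample is then an integer vector, so sel$sample[[1]] gives the samples seen in the first selected row's group.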
akrun

A data.table alternative:

library(data.table)
setkey(setDT(ppl),grp)
ppl <- ppl[ppl[,into==max(into),by=grp]$V1,]
##         mz  rt      into sample tracker        sn grp
##1: 100.0171 127  2.972048      2    5308  7.718521   0
##2: 100.0788 272 30.217969      2    5309 19.024807   1
##3: 101.0043 128 39.676014      2    5316 28.234918   2
aichao

I have no idea why this code would work

for (i in ppl$grp) {
  temp<-ppl[ppl$grp == i,]
  sel<-rbind(sel,temp[max(temp$into),])
}

max(temp$into) returns the maximum value, which is not an integer in most cases, so it is not a valid row index. which.max(temp$into) returns the position of the maximum instead.
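To make the indexing problem concrete, here is a small sketch using the two group 1 rows from the question (the names bad and good are only for illustration). R truncates a numeric index toward zero, so 30.217969 becomes row 30, which does not exist. That is probably also why group 0 seemed to work: its maximum 2.972048 truncates to 2, and row 2 happens to be the right one.

```r
# The two rows of group 1 from the question (only the relevant columns)
temp <- data.frame(
  into   = c(30.217969, 17.277916),
  sample = c(2L, 3L),
  grp    = c(1L, 1L)
)

# Indexing by the maximum VALUE: 30.217969 is truncated to 30, no such row
bad <- temp[max(temp$into), ]

# Indexing by the POSITION of the maximum picks the intended row
good <- temp[which.max(temp$into), ]
```

Here bad is a single row of NAs, while good is the row with sample 2.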

Also, building a data.frame with rbind on every iteration of a for loop is not good practice (in any language). It requires quite a bit of type checking and array growing, which can get very expensive.
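One way to avoid growing the result inside the loop is to collect the per-group rows in a list and bind them once at the end. This is just a sketch on a cut-down copy of the question's data; pieces is a made-up name. Note that split() also visits each group only once, whereas for (i in ppl$grp) loops over every row's group value and so repeats groups.

```r
# Sample data copied from the question (only the columns needed here)
ppl <- data.frame(
  into   = c(2.762664, 2.972048, 30.217969, 17.277916, 37.557324, 39.676014),
  sample = c(3L, 2L, 2L, 3L, 3L, 2L),
  grp    = c(0L, 0L, 1L, 1L, 2L, 2L)
)

# One data frame per group, reduced to its highest-intensity row
# (which.max returns the first position on ties)
pieces <- lapply(split(ppl, ppl$grp), function(d) d[which.max(d$into), ])

# Bind all pieces in a single call instead of once per iteration
sel <- do.call(rbind, pieces)
```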

Also, max will return NA when there are any NAs for that group.
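A quick illustration of the NA behaviour, with made-up values. Note that na.rm = TRUE is not a complete fix either: if every value in a group is NA, max() has nothing left to compare and returns -Inf with a warning.

```r
x <- c(2.762664, NA, 2.972048)

with_na    <- max(x)                # NA, because one value is missing
without_na <- max(x, na.rm = TRUE)  # 2.972048, missing values are dropped

# A group that is entirely NA: nothing is left after dropping NAs
all_na <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))  # -Inf
```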

There is also the question of what you want to do about ties: do you want just one result or all of them? The ave code Akrun gives will return all of them.

This code adds a new column that holds the group max:

 ppl$grpmax <- ave(ppl$into, ppl$grp, FUN=function(x) { max(x, na.rm=TRUE ) } )

You can then select all values in a group that are equal to the max with

pplmax <- subset(ppl, into == grpmax)

If you want just one per group then you can remove duplicates

pplmax[!duplicated(pplmax$grp),]
pdb