-2

I have a dataframe.

I need to find the minimum value in the 1st column for each value of the 2nd column, but return the value in the 3rd column from the same row as that minimum.

The first part seems to be solved by `tapply(df[, 1], df[, 2], min)`.

But how do I get the value of the 3rd column from that same row?

The more complicated case is when the minimum in the 1st column is not unique. Then I need to choose the alphabetically first name among the tied rows and again return the corresponding value from the 3rd column of that row.

thelatemail
  • Sounds like you should get started on some code then – Rich Scriven Sep 30 '14 at 01:21
  • Where is your dataframe? – Paulo E. Cardoso Sep 30 '14 at 01:28
  • @Ari Belenkiy It is better to show the dataset using `dput`, i.e. copy and paste the output of `dput(head(data,10))` into your post. It is a little hard to know the structure of the dataset from the comments. – akrun Sep 30 '14 at 05:06
  • @AriBelenkiy: you should give your feedback / accept / upvote helpful answers. – rnso Sep 30 '14 at 11:11
  • I am ready to acknowledge your help but when I vote up it requires 15 reputation which I don't have. – Ari Belenkiy Oct 02 '14 at 07:52
  • @Ari Belenkiy I updated with a function, though repeated requests to provide reproducible data were not fulfilled. – akrun Oct 02 '14 at 10:02
  • @Ari Belenkiy You should install devtools first. Check this link http://cran.r-project.org/web/packages/devtools/README.html . Regarding the second comment, yes, it is possible to create a function using sapply, split etc., but those methods I guess would be slower compared to dplyr (or data.table) which can handle big datasets easily. If you encounter any troubles in installing, I can create another function using base R tools. I guess lynghonig's ave based solution is also great. – akrun Oct 03 '14 at 05:16
  • @Ari Belenkiy I created a new function using `base R` – akrun Oct 03 '14 at 07:35

3 Answers

1

A reproducible example would be handy to fully understand your question.

However, I think you can use ave for this.

a<-c(1:10)
b<-c(rep(1,3),rep(2,4),rep(3,3))
c<-c(101:110)

df<-cbind(a,b,c)

which gives a matrix (`cbind` returns a matrix, not a data frame):

df
      a b   c
[1,]  1 1 101
[2,]  2 1 102
[3,]  3 1 103
[4,]  4 2 104
[5,]  5 2 105
[6,]  6 2 106
[7,]  7 2 107
[8,]  8 3 108
[9,]  9 3 109
[10,] 10 3 110

So I am going to find the min of a by b and keep the corresponding c.

rows<-df[which(ave(df[,1],df[,2],FUN=function(x) x==min(x))==1),]

which gives

rows
     a b   c
[1,] 1 1 101
[2,] 4 2 104
[3,] 8 3 108
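
The `ave` line above keeps every row that ties for the group minimum. For the tie-breaking part of the question, one possible sketch is to sort first and then keep the first row per group. The data frame and its column names (`value`, `group`, `name`) are made up here for illustration and are not the asker's data.

```r
# Hypothetical toy data with a tie for the minimum in group 1
df <- data.frame(
  value = c(2, 2, 5, 1, 3),
  group = c(1, 1, 1, 2, 2),
  name  = c("b", "a", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Sort by group, then by value, then alphabetically by name,
# so the wanted row comes first within each group
sorted <- df[order(df$group, df$value, df$name), ]

# Keep only the first row of each group
result <- sorted[!duplicated(sorted$group), ]
result
```

Because the sort already breaks ties alphabetically, `duplicated` only has to drop everything after the first row of each group.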
lynghonig
  • I think both answers might be right, but I have difficulty since my data are "names", like here:
                name landmass zone area population
    1    Afghanistan        5    1  648         16
    2        Albania        3    1   29          3
    3        Algeria        4    1 2388         20
    4 American-Samoa        6    3    0          0
    5        Andorra        3    1    0          0
    6         Angola        4    2 1247          7
    – Ari Belenkiy Sep 30 '14 at 03:48
1

It is unclear what output you expect, even after reading the comments. Perhaps something like this:

library(dplyr)
 df %>% 
    group_by(zone) %>%
    filter(population==min(population)) %>%
    #ungroup() %>% #if you don't need zone
    select(name)
 #    zone           name
 #  1    3 American-Samoa
 #  2    1        Andorra
 #  3    2         Angola

Update

 devtools::install_github("hadley/dplyr")
 devtools::install_github("hadley/lazyeval")

 library(dplyr)
 library(lazyeval)

 fun2 <- function(grp, Column, grpDontShow=TRUE){ 
         stopifnot(is.numeric(df[,grp]) & Column %in% colnames(df))
         df1 <- df %>% 
                   group_by_(grp) %>%
                   filter_(interp(~x==min(x), x=as.name(Column)))%>%
                   arrange(name) %>%
                   filter(row_number()==1) %>%
                   select(name)     
        if(grpDontShow){
                ungroup(df1) %>%
                          select(name)
                 }
        else {
            df1
          }            
        }       

 fun2("zone", "population", TRUE)
 # Source: local data frame [3 x 1]

 #            name
 #1        Andorra
 #2         Angola
 #3 American-Samoa

  fun2("zone", "landmass", FALSE)
  #Source: local data frame [3 x 2]
  #Groups: zone

  #  zone           name
  #1    1        Albania
  #2    2         Angola
  #3    3 American-Samoa

   fun2("ozone", "landmass", FALSE)
   #Error in `[.data.frame`(df, , grp) : undefined columns selected

  fun2("name", "landmass", FALSE)
  #Error: is.numeric(df[, grp]) & Column %in% colnames(df) is not TRUE
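
A note for newer dplyr versions: the underscore verbs `group_by_()` and `filter_()` were deprecated in later releases, so `fun2` may not run there. Below is a minimal sketch of the same idea, assuming dplyr 1.0.0 or later. The name `fun3` and passing the data frame as an argument are illustrative choices, not part of the original function.

```r
library(dplyr)

# Same sample data as used in this answer
df <- data.frame(
  name = c("Afghanistan", "Albania", "Algeria",
           "American-Samoa", "Andorra", "Angola"),
  landmass = c(5L, 3L, 4L, 6L, 3L, 4L),
  zone = c(1L, 1L, 1L, 3L, 1L, 2L),
  area = c(648L, 29L, 2388L, 0L, 0L, 1247L),
  population = c(16L, 3L, 20L, 0L, 0L, 7L),
  stringsAsFactors = FALSE
)

# fun3 is a hypothetical name; grp and column are column names given as strings
fun3 <- function(data, grp, column) {
  # Fail early if the grouping column is not numeric or the column is missing
  stopifnot(is.numeric(data[[grp]]), column %in% colnames(data))
  data %>%
    group_by(across(all_of(grp))) %>%
    # keep the rows that attain the group minimum
    filter(.data[[column]] == min(.data[[column]])) %>%
    # break ties alphabetically, then keep one row per group
    arrange(name, .by_group = TRUE) %>%
    slice(1) %>%
    ungroup() %>%
    select(all_of(c(grp, "name")))
}

res_pop  <- fun3(df, "zone", "population")
res_land <- fun3(df, "zone", "landmass")
```

Here `.data[[column]]` and `across(all_of(grp))` take over the role that `interp()` and the underscore verbs play in `fun2`.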

Update2

If you need a function using base R

  funBase <- function(grp, Column, grpDontShow = TRUE) {
            stopifnot(is.numeric(df[, grp]) & Column %in% colnames(df))
            v1 <- c(by(df[, c(Column, "name")], list(df[, grp]),
                   FUN = function(x) sort(x[,2][x[, 1] == min(x[, 1],
                                                   na.rm = TRUE)])[1]))

             if (grpDontShow) {
               data.frame(name = v1, stringsAsFactors = FALSE)
             }
              else {
             setNames(data.frame(as.numeric(names(v1)),
                       v1, stringsAsFactors = FALSE), c(grp, "name"))

            }
         }

   funBase("zone", "landmass")
   #            name
   #1        Albania
   #2         Angola
   #3 American-Samoa

  funBase("zone", "population", FALSE)
  #  zone           name
  #1    1        Andorra
  #2    2         Angola
  #3    3 American-Samoa

data

 df <- structure(list(name = c("Afghanistan", "Albania", "Algeria", 
 "American-Samoa", "Andorra", "Angola"), landmass = c(5L, 3L, 
 4L, 6L, 3L, 4L), zone = c(1L, 1L, 1L, 3L, 1L, 2L), area = c(648L, 
 29L, 2388L, 0L, 0L, 1247L), population = c(16L, 3L, 20L, 0L, 
 0L, 7L)), .Names = c("name", "landmass", "zone", "area", "population"
 ), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
 "6"))
akrun
  • No, I need to choose the country with minimal population separately for every zone. – Ari Belenkiy Oct 02 '14 at 05:31
  • @AriBelenkiy It seems to me that what you said is what akrun demonstrated. Would you please rephrase what you mean? Alternatively, would you be able to provide a sample data and an expected outcome? – jazzurro Oct 02 '14 at 05:57
  • Thank you for editing the data. Actually I need a function that has two arguments: one (numeric) for "zone", another - for any other column where a minimal value exists (eg "population" or "landmass"). I also need to use stop function if one of the arguments is invalid with a warning. – Ari Belenkiy Oct 02 '14 at 07:59
  • Akrun, you are a great man. I feel it is right but I cannot ascertain it since I got the message: devtools::install_github("hadley/dplyr") Error in loadNamespace(name) : there is no package called ‘devtools’. – Ari Belenkiy Oct 03 '14 at 04:16
  • To continue: I feel I like the second way that you and lynghonig showed more. Could you create a function that uses "sapply", "split" and "order" rather than "filter"? And again - what is the way to acknowledge your help? Which button must be clicked? – Ari Belenkiy Oct 03 '14 at 04:18
0

Try:

> ddf
    col1 col2 col3
 1:    5    a    A
 2:    2    a    B
 3:    3    a    C
 4:    6    a    D
 5:    4    b    E
 6:    2    b    F
 7:    6    b    G
 8:    2    b    H
 9:    7    c    I
10:    2    c    J
11:    6    c    K
12:    4    c    L
13:    2    c    M
> 
> sapply(split(ddf, ddf$col2), 
         function(x) {x = x[order(x$col3),]; x$col3[which.min(x$col1)]})
a b c 
B F J 
Levels: A B C D E F G H I J K L M

Using @lynghonig's data:

> sapply(split(ddf, ddf$b), 
         function(x) {x = x[order(x$c),]; x$c[which.min(x$a)]})
  1   2   3 
101 104 108 

With OP's data (from comments):

> sapply(split(ddf, ddf$landmass), function(x) {x = x[order(x$zone),]; x$zone[which.min(x$name)]})
3 4 5 6 
1 1 1 3 
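
For the grouping described in the comments (minimum of a chosen column within each zone, ties broken alphabetically by name), the same `split`/`sapply`/`order` pattern could be wrapped in a function along these lines. This is only a sketch: the function name `min_name_by_group` and the error messages are made up for illustration, and the data are the sample rows posted in the comments.

```r
# Sample rows from the comments
df <- data.frame(
  name = c("Afghanistan", "Albania", "Algeria",
           "American-Samoa", "Andorra", "Angola"),
  landmass = c(5L, 3L, 4L, 6L, 3L, 4L),
  zone = c(1L, 1L, 1L, 3L, 1L, 2L),
  area = c(648L, 29L, 2388L, 0L, 0L, 1247L),
  population = c(16L, 3L, 20L, 0L, 0L, 7L),
  stringsAsFactors = FALSE
)

# min_name_by_group is a hypothetical name
min_name_by_group <- function(data, grp, column) {
  # Stop with a message if either argument is invalid
  if (!grp %in% colnames(data) || !is.numeric(data[[grp]])) {
    stop("'grp' must be the name of a numeric column")
  }
  if (!column %in% colnames(data)) {
    stop("'column' must be the name of an existing column")
  }
  sapply(split(data, data[[grp]]), function(x) {
    x <- x[order(x$name), ]         # alphabetical order within the group
    x$name[which.min(x[[column]])]  # which.min picks the first minimum
  })
}

res_pop  <- min_name_by_group(df, "zone", "population")
res_land <- min_name_by_group(df, "zone", "landmass")
```

Since `which.min` returns the position of the first minimum, sorting each group by name beforehand is what makes ties resolve alphabetically.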
rnso