1

My data frame is:

`Account id Fcast 1 Fcast 2 Fcast 3 Diff 1  Diff 2  Diff 3  
  101          4000  2000   1000    1000    3000    4000    
  201          2900  3300   5000    100     300     2000    
  301          -100  5500   -800    1700    7300    1000    
  401          5000  8000   7100    2500    500     400     
  501          9000  12000  2000    15000   12000   22000   

Result required is to find out minimum value from the column labeled as Diff...

`Account id  Min
  101        1000
  201         100
  301        1000
  401         400
  501       12000

Also ideally i also need to fetch another column which tells is filled by column name from which the minimum value is fetched.

A.B.
  • 83
  • 1
  • 8

4 Answers4

2

We can use apply in row mode here:

data.frame(AccountId=df$AccountId,
           Min=apply(df[names(df)[grepl("^Diff\\d", names(df))]], 1, FUN=min))

  AccountId   Min
1       101  1000
2       201   100
3       301  1000
4       401   400
5       501 12000

Data:

df <- data.frame(AccountId=c(101, 201, 301, 401, 501),
                 Fcast1=c(4000, 2900, -100, 5000, 9000),
                 Fcast2=c(2000, 3300, 5500, 8000, 12000),
                 Fcast3=c(1000, 5000, -800, 7100, 2000),
                 Diff1=c(1000, 100, 1700, 2500, 15000),
                 Diff2=c(3000, 300, 7300, 500, 12000),
                 Diff3=c(4000, 2000, 1000, 400, 22000))
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
2

Using dplyr :

library(dplyr)
cols <- grep('Diff', names(df), value = TRUE)

df %>%
  group_by(Accountid) %>%
  mutate(Min = min(c_across(cols)),
         Min_name = cols[which.min(c_across(cols))]) %>%
  select(Accountid, Min, Min_name)

#  Accountid   Min Min_name
#      <int> <int> <chr>   
#1       101  1000 Diff1   
#2       201   100 Diff1   
#3       301  1000 Diff3   
#4       401   400 Diff3   
#5       501 12000 Diff2   

data

df <- structure(list(Accountid = c(101L, 201L, 301L, 401L, 501L), 
Fcast1 = c(4000L, 2900L, -100L, 5000L, 9000L), Fcast2 = c(2000L, 3300L, 5500L, 
8000L, 12000L), Fcast3 = c(1000L, 5000L, -800L, 7100L, 2000L), 
    Diff1 = c(1000L, 100L, 1700L, 2500L, 15000L), Diff2 = c(3000L, 
    300L, 7300L, 500L, 12000L), Diff3 = c(4000L, 2000L, 1000L, 
    400L, 22000L)), class = "data.frame", row.names = c(NA, -5L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • what if want the Fcast1 instead of Diff1 in the answer data frame. – A.B. Sep 22 '20 at 05:08
  • In that case you could create a new vector `cols1 <- grep('Fcast', names(df), value = TRUE)` and then use that in `which.min` like this : `Min_name = cols1[which.min(c_across(cols))])` – Ronak Shah Sep 22 '20 at 05:13
  • also with me it says - [could not find function "c_across"] – A.B. Sep 22 '20 at 07:39
  • `c_across` is available in `dplyr` 1.0.0 or higher. You may need to update your `dplyr` package. – Ronak Shah Sep 22 '20 at 07:41
  • c_across issue resolved but now the error is "could not find function "%>%"" – A.B. Sep 22 '20 at 07:59
  • @A.B. Did you load `library(dplyr)` which is at the top of the answer? – Ronak Shah Sep 22 '20 at 08:18
  • in - cols1 <- grep('Fcast', names(df), value = TRUE), it would work if all columns have Fcast word in the column name. Also, can we also bring the Fcast value instead of min value of diff but it would be based on min value of diff. – A.B. Sep 22 '20 at 08:18
  • @A.B. Please ask that as a separate question instead of extending question in comments. My answer already gives min value of "diff" columns + their column names which was your original question. – Ronak Shah Sep 22 '20 at 08:20
  • i tried which.min like this : Min_name = cols1[which.min(c_across(cols))]) The Min_Name is showing "NA" in the result – A.B. Sep 22 '20 at 08:26
  • Did you try it on the data I have shared under my post? Does it work for you with that data? If it does work with my data and doesn't work for the data that you have probably you need to change adjust the answer based on your data. If you are not able to do it yourself add your data using `dput` similarly so that I can check. – Ronak Shah Sep 22 '20 at 08:34
  • Can you help with this error Error: Problem with `mutate()` input `Min_name`. x Input `Min_name` can't be recycled to size 1. i Input `Min_name` is `cols1[which.min(c_across(cols))]`. i Input `Min_name` must be size 1, not 0. i The error occurred in group 1: account_id = 550385. Run `rlang::last_error()` to see where the error occurred. – A.B. Sep 22 '20 at 08:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/221869/discussion-between-a-b-and-ronak-shah). – A.B. Sep 22 '20 at 10:38
  • This is what i have done cols1 <- grep('Diff', names(df), value = TRUE) df %>% group_by(Accountid) %>% mutate(Min = min(c_across(cols)), Min_name = cols1[which.min(c_across(cols))]) %>% select(Accountid, Min, Min_name) This is the error – A.B. Sep 22 '20 at 11:15
  • Please add sample of your data with `dput(head(df))`. – Ronak Shah Sep 22 '20 at 12:41
  • Thats done thanks ! @ronakshah - one question, what if all the column names didnt had Diff or Fcast in common? – A.B. Sep 22 '20 at 13:17
  • If there is no pattern to identify the column names. You need to add them manually `cols <- c('abc', 'def', 'xyz')` etc. – Ronak Shah Sep 22 '20 at 13:47
2

another option would be to use apply function:

df <- data.frame(df$AccountId, min = apply(df[, 2:ncol(df)], 1, min))
AlexB
  • 3,061
  • 2
  • 17
  • 19
0

A solution using data.table

dt[,`:=`(min_val=apply(.SD,1,min),
         min_col=names(.SD)[apply(.SD,1,which.min)]),.SDcols=names(dt) %like% 'diff']
  • Here,.SDcols chooses the subset of columns to work with, in this case, columns having the work diff in it. Hence, the use of %like
  • .SD now behaves as a subsetted data.table having only the diff columns.
Hasan Bhagat
  • 395
  • 1
  • 8