
I have a data frame that contains NaN and Inf. I want to rank the data based on a variable (Q), so I am using

rank(df$Q, ties.method= "first")

#> [1]  3  5  6  4  2  9  7 10  8  1

As you can see, even NaN and Inf are ranked.

So I want to skip the ranking for values that are NaN or Inf. I am using the following code:

#Checking if Q is valid
if((df$Q %in% "NaN") || (df$Q %in% "Inf")){
  RankingQ <- rep("-", nrow(df))
}else{
  RankingQ <- rank(df$Q, ties.method= "first") 
}

It returns the following error:

Error in (df$Q %in% "NaN") || (df$Q %in% "Inf") : 'length = 10' in coercion to 'logical(1)'

This used to work with a warning in R 4.2.0, but in R 4.3.0 it is an error. The release notes say:

Calling && or || with LHS or (if evaluated) RHS of length greater than one is now always an error, with a report of the form

'length = 4' in coercion to 'logical(1)'

My input is

df <- structure(list(Alternatives = 1:10, Q = c(0.375, 0.5, 0.5, 0.469, 
0.219, NaN, Inf, NaN, Inf, 0.153)), class = "data.frame", row.names = c(NA, 
-10L))

My desired output is

Alternatives    Q   Rank
1             0.375 3
2             0.500 5
3             0.500 6
4             0.469 4
5             0.219 2
6             NaN   NA
7             Inf   NA
8             NaN   NA
9             Inf   NA
10            0.153 1

How can I solve this problem?

UseR10085

3 Answers


Using tidyverse:

library(dplyr)  # provides %>% and mutate()

df %>% 
  mutate(Rank = ifelse(is.na(Q) | is.infinite(Q), NA, rank(Q, ties.method = "first")))

   Alternatives     Q Rank
1             1 0.375    3
2             2 0.500    5
3             3 0.500    6
4             4 0.469    4
5             5 0.219    2
6             6   NaN   NA
7             7   Inf   NA
8             8   NaN   NA
9             9   Inf   NA
10           10 0.153    1

Using base R:

df$Rank <- ifelse(is.na(df$Q) | is.infinite(df$Q), NA, rank(df$Q, ties.method = "first"))
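Since the goal is to use this inside a function, here is a minimal base R sketch. The name rank_finite is hypothetical, and using is.finite() is an assumption that goes slightly beyond the question: it also masks -Inf and plain NA. The last line shows why the original if() failed: | works element by element, while || expects a single TRUE/FALSE on each side, so a whole-vector test needs any().

```r
# Hypothetical helper: rank only the finite values, return NA elsewhere
rank_finite <- function(x) {
  ok <- is.finite(x)                  # FALSE for NaN, NA, Inf and -Inf
  out <- rep(NA_integer_, length(x))  # start with NA everywhere
  out[ok] <- rank(x[ok], ties.method = "first")
  out
}

df <- structure(list(Alternatives = 1:10, Q = c(0.375, 0.5, 0.5, 0.469,
0.219, NaN, Inf, NaN, Inf, 0.153)), class = "data.frame", row.names = c(NA,
-10L))

df$Rank <- rank_finite(df$Q)

# A scalar condition for if() needs any(), not ||
has_bad <- any(!is.finite(df$Q))
```

Ranking only x[ok] keeps the finite ranks contiguous even if a -Inf shows up, which the ifelse() one-liner would not guarantee.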
Mark
  • Thank you Mark for the solution. But I want to use it within a function. So, it would be better if the answer is based on base R. – UseR10085 Jul 12 '23 at 05:43
  • Updated. You can swap out the NA for "_", but generally speaking mixing strings and numbers in one column is bad practice, hence NAs instead – Mark Jul 12 '23 at 05:50
  • Based on your suggestion I have updated my question to replace "_" with NA. – UseR10085 Jul 12 '23 at 05:54

I think you can get what you are looking for using the following function:

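ranking <- function(x){
  # Flag elements which are Inf or NaN (is.finite() also catches -Inf and NA)
  bad <- !is.finite(x)
  # Non-interesting values are marked with "-"
  v <- rep("-", length(x))
  # Rank the remaining values on the numeric input, so they are not
  # compared as strings, and so the function also works when nothing is flagged
  v[!bad] <- rank(x[!bad], ties.method = "first")
  return(v)
}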

Finally, running df$Rank <- ranking(df$Q) gives the ranking you expect, with "-" in place of the ranks for NaN and Inf.
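One thing to keep in mind with a "-" placeholder: it turns the whole result into a character vector. A small sketch, with made-up values, of what that means for later sorting or arithmetic:

```r
# Made-up ranks stored as text, as happens once "-" is mixed in
r <- c("10", "9", "2", "-")

# Sorting compares strings, so "10" comes before "2"
sorted_text <- sort(r[r != "-"])
# "10" "2" "9"

# Converting back to numbers turns "-" into NA (with a coercion warning)
as_numbers <- suppressWarnings(as.numeric(r))
# 10 9 2 NA
```

If the column is only meant for display this is harmless; otherwise returning NA keeps it numeric.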

R18

Note that rank has the argument na.last, which can take the value "keep" to leave NA unchanged:

na.last
a logical or character string controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.

This works both for NA and NaNs, but not for Inf values, so you can use replace to change that beforehand:

Q = c(0.375, 0.5, 0.5, 0.469, 0.219, NaN, Inf, NaN, Inf, 0.153)
rank(replace(Q, is.infinite(Q), NA), ties.method= "first", na.last = "keep")
#[1]  3  5  6  4  2 NA NA NA NA  1
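Applied to the data frame from the question, this might look as follows. Replacing everything that fails is.finite(), instead of only what is.infinite() flags, is an assumption here so that -Inf is covered too; the result for this data is the same.

```r
df <- structure(list(Alternatives = 1:10, Q = c(0.375, 0.5, 0.5, 0.469,
0.219, NaN, Inf, NaN, Inf, 0.153)), class = "data.frame", row.names = c(NA,
-10L))

# Mask every non-finite value, then let na.last = "keep" carry the NAs through
df$Rank <- rank(replace(df$Q, !is.finite(df$Q), NA),
                ties.method = "first", na.last = "keep")
df$Rank
#> [1]  3  5  6  4  2 NA NA NA NA  1
```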
Maël