I work with the data frame df:

Name = c("Albert", "Caeser", "Albert", "Frank")
Earnings = c(1000,2000,1000,5000)
df = data.frame(Name, Earnings)


Name        Earnings
Albert      1000
Caeser      2000
Albert      1000
Frank       5000

If I use the tapply function

result <- tapply(df$Earnings, df$Name, sum)

I get this table result

Albert  2000
Caeser  2000
Frank   5000

Are there any circumstances under which the table "result" would not be ordered alphabetically when I use the tapply function as described above?

When I tried to find an answer, I changed the order of the rows:

Name        Earnings
Frank       5000
Caeser      2000
Albert      1000
Albert      1000

but still get the same result.

I use multiple functions that perform calculations on the output of tapply, so I have to be absolutely sure that the output is always delivered in the same order.

David Arenburg
rmuc8
  • I would just use `data.table` as in `library(data.table) ; setDT(df)[, sum(Earnings), Name]` and solve this specific problem and many others – David Arenburg Feb 09 '15 at 15:19
  • The results are in the order of the levels of the index. The default order of "unordered" factors is alphabetic. – A. Webb Feb 09 '15 at 15:30

2 Answers

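tapply returns its results in the order of the levels of the grouping factor. A character column is converted to a factor whose levels are sorted alphabetically, so the output is normally ordered, but you can come up with examples where it is not, for example a factor whose levels were specified in non-alphabetical order: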

df <- data.frame(Name = factor(c('Ben', 'Al'), levels = c('Ben', 'Al')),  
                 Earnings = c(1, 4))
tapply(df$Earnings, df$Name, sum)
## Ben  Al 
##   1   4 

In that case you can either use as.character or (probably safer) order the result afterwards.

tapply(df$Earnings, as.character(df$Name), sum)
##  Al Ben 
##   4   1 

result <- tapply(df$Earnings, df$Name, sum)
result[order(names(result))]
##  Al Ben 
##   4   1 
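Both fixes rely on the same fact: `tapply` returns one cell per level of the grouping factor, in level order. The following is only a minimal sketch, assuming you want the order pinned regardless of row order or of how the column was built. The names `name_levels` and `grp` are made up for illustration, and `method = "radix"` (available in R 3.3.0 and later) is an assumption added here to make the sort independent of the session locale.

```r
df <- data.frame(Name = c("Frank", "Caeser", "Albert", "Albert"),
                 Earnings = c(5000, 2000, 1000, 1000),
                 stringsAsFactors = FALSE)

# Fix the level order explicitly instead of relying on the default.
# method = "radix" sorts character vectors in the C locale.
name_levels <- sort(unique(df$Name), method = "radix")
grp <- factor(df$Name, levels = name_levels)

result <- tapply(df$Earnings, grp, sum)

# The names of the result follow the factor levels exactly.
stopifnot(identical(names(result), levels(grp)))
result
```

If downstream functions depend on the order, a check like the `stopifnot` line above can guard against a silent mismatch.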

Another possible problem can be leading spaces:

df <- data.frame(Name = c(' Ben', 'Al'),  
                 Earnings = c(1, 4))
tapply(df$Earnings, df$Name, sum)
##  Ben   Al 
##    1    4 

In that case, strip the leading spaces so that the results are ordered by the names themselves.
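One way to do this, sketched under the assumption that `trimws` is available (base R 3.2.0 and later) and that stray whitespace carries no meaning in your data; `clean_name` is a hypothetical variable name:

```r
df <- data.frame(Name = c(" Ben", "Al"),
                 Earnings = c(1, 4),
                 stringsAsFactors = FALSE)

# Strip leading and trailing whitespace before grouping, so that
# " Ben" and "Ben" would fall into the same group and sort by their letters.
clean_name <- trimws(df$Name)

result <- tapply(df$Earnings, clean_name, sum)
names(result)
```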

shadow
  • In the leading spaces example, `tapply` is sorting the output, a space is just prioritized after 'z'. I.e., `sort(c(" ben", " al"))` returns `[1] " al" " ben"`. The only problem would be the inconsistent use of leading spaces. Edit: I was wrong about the ordering, spaces come first. See shadow's comment below. – bjoseph Feb 09 '15 at 15:22
  • @bjoseph Actually, it seems that spaces come before the 'A': `sort(c('A', ' '))`. But yes, the problem is still inconsistent leading spaces. – shadow Feb 09 '15 at 15:24

You can order tapply output as you would order any array in R, using the `sort` command.

> result
Albert Caeser  Frank 
  2000   2000   5000 

> sort(result,decreasing=TRUE)
 Frank Albert Caeser 
  5000   2000   2000 

Depending on what you want to order by, you can either sort the values as shown above (leaving decreasing at its default of FALSE, i.e. sort(result), gives the values in increasing order), or sort by the names:

This will deliver the results by name in reverse alphabetical order:

> result[sort(names(result),decreasing=TRUE)]

 Frank Caeser Albert 
  5000   2000   2000 

What else would you like to sort and order by?

bjoseph
  • Thanks for your answer. My question is rather about "the function behind the function" (tapply): are there any circumstances under which tapply results would not be ordered alphabetically if I use the tapply function on a column of characters, as described above? – rmuc8 Feb 09 '15 at 15:15