1

I have two easy matrices (or df's) to merge:

a <- cbind(one=0:15, two=0:15, three=0:15)
b <- cbind(one=0:15, two=0:15, three=0:15)
#a <- data.frame(one=0:15, two=0:15, three=0:15)
#b <- data.frame(one=0:15, two=0:15, three=0:15)


No problem: after sorting on column one, column one is output ascending nicely from 0 to 15:

merge(a,b,by=c("one"), sort=T)
   one two.x three.x two.y three.y
1    0     0       0     0       0
2    1     1       1     1       1
3    2     2       2     2       2
4    3     3       3     3       3
5    4     4       4     4       4
6    5     5       5     5       5
7    6     6       6     6       6
8    7     7       7     7       7
9    8     8       8     8       8
10   9     9       9     9       9
11  10    10      10    10      10
12  11    11      11    11      11
13  12    12      12    12      12
14  13    13      13    13      13
15  14    14      14    14      14
16  15    15      15    15      15


But wait: when merging on two columns --- both numeric --- the sort order suddenly seems alphabetic.

merge(a,b,by=c("one", "two"), sort=T)
   one two three.x three.y
1    0   0       0       0
2    1   1       1       1
3   10  10      10      10
4   11  11      11      11
5   12  12      12      12
6   13  13      13      13
7   14  14      14      14
8   15  15      15      15
9    2   2       2       2
10   3   3       3       3
11   4   4       4       4
12   5   5       5       5
13   6   6       6       6
14   7   7       7       7
15   8   8       8       8
16   9   9       9       9

Eww, gross. What's going on? And what do I do?

joran
  • 169,992
  • 32
  • 429
  • 468
enfascination
  • 1,006
  • 9
  • 20
  • Can you include the code you used to sort the matrices? – joran Nov 09 '12 at 20:32
  • merge() sorts implicitly (with default arg sort=T) --- the code you see is enough to reproduce the problem. – enfascination Nov 09 '12 at 20:34
  • And this effect doesn't seem to have anything to do with row/col names either: a <- matrix(0:45, ncol=3);b <- matrix(0:45, ncol=3); merge(a,b,by=c(1,2), sort=T) – enfascination Nov 09 '12 at 20:35
  • Ok, you're wording is rather confusing. You say that _after_ you sort, get that behavior with `merge`. Merging and sorting are very distinct operations. – joran Nov 09 '12 at 20:35
  • If all you're doing is merging, this "problem" is actually the documented behavior of the function. – joran Nov 09 '12 at 20:36
  • Re-read the documentation of the `sort` argument (but be aware that setting it to `FALSE` does not reliably produce any particular ordering). – joran Nov 09 '12 at 20:39
  • Fixed the wording a bit, and I've been to the docs. But I don't buy that this is documented. Are you talking about this: "The rows are by default lexicographically sorted on the common columns"? That doesn't explain why the behavior is different for sorting on one or two columns. That's a doozy. And it doesn't say how to change the default. Any advice on getting behavior as in the first example within merge? Or do I sort after every merge? – enfascination Nov 09 '12 at 20:40
  • As I suggested before, re-check your use of the sort argument (as also Josh pointed out below). The behavior on one column is the exception, it appears, as in that case the argument passed to sort.list doesn't end up being a character. In general, there is no way to guarantee a particular sort order (other than the default). – joran Nov 09 '12 at 20:48
  • Thanks @joran for your persistence in making that point. Good to know what can't be relied on. – Josh O'Brien Nov 09 '12 at 21:04

1 Answers1

2

Based on @joran's comments, it looks like if you want the rows to be sorted in any particular order, you should explicitly set it yourself.

If the order you'd like is one in which the rows have increasing values of one or more columns, you can use the function order(), like this:

X <- merge(a, b, by = c("one", "two"))
X[with(X, order(one, two)),]
Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • That behavior seems incidental for this example --- should I count on it for more complex sorts on bigger dfs? I guess it makes me nervous that I only get good sorting by turning sorting off. – enfascination Nov 09 '12 at 20:46
  • @user1073999 No. Once again, as I pointed out this piece _is_ documented: when `sort = FALSE` the result is not in any particular order. – joran Nov 09 '12 at 20:49
  • I buy this --- first merge then sort. But I stand by my frustration: between joran and the documentation the behavior in my first example (merge(a,b,by=c("one"), sort=T)) seems undocumented, but convenient, but pretty misleading, and it does not reward the studious reproducible-example-maker. – enfascination Nov 09 '12 at 20:57