I have two objects, A and B, shown below. (Edit: I was sleepy and confused when I first posted. These are NOT data frames.)

> length(A)
[1] 490
> length(B)
[1] 17730

> str(A)
 num [1:490] 0.0113 -0.0106 0.2308 0.0435 0.2814 ...
> str(B)
 num [1:17730] 0.0118 0.0196 0.0344 0.0207 0.0566 ...

But for some reason when I used sort():

> length(sort(A))
[1] 490
> length(sort(B))
[1] 17729        #should be 17730

I don't know how to produce a reproducible example in this particular case, and I'm stuck on how I should go about troubleshooting this. What should I check?

biohazard
  • Are you using `nrow` or `NROW`? This is very weird! – asb Mar 22 '14 at 18:40
  • Thank you for your interest! I'm using `nrow()` – biohazard Mar 22 '14 at 18:50
  • Hmm.. I am really confused. I don't understand how either `sort` or `nrow` is working on `A` and `B` which you claim are data.frames; let alone figuring out why you are losing a row. I should shut up. – asb Mar 22 '14 at 18:54
  • post `str` of A and B – rawr Mar 22 '14 at 20:52
  • So sorry. It was not a data frame, so not `nrow()` but `length()`. I'll remember to get some sleep before posting here, not the opposite. – biohazard Mar 23 '14 at 01:18

1 Answer

Others have pointed out that sort() takes a vector, not a data.frame. But are there any NAs in the vector? By default, sort() uses na.last = NA, which removes NAs from the result:

v <- c(2, 1, NA)
v
#[1]  2  1 NA

length(sort(v))
#[1] 2
length(sort(v, na.last = TRUE))
#[1] 3
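
To check whether NAs explain the missing element, something like the following sketch may help. The vector B here is a small hypothetical stand-in for the real data (the values are made up for illustration); the functions used are all base R.

```r
# Hypothetical stand-in for B: a numeric vector with one missing value
B <- c(0.0118, 0.0196, NA, 0.0207, 0.0566)

anyNA(B)          # is there at least one NA?
sum(is.na(B))     # how many NAs are there?
which(is.na(B))   # at which positions do they occur?

# With the default na.last = NA, the length lost by sort()
# equals the number of NAs that were dropped
length(B) - length(sort(B))
```

If the last expression matches sum(is.na(B)), the shorter result is fully accounted for by dropped NAs. Note that is.na() also flags NaN values, which sort() drops in the same way.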

If you want to sort a data.frame, you should use order() instead of sort(). order() has the same na.last argument as sort(), except that its default is TRUE instead of NA, so rows containing NA are kept and placed last:

df <- data.frame(vars = c(2, 1, NA))
# drop = FALSE keeps the result a data.frame even though df has a single column
df_n <- df[order(df$vars), , drop = FALSE]

nrow(df_n)
#[1] 3
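
As a rough illustration of how na.last changes what order() returns, the sketch below reuses the same toy data. The names idx_keep and idx_drop are made up for this example.

```r
df <- data.frame(vars = c(2, 1, NA))   # same toy data as above

idx_keep <- order(df$vars)                 # default na.last = TRUE: the NA row is indexed last
idx_drop <- order(df$vars, na.last = NA)   # na.last = NA: the NA row is left out of the index

nrow(df[idx_keep, , drop = FALSE])   # all rows retained
nrow(df[idx_drop, , drop = FALSE])   # one row fewer, like sort()'s default
```

Passing na.last = NA to order() reproduces the row-dropping behavior that sort() has by default, which is worth knowing if a row count changes unexpectedly after reordering.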
matt_k
  • `> length(sort(B, na.last=T)) [1] 17730` that is correct! Now I just need to figure out why there was an NA there in the first place. Thank you and sorry for the confusion between `nrow()` and `length()` – biohazard Mar 23 '14 at 01:20