order a row by column name of other data frame and match in length

Question

For example, I have this data frame:

dd <- data.frame(b = c("cpg1", "cpg2", "cpg3", "cpg4"), 
                  x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))
dd
     b x y z
1 cpg1 A 8 1
2 cpg2 D 3 1
3 cpg3 A 9 1
4 cpg4 C 9 2

I want to order the columns of dd (b, x, y, z) by the values in a column of another data frame, which is:

d <- data.frame(pos = c("x", "z", "b"), 
                 g = c("A", "D", "A"), h = c(8, 3, 9))
d
  pos g h
1   x A 8
2   z D 3
3   b A 9

So I want to reorder the columns of dd to follow the order of d$pos, and dd should also end up with only as many columns as there are entries in d$pos.

I tried with order and match, but it did not give me the result I need. My dataset is quite large, so something automatic would be ideal.

Thanks a lot for your help!

I don't see how this will work, as the data frames are of unequal size. – MLEN, Apr 02 '17 at 06:56

akrun · Accepted Answer · 2017-04-02T07:32:35.677

We can use `match` to find the positions of `d$pos` among the column names of `dd`, and then select the columns in that order:

i1 <- match(d$pos, names(dd), nomatch = 0)
dd[i1]
#  x z    b
#1 A 1 cpg1
#2 D 1 cpg2
#3 A 1 cpg3
#4 C 2 cpg4

Or, if we want only the columns named in `d$pos`, we can index by name directly:

dd[as.character(d$pos)]
#  x z    b
#1 A 1 cpg1
#2 D 1 cpg2
#3 A 1 cpg3
#4 C 2 cpg4
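
The two approaches differ when a value in the ordering vector is not a column name. The sketch below is only an illustration under assumed data: the vector `pos` and its extra entry "q" are hypothetical and not part of the question. It suggests that `match` with `nomatch = 0` skips the unknown name, while indexing by name stops with an error.

```r
dd <- data.frame(b = c("cpg1", "cpg2", "cpg3", "cpg4"),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# Hypothetical ordering vector; "q" is NOT a column of dd
pos <- c("x", "q", "z", "b")

# match() returns 0 for "q", and a 0 index selects nothing
i1 <- match(pos, names(dd), nomatch = 0)
dd_matched <- dd[i1]
names(dd_matched)

# Indexing by name is stricter: it raises an error for "q".
# tryCatch() captures the error message instead of stopping.
res <- tryCatch(dd[pos], error = function(e) conditionMessage(e))
res
```

Which behaviour is preferable depends on whether a missing name should be ignored or flagged.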

edited Apr 02 '17 at 07:32

answered Apr 02 '17 at 07:00

I think I mistakenly clicked on something, but it seems to work. Now I only need to find a way to make the dd column names equal to d$pos – XXXX992 Apr 02 '17 at 07:07
@Julie What do you mean by `dd column name equal to d$pos`? The lengths are not the same – akrun Apr 02 '17 at 07:09
@Julie I updated the post. Let me know if this is what you wanted – akrun Apr 02 '17 at 07:10
Yes, exactly. I only need the variables that are in d$pos to be in the column names of dd – XXXX992 Apr 02 '17 at 07:11
It works for the small data frames but not on my datasets. – XXXX992 Apr 02 '17 at 07:19
@Julie Okay, then it must be because the values in your `d$pos` are not in the column names of `dd`. But I think it should still work with the modified first solution, `dd[i1]`, if I am not wrong – akrun Apr 02 '17 at 07:22
I think the problem is that when I run i1 with my datasets I get a vector with NA, so indeed the column names of dd are not of the same kind as d$pos. – XXXX992 Apr 02 '17 at 07:30
@Julie You can rectify it with `i1 <- match(d$pos, names(dd), nomatch = 0)` – akrun Apr 02 '17 at 07:31
Now I get all zeros in the i1 vector. I need to sort out how to put the column names of dd in the same format as d$pos, since they are the same words but maybe R sees them differently – XXXX992 Apr 02 '17 at 07:36
@Julie It is because you don't have a match there. All the values that do not match are now 0 instead of NA, and a 0 index selects nothing in R. So you may need to recheck your original datasets and whether the values in `d$pos` are in fact identical to the column names. Even leading/trailing spaces can create problems – akrun Apr 02 '17 at 07:37
When I use `i1 <- match(d$pos, names(dd[1,]), nomatch = 0)` I get numbers in the vector. But when I then use `dd=dd[c(i1, setdiff(seq_along(i1), i1))] dd=dd[as.character(d$pos)]`, dd becomes a vector, and when I run the last line it becomes NA – XXXX992 Apr 02 '17 at 07:50
@Julie You need to provide a reproducible example that gives the error. With the example you posted, it is working fine for me. When you are giving an example, also use `dput` on a smaller dataset – akrun Apr 02 '17 at 07:56
I cannot seem to produce a reproducible example that would give that error. Thank you for your help, though! – XXXX992 Apr 02 '17 at 11:49
@Julie Okay, I answered what was shown in the example and it works for me. – akrun Apr 02 '17 at 11:51
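
The comments above point to stray whitespace as one possible reason for `match` finding nothing. As a rough sketch of how that could be checked, the example below uses made-up data (the padded column names and the vector `pos` are assumptions, not the asker's real dataset): `setdiff` lists the requested names that are absent, and `trimws` removes leading and trailing spaces before matching again.

```r
# Hypothetical data: two column names carry stray spaces
dd <- data.frame(1:2, 3:4, 5:6)
names(dd) <- c(" b", "x ", "z")
pos <- c("x", "z", "b")

# Which requested names are not found among the column names?
missing_before <- setdiff(pos, names(dd))
missing_before

# Strip leading/trailing whitespace from the column names and retry
names(dd) <- trimws(names(dd))
missing_after <- setdiff(pos, names(dd))
missing_after

# With clean names, match() finds every entry of pos
dd_ordered <- dd[match(pos, names(dd), nomatch = 0)]
names(dd_ordered)
```

If names are still reported as missing after trimming, the difference is likely something else, such as letter case or a different separator character.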
