
I have a table with a long list of aliased values like this:

> head(transmission9, 50)
# A tibble: 50 x 2
   In_Node  End_Node
   <chr>    <chr>   
 1 c4ca4238 2838023a
 2 c4ca4238 d82c8d16
 3 c4ca4238 a684ecee
 4 c4ca4238 fc490ca4
 5 28dd2c79 c4ca4238
 6 f899139d 3def184a

I would like to have R go through both columns and assign a number sequentially to each value, in the order that an aliased value appears in the dataset. I would like R to read across rows first, then down columns. For example, for the dataset above:

   In_Node  End_Node
   <chr>    <chr>   
 1  1       2
 2  1       3
 3  1       4
 4  1       5
 5  6       1
 6  7       8

Is this possible? Ideally, I'd also love to be able to generate a "key" which would match each sequential code to each aliased value, like so:

Code Value
1    c4ca4238
2    2838023a
3    d82c8d16
4    a684ecee
5    fc490ca4

Thank you in advance for the help!

gbg
  • `unique(x)` where `x` is a character vector will give you the unique elements ordered as they appear in `x` – Michael Roswell Jul 15 '21 at 15:48
  • I suspect you'll find a more elegant solution than this, but I'd approach the problem of ordering the aliases with `sapply()`: `testm<-matrix(c(1,2,3,4, 4, 2, 1, 3), ncol =2)` `unique(sapply(t(testm), function(x)x))` – Michael Roswell Jul 15 '21 at 15:56

3 Answers


You could do:

df1 <- df
df1[] <- as.numeric(factor(unlist(df), levels = unique(c(t(df)))))
df1
  In_Node End_Node
1       1        2
2       1        3
3       1        4
4       1        5
5       6        1
6       7        8
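
The question also asks for a "key" mapping each code back to its aliased value. A minimal sketch, assuming the same `df` as above (rebuilt here from the six rows shown in the question): the factor levels are already in order of first appearance, so numbering them gives the lookup table. The names `lev` and `key` are only illustrative.

```r
# Toy data copied from the six rows shown in the question
df <- data.frame(
  In_Node  = c("c4ca4238", "c4ca4238", "c4ca4238", "c4ca4238", "28dd2c79", "f899139d"),
  End_Node = c("2838023a", "d82c8d16", "a684ecee", "fc490ca4", "c4ca4238", "3def184a")
)

# Unique values in order of first appearance, reading across each row
lev <- unique(c(t(df)))

# Lookup table: code i corresponds to the i-th unique value
key <- data.frame(Code = seq_along(lev), Value = lev)
key
```

Looking up `key$Value[code]` then recovers the original alias for any code.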
Onyambu

You can match against the unique values. For a single vector, the code is straightforward:

match(vec, unique(vec))
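
As a small illustration of the single-vector case (the toy vector `vec` here is made up for the example):

```r
vec <- c("b", "a", "b", "c", "a")

# Unique values, in order of first appearance
unique(vec)
#> [1] "b" "a" "c"

# Position of each element within those unique values
match(vec, unique(vec))
#> [1] 1 2 1 3 2
```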

The requirement to read across each row before moving down to the next makes this slightly trickier: you need to transpose the values first, then match them.

Finally, use [<- to assign the result back to a data.frame of the same shape as your original data (here x):

y = x
y[] = match(unlist(x), unique(c(t(x))))
y
  V2 V3
1  1  2
2  1  3
3  1  4
4  1  5
5  6  1
6  7  8

c(t(x)) is a bit of a hack:

  • t first converts the tibble to a matrix and then transposes it. If your tibble contains multiple data types, these will be coerced to a common type.
  • c(…) discards attributes. In particular, it drops the dimensions of the transposed matrix, i.e. it converts the matrix into a vector, with the values now in the correct order.
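
To see why the transpose matters, here is a small sketch on made-up data (the three-row `x` below is hypothetical, not the data from the question). `unlist()` walks the data column by column, whereas `c(t(x))` walks it row by row, so the order of first appearance differs:

```r
x <- data.frame(V2 = c("a", "a", "c"), V3 = c("b", "c", "d"))

# Column by column: all of V2, then all of V3
by_col <- unlist(x, use.names = FALSE)
by_col
#> [1] "a" "a" "c" "b" "c" "d"

# Row by row: V2 and V3 of row 1, then row 2, ...
by_row <- c(t(x))
by_row
#> [1] "a" "b" "a" "c" "c" "d"

unique(by_col)
#> [1] "a" "c" "b" "d"
unique(by_row)
#> [1] "a" "b" "c" "d"
```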
Konrad Rudolph
  • But yours is brilliant! :) upvoted already – AnilGoyal Jul 15 '21 at 16:10
  • Thank you so much for breaking down everything and explaining the code--I really appreciate it! I ran into an issue with your code saying that "assigned data must be compatible with existing data", where existing data was half the size of the assigned data. For that reason, I checked off someone else's solution which worked without the error, but your explanation was really helpful in understanding the code they provided. – gbg Jul 15 '21 at 18:17
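
The error in the comment above is probably due to tibbles being stricter than plain data frames about assignment: a single vector of length rows × columns is not reshaped to fill the columns. One possible workaround, sketched here under that assumption, is to assign a list with one vector per column. The example uses a base data.frame rebuilt from the question so that it runs without extra packages; the same list assignment should also be accepted by a tibble, since each element has exactly one value per row.

```r
x <- data.frame(
  In_Node  = c("c4ca4238", "c4ca4238", "c4ca4238", "c4ca4238", "28dd2c79", "f899139d"),
  End_Node = c("2838023a", "d82c8d16", "a684ecee", "fc490ca4", "c4ca4238", "3def184a")
)

# Unique values in row-wise order of first appearance
key <- unique(c(t(x)))

# Match each column separately, so the assigned value is a list of columns
y <- x
y[] <- lapply(x, match, table = key)
y
```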

A dplyr version

First, re-create the sample data:
library(tidyverse)

transmission9 <- read.table(header = TRUE, text = "   In_Node  End_Node
 1 c4ca4238 2838023a
 2 c4ca4238 d82c8d16
 3 c4ca4238 a684ecee
 4 c4ca4238 fc490ca4
 5 28dd2c79 c4ca4238
 6 f899139d 3def184a")

Then simply do this:

transmission9 %>% 
  mutate(across(everything(), ~ match(., unique(c(t(cur_data()))))))
#>   In_Node End_Node
#> 1       1        2
#> 2       1        3
#> 3       1        4
#> 4       1        5
#> 5       6        1
#> 6       7        8

Use the .names argument if you want to create new columns instead:

transmission9 %>% 
  mutate(across(everything(), ~ match(., unique(c(t(cur_data())))),
                .names = '{.col}_code'))

   In_Node End_Node In_Node_code End_Node_code
1 c4ca4238 2838023a            1             2
2 c4ca4238 d82c8d16            1             3
3 c4ca4238 a684ecee            1             4
4 c4ca4238 fc490ca4            1             5
5 28dd2c79 c4ca4238            6             1
6 f899139d 3def184a            7             8
AnilGoyal