1

I've got a df with the following column, among others, holding superpopulation IDs. Its column name is Superpop:

EUR
EUR
EUR
AMR
AMR
AFR
AMR
AFR
EUR
SAS
EUR
...

and I need (for later use with scatterplot3d) to build another column, let's say one named "pcolors", containing color names that identify the superpopulations, so I can color the points in the plot. I want this as output:

EUR red
EUR red
EUR red
AMR blue
AMR blue
AFR green
AMR blue
AFR green
EUR red
SAS yellow
EUR red
... ...

The thing is that they aren't sorted, and the df is 2524 lines long, so I can't do it manually, and I would prefer not to sort it because of the order of the other columns. Is there a way, for instance with a logical function, to say "generate another column and, if Superpop == EUR in that line, write "red" at that line in the pcolors column", and so on for the 5 superpopulations I've got? Any thoughts? Thanks!

msimmer92
  • Perhaps creating a look-up table with your ID and associated color first, then merge the table to your original data. – www May 19 '17 at 15:42
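
The look-up table idea from the comment above could be sketched roughly as follows. This is only a sketch: the `lookup` data frame is a name made up for illustration, and it uses `match()` rather than `merge()`, because `merge()` does not promise to keep the original row order, which the question wants to preserve.

```r
# Example data; only the Superpop column is taken from the question
df <- data.frame(Superpop = c("EUR", "EUR", "EUR", "AMR", "AMR", "AFR",
                              "AMR", "AFR", "EUR", "SAS", "EUR"))

# Hypothetical look-up table: one row per superpopulation,
# with the colors from the desired output
lookup <- data.frame(Superpop = c("EUR", "AMR", "AFR", "SAS"),
                     pcolors  = c("red", "blue", "green", "yellow"),
                     stringsAsFactors = FALSE)

# match() gives, for each row of df, the position of its code in lookup,
# so the colors come back in the original row order of df
df$pcolors <- lookup$pcolors[match(df$Superpop, lookup$Superpop)]
```

Any code that is missing from the look-up table ends up as `NA` in `pcolors`, which makes gaps easy to spot.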

2 Answers

3

This is pretty simple:

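pcolors <- unsplit(value = colors(), f = df$Superpop)

You can pick the colors you want in the value parameter. They are assigned to the groups in the order of the factor levels of Superpop (alphabetical by default), one color per group.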
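
To get the exact colors from the question, something like the following might work. It is a sketch that assumes Superpop holds only the four codes shown and that the colors are listed in the alphabetical order of those codes; the example data frame is rebuilt here so the snippet runs on its own.

```r
# Example data; only the Superpop column is taken from the question
df <- data.frame(Superpop = c("EUR", "EUR", "EUR", "AMR", "AMR", "AFR",
                              "AMR", "AFR", "EUR", "SAS", "EUR"))

# Make the grouping explicit so the level order can be inspected
f <- factor(df$Superpop)
levels(f)  # alphabetical: "AFR" "AMR" "EUR" "SAS"

# One color per level, in the same order as levels(f)
df$pcolors <- unsplit(value = c("green", "blue", "red", "yellow"), f = f)
```

Passing `colors()` unchanged also gives one color per group, but it simply takes the first entries of R's built-in color list, which may not be easy to tell apart in a plot.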

Majo
2

Just use subscripting. You can create a named vector of colors like this:

pcolor <- c(EUR = "red", AMR = "blue", AFR = "green", SAS = "yellow")

Then, given

df <- data.frame(Superpop = c("EUR","EUR","EUR","AMR","AMR","AFR","AMR","AFR","EUR","SAS","EUR"))

you can simply do

df$color = pcolor[as.character(df$Superpop)]

Then df is:

   Superpop  color
1       EUR    red
2       EUR    red
3       EUR    red
4       AMR   blue
5       AMR   blue
6       AFR  green
7       AMR   blue
8       AFR  green
9       EUR    red
10      SAS yellow
11      EUR    red
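
The question mentions 5 superpopulations, while the named vector above lists 4. A quick check like the one below can catch codes that have no color yet. It is only a sketch: `"EAS"` is a hypothetical fifth code used for illustration, and `codes`, `cols` and `missing_codes` are made-up names.

```r
pcolor <- c(EUR = "red", AMR = "blue", AFR = "green", SAS = "yellow")

# "EAS" is a hypothetical code that has no entry in pcolor
codes <- c("EUR", "SAS", "EAS")

# Subscripting by a name that is not in the vector gives NA
cols <- pcolor[codes]

# List the codes that still need a color
missing_codes <- unique(codes[is.na(cols)])

# unname() drops the names, leaving a plain character vector of colors
cols_plain <- unname(cols)
```

The `as.character()` call in the answer matters when Superpop is a factor: subscripting with a factor uses its integer codes rather than its labels, so the colors could be picked by position instead of by name.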
John Coleman
  • Thank you! Good answer, but I used Majo's because it was shorter and did the trick :). But now I have two ways – msimmer92 May 19 '17 at 16:37
  • @melunuge92 I also learned something from Majo's answer. In R there are typically several ways to achieve the same goal. – John Coleman May 19 '17 at 17:10