I have been searching for a few hours now and I’m very close, but I just can’t get it to work. Basically, I have a word-frequency table that I want to use to build a word cloud. However, I would like the colours plotted to carry some meaning. For that reason I’ve added a third column to my data.frame that determines the colours used in the wordcloud.

In the example below you will see that the column “diff” is the difference between each city’s population and a threshold (6).

I would like the green and red to reflect the size of the difference between each city’s population and the threshold (that part is working, thanks to the post here). The tricky bit is that I would like cities whose population equals the threshold to have a specific colour (grey, "#c5c5c5"), and that is what I just can’t do.

library(wordcloud)
library(tm)

DF <- data.frame(
city = c("New York","Barcelona","Paris","Rome","London", "Brussels", "Leeds", "Berlin"),
pop = c(12,7,5,7,6,2,0.8,6),
diff= c(6,1,-1,1,0,-4,-5.2,0))


custColorPal <- colorRampPalette(c("#ff0000","#00cc00"))

color_range_number <- length(unique(DF$diff))

custColors <- custColorPal(color_range_number)
colors <- custColors[factor(DF$diff)]

wordcloud(DF$city, DF$pop, colors=custColors, min.freq = 0.1, ordered.colors=FALSE)

In the example above I would expect two cities to be grey, three to be green and three to be red.

Second attempt: I have managed (with the help of jazzurro) to colour grey the names of the cities whose pop equals the threshold. However, if you run the code below you will see something odd. We should get only one red city name, but now we have several (I've changed the initial values to test it). I understand that the gradient is evenly distributed, but if the values are stretched in one direction it just does not work.

Is there a way to use two gradients at the same time? One for values greater than zero and another for values less than zero (or any other value)?

library(dplyr)  # provides case_when(), used below

DF <- data.frame(
  city = c("New York","Barcelona","Paris","Rome","London", "Brussels", "Leeds", "Berlin"),
  pop = c(12,7,5,7,6,2,0.8,6),
  diff= c(20,1,10,1,0,7,-0.2,0))
DF$city<-as.character(DF$city)

custColorPal <- colorRampPalette(c("#ff0000","#00cc00"))
color_range_number <- length(unique(DF$diff))
custColors <- custColorPal(color_range_number)
colors <- custColors[factor(DF$diff)]

DF<-cbind(DF,colors)

DF$colors<-as.character(DF$colors)

DF<-transform(DF, colors = case_when(
  diff == 0 ~ "#c5c5c5", 
  TRUE   ~ colors
))

wordcloud(DF$city, DF$pop, colors=DF$colors, min.freq = 0.1, ordered.colors=TRUE)

Thanks in advance for any pointers

Cheers

PatraoPedro
  • Do you need green, red, and grey? Or do you need gradient colors? For instance, red for the highest number, grey for zero, and blue for the lowest number. I see there are six colors in `custColors` while you have eight cities in your data. You may want to learn how to use the `colors` and `ordered.colors` arguments. – jazzurro Nov 13 '19 at 00:33
  • @jazzurro thanks for the help. It does make it easier to assign specific colours, but it does not solve the issue with the colour gradient, which will be an important aspect of this wordcloud. As you suggested, I'll try to think of ways to adapt your code. If I find something I will post it here. – PatraoPedro Nov 13 '19 at 08:44
  • If you haven't, check [this question](https://stackoverflow.com/questions/22255465/assign-colors-to-a-range-of-values/22256302#22256302). – jazzurro Nov 13 '19 at 09:01
  • I revised my answer for you. I hope this will help you. – jazzurro Nov 13 '19 at 14:33
  • If you have a wide range of numbers in `diff`, you may want to consider using `cut()` to create a categorical variable and assign colors from it. – jazzurro Nov 13 '19 at 14:49
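
The `cut()` suggestion in the last comment could look roughly like the sketch below. It is only an illustration: the break points (-5, 0, 5) and the two intermediate shades are assumptions picked for this example, not values from the question. Zero falls into the (-5, 0] bin, so it is overridden with grey afterwards.

```r
DF <- data.frame(
  city = c("New York", "Barcelona", "Paris", "Rome", "London", "Brussels", "Leeds", "Berlin"),
  pop  = c(12, 7, 5, 7, 6, 2, 0.8, 6),
  diff = c(20, 1, 10, 1, 0, 7, -0.2, 0)
)

# Assumed break points and shades, chosen only for illustration.
# Bins are (-Inf,-5], (-5,0], (0,5], (5,Inf].
bins       <- cut(DF$diff, breaks = c(-Inf, -5, 0, 5, Inf))
bin_colors <- c("#ff0000", "#ff9999", "#99e699", "#00cc00")

# One colour per city, looked up by bin number.
DF$colors <- bin_colors[as.integer(bins)]

# Cities exactly at the threshold get the grey from the question.
DF$colors[DF$diff == 0] <- "#c5c5c5"

# The colours can then be passed per word, e.g.:
# wordcloud(DF$city, DF$pop, colors = DF$colors, min.freq = 0.1, ordered.colors = TRUE)
```

With categories, a few extreme values in `diff` no longer stretch the whole gradient, at the cost of losing the fine shading within each bin.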

1 Answer

Given your comment, I came up with the following idea. I do not know your actual data, so you will still need to adjust this code. I modified your original DF by changing the values in diff; in this data the maximum is 90 and the minimum is -95.

First I created 100 colors running from green to gray with colorRampPalette(), for the positive values. Similarly, I created 101 colors running from gray to red for the negative values, and then combined the two vectors. Gray would otherwise appear twice, which is why you see [-1] in the line for mycolors: it drops the duplicate. You need to think about how to create the colors based on your actual data.

Once the colors were ready, I created a new column in the data set, using diff inside case_when() to work out the index of each city's color in mycolors. Finally, I drew the wordcloud. I hope you can adjust this code for your own data.

library(tidyverse)
library(wordcloud)

DF <- data.frame(city = c("New York","Barcelona","Paris","Rome","London", "Brussels", "Leeds", "Berlin"),
                 pop = c(12, 7, 5, 7, 6, 2, 0.8, 6),
                 diff = c(60, 20, -30, 90, 0, -10, -95, 0))

# Create gradient colors for positive and negative numbers.

positive_color_palette <- colorRampPalette(colors = c("green", "gray"), space = "Lab")(100)
negative_color_palette <- colorRampPalette(colors = c("gray", "red"), space = "Lab")(101)

mycolors <- c(positive_color_palette, negative_color_palette[-1])

# mycolors runs from green (index 1) through gray (index 100) to red (index 200),
# so a city's color sits at index 100 - diff.
# This assumes diff is a whole number between -100 and 99.

mutate(DF,
       colors = case_when(100 + diff > 100 ~ mycolors[100 - diff],
                          100 + diff < 100 ~ mycolors[100 - diff],
                          100 + diff == 100 ~ mycolors[100])) -> res


wordcloud(words = res$city, freq = res$pop, colors = res$colors,
          min.freq = 0.1, random.order = FALSE, ordered.colors = TRUE)
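
If `diff` is not a whole number, or its range is not known in advance, the same two-gradient idea can be wrapped in a small function. The sketch below is not part of the answer above: `diverging_colors()` is a hypothetical helper, its defaults reuse the colours from the question, and it assumes `x` contains no missing values. Each side is scaled by its own largest distance from `mid`, so a single slightly negative value is still drawn fully red.

```r
# Hypothetical helper, not from the original answer.
# Maps each value in `x` to a colour: `neutral` at `mid`, shading towards
# `high` above it and towards `low` below it. Assumes `x` has no NAs.
diverging_colors <- function(x, mid = 0, n = 100,
                             low = "#ff0000", neutral = "#c5c5c5",
                             high = "#00cc00") {
  # n + 1 colours per side; element 1 of each palette is the neutral colour.
  pos_pal <- colorRampPalette(c(neutral, high))(n + 1)
  neg_pal <- colorRampPalette(c(neutral, low))(n + 1)

  d   <- x - mid
  out <- rep(pos_pal[1], length(d))  # values equal to `mid` stay neutral
  pos <- d > 0
  neg <- d < 0

  # Scale each side separately to palette positions 2 .. n + 1.
  if (any(pos)) out[pos] <- pos_pal[1 + ceiling(d[pos] / max(d[pos]) * n)]
  if (any(neg)) out[neg] <- neg_pal[1 + ceiling(d[neg] / min(d[neg]) * n)]
  out
}

DF <- data.frame(
  city = c("New York", "Barcelona", "Paris", "Rome", "London", "Brussels", "Leeds", "Berlin"),
  pop  = c(12, 7, 5, 7, 6, 2, 0.8, 6),
  diff = c(20, 1, 10, 1, 0, 7, -0.2, 0)
)
DF$colors <- diverging_colors(DF$diff)

# The result can be passed per word, e.g.:
# wordcloud(words = DF$city, freq = DF$pop, colors = DF$colors,
#           min.freq = 0.1, random.order = FALSE, ordered.colors = TRUE)
```

If both sides should share one scale instead, dividing by `max(abs(d))` on both branches would be the change to make.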


jazzurro
  • Thanks for persisting and helping me out. It's amazing how things make sense once you see them in front of you. The case_when function is very flexible; I will definitely be using it in the future. Cheers – PatraoPedro Nov 13 '19 at 20:58
  • @PatraoPedro You are welcome. – jazzurro Nov 14 '19 at 00:32