As someone new to R, I am trying to produce a word cloud that shows two variables: frequency and rating. Using a generic table, I want to display the hypothetical number of colleges by state (font size scaled from large to small by count) and the hypothetical average college rating as color:

  • 1 = green (good),
  • 3 = yellow (average),
  • 5 = red (bad)

I am able to create a cloud in which font size reflects the number of colleges, but I cannot tie the color to the rating in the third column. Here is my generic table:

State   Colleges    Rating
Alabama        220      1
Alaska         100      3
Arizona         50      5
Arkansas       275      1
California     155      3
Colorado        68      5
Connecticut    235      1
Delaware       189      3
Florida         32      5
Georgia        219      1
Hawaii         117      3
Idaho           63      5
Illinois       264      1
Indiana        167      3
Iowa            76      5
Kansas         287      1
Kentucky       178      3
Louisiana       67      5
Maine          246      1
Maryland       169      3
Massachusetts   46      5
Michigan       225      1
Minnesota      132      3
Mississippi     23      5
Missouri       219      1
Montana        194      3
Nebraska        97      5

Below is my very simple script:

library(wordcloud)
library(RColorBrewer)

data <- read.csv("wordcloud.csv", header = TRUE)
pal <- brewer.pal(9, "RdYlGn")
wordcloud(data$State, data$Colleges, scale = c(4, 1), colors = pal, rot.per = .5)

The script above sizes the text by the number of colleges, but I have not been able to link the colors to the rating, from 1 = green (good) through 3 = yellow (average) to 5 = red (bad). Any suggestions are greatly appreciated.

alistaire
csv2004

2 Answers


Another option in such cases is to plot a comparison cloud, which gives each rating group its own color.

For this, we first convert the data from long to wide format:

library(reshape2)
df1 <- dcast(df1,State + Colleges ~ Rating, value.var = "Colleges")

Then we perform a few standard operations to prepare a suitable matrix:

rownames(df1) <- df1[,1]  # use the state names as row names
df1 <- df1[,-c(1,2)]      # remove the "State" and "Colleges" columns
df1[is.na(df1)] <- 0      # set NA values to zero
df1 <- as.matrix(df1)     # convert to a matrix
colnames(df1) <- c("good", "average", "bad")  # ratings 1, 3, 5 in column order

Finally, we can plot the comparison cloud and assign colors to the groups as we wish:

library(wordcloud)
comparison.cloud(df1,max.words=Inf,random.order=FALSE, scale = c(4,.5), 
                     title.size = 1,  colors=c("green","orange","red"))

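As a cross-check on the reshaping steps above, the same state-by-rating matrix can be sketched in base R with `xtabs()`, which fills missing combinations with zero directly. This is only an alternative sketch: `toy`, `tab` and `term_matrix` are illustrative names that do not appear in the answer, the toy rows are copied from the question's table, and the column labels assume that ratings 1, 3 and 5 are all present.

```r
# Toy subset of the question's table (illustrative only)
toy <- data.frame(
  State    = c("Alabama", "Alaska", "Arizona", "Arkansas"),
  Colleges = c(220, 100, 50, 275),
  Rating   = c(1, 3, 5, 1)
)

# Cross-tabulate: one row per state, one column per rating, zeros elsewhere
tab <- xtabs(Colleges ~ State + Rating, data = toy)

# Convert the table into a plain numeric matrix with readable column names
# (assumes the rating columns come out in the order 1, 3, 5)
term_matrix <- matrix(
  as.vector(tab),
  nrow = nrow(tab),
  dimnames = list(rownames(tab), c("good", "average", "bad"))
)
```

A matrix built this way could then be passed to `comparison.cloud()` in place of `df1`.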

data

df1 <- structure(list(State = structure(1:27, .Label = c("Alabama", 
"Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", 
"Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", 
"Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", 
"Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", 
"Missouri", "Montana", "Nebraska"), class = "factor"), Colleges = c(220L, 
100L, 50L, 275L, 155L, 68L, 235L, 189L, 32L, 219L, 117L, 63L, 
264L, 167L, 76L, 287L, 178L, 67L, 246L, 169L, 46L, 225L, 132L, 
23L, 219L, 194L, 97L), Rating = c(1L, 3L, 5L, 1L, 3L, 5L, 1L, 
3L, 5L, 1L, 3L, 5L, 1L, 3L, 5L, 1L, 3L, 5L, 1L, 3L, 5L, 1L, 3L, 
5L, 1L, 3L, 5L)), .Names = c("State", "Colleges", "Rating"), 
class = "data.frame", row.names = c(NA, -27L))
RHertel

You can assign the colours manually, one per word in row order, and set `ordered.colors = TRUE`:

wordcloud(data$State, data$Colleges, 
scale = c(4,1), 
colors = rep(c("green", "yellow", "red"), 9), 
rot.per=.5, 
ordered.colors=TRUE)

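The repeating pattern above only matches the ratings while the rows stay in their original order. A minimal sketch of one way to make the colours follow the `Rating` column itself is a named lookup vector; `rating_colors` and `word_colors` are hypothetical names, and the toy rows are deliberately shuffled to show that row order no longer matters.

```r
# Toy rows in a deliberately shuffled order (illustrative only)
data <- data.frame(
  State    = c("Arizona", "Alabama", "Alaska", "Arkansas"),
  Colleges = c(50, 220, 100, 275),
  Rating   = c(5, 1, 3, 1)
)

# Named lookup: the names are the rating values, the elements are the colours
rating_colors <- c("1" = "green", "3" = "yellow", "5" = "red")

# Index by the rating (as character) so each row gets its own colour
word_colors <- unname(rating_colors[as.character(data$Rating)])

# Plot only if the wordcloud package is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(data$State, data$Colleges, scale = c(4, 1),
                       colors = word_colors, ordered.colors = TRUE,
                       rot.per = .5)
}
```

Because `ordered.colors = TRUE` pairs the i-th colour with the i-th word, `word_colors` has to be exactly as long as `data$State`, which the lookup guarantees.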

erc
  • I appreciate the input. It works, but if I change the row order (i.e. all "1's" first, then all "2's", etc.), the green-yellow-red sequence keeps looping and does not follow the value in that column. I'm still experimenting with this. Thanks again, you helped a great deal – csv2004 Mar 17 '16 at 16:37
  • Sure, the colours are assigned in row order, so if you change the row order you can still use the approach in my answer as long as you adjust the order of the colours accordingly, e.g. to `rep(c("green", "yellow", "red"), each=9)` – erc Mar 17 '16 at 19:25
  • I was talking with a coworker today and came up with the following: `vec <- vector(); for (i in 1:nrow(data)) { if (data[i,3] == 1) vec[i] <- "green"; if (data[i,3] == 3) vec[i] <- "yellow"; if (data[i,3] == 5) vec[i] <- "red" }` – csv2004 Mar 19 '16 at 00:07
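The loop in the last comment handles ratings 1, 3 and 5 only. If the intermediate values 2 and 4 also occur, a vectorised sketch could index a five-step colour ramp with the rating directly. This assumes the ratings are whole numbers from 1 to 5; the `rating` vector below is hypothetical, and `colorRampPalette()` comes from base R's grDevices.

```r
# Hypothetical ratings covering the full 1 to 5 scale (illustrative only)
rating <- c(1, 2, 3, 4, 5, 3, 1)

# Five colours interpolated from green through yellow to red
ramp <- colorRampPalette(c("green", "yellow", "red"))(5)

# Integer ratings index the ramp directly: 1 picks the first colour, 5 the last
vec <- ramp[rating]
```

The resulting `vec` can be used like the loop's output, as the `colors` argument together with `ordered.colors = TRUE`.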