
I've searched several threads but have yet to find a solution.

I have a geom_bar plot with 40+ variables. I have created a separate df to tag each variable according to a specific category, and assigned a colour to the category. Across 40+ variables, there are 4 colours/categories included in the plot.

I would like the legend of the plot to show the colours of the categories, not the individual variables. I know I can accomplish this by having the colours/categories in the original df. However, I would like to reuse the colour/category reference df across many different projects and avoid having to add category and colour columns to every plotting df.

Here is an example where df is the data plotted, and df_cols is akin to my category/colour df. Ideally the legend would show "A=red, B=blue, C=orange" and not the variable names.

library(ggplot2)

variable = c("abc", "def", "ghi", "jkl", "mno", "pqr", "stu")
tag = c("A", "B", "C", "A", "B", "A", "B")
colours = c("red", "blue", "orange", "red", "blue", "red", "blue")

# Create colour reference df
df_cols = data.frame(variable, tag, colors = colours, stringsAsFactors = FALSE)

# Named colour vector: names are the variables, values are their colours
cols = df_cols$colors
names(cols) = df_cols$variable

# Plotting df
df = data.frame(variable, value=c(1:7))

ggplot(df)+
  geom_bar(aes(x=variable, y=value, fill=variable),stat = "identity")+
  scale_fill_manual(values = cols)

Here is a copy of the actual plot that I'm making: [image not included]

vb66
  • I’m a little confused by your description. Could you mock up an example of what you’re trying to achieve? – jdobres May 19 '20 at 23:14

1 Answer


I think this might be what you are after. I've simplified your code to take advantage of 'tag' as a discrete variable to control the fill colour.

library(ggplot2)


# Plotting df
df <- data.frame(variable = c("abc", "def", "ghi", "jkl","mno", "pqr", "stu"),
                 tag = c("A", "B", "C", "A","B", "A", "B"),
                 value = c(1:7))

Because you are plotting values on the y axis, you can replace geom_bar(stat = "identity") with geom_col(), which is designed for this case and avoids the stat argument.

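ggplot(df) +
  geom_col(aes(x = variable, y = value, fill = tag)) +
  # scale_fill_manual() is the scale that accepts `values`;
  # scale_fill_discrete() does not
  scale_fill_manual(breaks = c("A", "B", "C"),
                    values = c(A = "red", B = "blue", C = "orange"),
                    labels = c("red", "blue", "orange"),
                    name = "Colour")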

Created on 2020-05-20 by the reprex package (v0.3.0)
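
If the plotting df really must stay free of tag and colour columns, one possible approach is to look up each variable's tag inside aes() with match(), and to build the palette from the unique tag/colour pairs in df_cols. This is only a sketch under that assumption, not a tested recommendation. The names tag_cols and pal are hypothetical, and the colour column is called colours here for illustration.

```r
library(ggplot2)

# Reference df: one row per variable, with its category tag and colour
df_cols <- data.frame(
  variable = c("abc", "def", "ghi", "jkl", "mno", "pqr", "stu"),
  tag      = c("A", "B", "C", "A", "B", "A", "B"),
  colours  = c("red", "blue", "orange", "red", "blue", "red", "blue"),
  stringsAsFactors = FALSE
)

# Plotting df: no tag or colour columns
df <- data.frame(variable = df_cols$variable, value = 1:7)

# Named palette with one entry per tag (hypothetical helper objects)
tag_cols <- unique(df_cols[, c("tag", "colours")])
pal <- setNames(tag_cols$colours, tag_cols$tag)

# Look up each variable's tag on the fly, so df itself is not modified
p <- ggplot(df) +
  geom_col(aes(x = variable, y = value,
               fill = df_cols$tag[match(variable, df_cols$variable)])) +
  scale_fill_manual(values = pal, name = "Category")
p
```

Setting name in scale_fill_manual() keeps the legend title from defaulting to the lookup expression. Merging df_cols into df just before plotting, for example with merge(df, df_cols, by = "variable"), would be an alternative that keeps the aes() call simpler.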

Peter
  • Hi Peter, I edited my question above with a copy of a sample plot. I'm trying to avoid having to type out all the variable names within the ggplot call. I'm trying to find a solution where I can use a separate df with all of the metrics, tags, and colours in it, if that's even possible. – vb66 May 21 '20 at 01:16
  • I don't think you need to type out all the variable names within the `ggplot` call. What is important is changing the `fill = variable` to `fill = tag` argument. – Peter May 21 '20 at 07:27
  • That's true, I just saw you had done it in your example with the `scale_fill_discrete()` but you're right that it's not necessary. I'm trying to avoid having to add columns to the additional df and have everything come from the df_cols reference. – vb66 May 22 '20 at 14:27