I am learning R and I am trying to create a composite histogram that will contain the histograms of three groups, as defined by the values of the column 'cluster' in the dataframe.

The data look like this:

  TOTAL_Estimated_Collateral_value_sum cluster
1                           -0.17499342       1
2                           -0.86443362       1
3                            0.22211949       2
4                            0.01007717       1
5                           -0.77617685       2
6                           -1.43518056       1
7                           -0.19705983       1
8                           -0.39170108       1
9                           -0.94073376       1
10                           1.20525601       2

 TOTAL_Estimated_Collateral_value_sum    cluster     
 Min.   :-1.7697                      Min.   :1.000  
 1st Qu.:-0.7626                      1st Qu.:1.000  
 Median :-0.1322                      Median :1.000  
 Mean   : 0.0000                      Mean   :1.329  
 3rd Qu.: 0.8459                      3rd Qu.:2.000  
 Max.   : 1.8782                      Max.   :3.000  
> table(df_all$cluster)

    1     2     3 
24342  8565  1350

The code I am using is the following:

ggplot(df_all, aes(x=TOTAL_Estimated_Collateral_value_sum, color=cluster)) +
  geom_histogram(alpha = 0.7, position="dodge")

The image I get is the following:

[Image: histogram]

As you can see, the observations are not coloured by the value of the cluster as I would expect.

Could you please explain to me why this is the case and what I should do to fix my code and get the expected output?

ekad
ak7

1 Answer

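You need to map cluster to fill, not color: for bars, color only sets the outline, while fill sets the interior. cluster also needs to be a factor if it isn't already, because a numeric cluster is treated as continuous and does not split the data into groups. So try: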

ggplot(df_all, aes(x=TOTAL_Estimated_Collateral_value_sum, fill=cluster)) +
  geom_histogram(alpha = 0.7, position="dodge")

Or, if cluster isn't a factor:

ggplot(df_all, aes(x=TOTAL_Estimated_Collateral_value_sum, fill=as.factor(cluster))) +
  geom_histogram(alpha = 0.7, position="dodge")
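
If several plots are needed, it may be simpler to convert cluster once in the data frame instead of inside every aes() call. A minimal sketch, assuming simulated data: df_demo is a hypothetical stand-in for df_all, and bins = 30 is set explicitly only to avoid the default-bins message.

```r
library(ggplot2)

# Hypothetical stand-in for df_all (simulated, not the asker's data)
set.seed(1)
df_demo <- data.frame(
  TOTAL_Estimated_Collateral_value_sum = rnorm(300),
  cluster = sample(1:3, 300, replace = TRUE)
)

# Convert once so every later plot sees a discrete grouping variable
df_demo$cluster <- factor(df_demo$cluster)

# fill now maps to a factor, so each cluster gets its own bar colour
p <- ggplot(df_demo, aes(x = TOTAL_Estimated_Collateral_value_sum, fill = cluster)) +
  geom_histogram(alpha = 0.7, position = "dodge", bins = 30)
print(p)
```

After the conversion, the first form of the call (fill=cluster) works without as.factor().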
ulfelder
  • Thank you. It worked. By the way, my dataframe has many columns --80+ -- and I must do these histograms for each column. Is there a way to automate this process, perhaps using a loop? – ak7 Mar 19 '17 at 12:53
  • Yes, you could use `lapply` or `apply` to iterate over the columns, depending on what kind of output you want. But that's a whole other kettle of fish. – ulfelder Mar 19 '17 at 12:57
  • Update to comment: if you need to iterate the plotting process and want to stay in the `tidyverse`, you could use `purrr::walk` to do that. https://purrr.tidyverse.org/reference/map.html – ulfelder Jul 19 '20 at 10:58
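
The iteration mentioned in the comments could look roughly like the sketch below. It uses base `lapply` rather than `purrr::walk`, and everything named here is an assumption for illustration: df_demo, value_a, value_b, and the helper plot_by_cluster are hypothetical, and cluster is assumed to already be a factor.

```r
library(ggplot2)

# Hypothetical data with two numeric columns and a factor cluster
set.seed(1)
df_demo <- data.frame(
  value_a = rnorm(300),
  value_b = runif(300),
  cluster = factor(sample(1:3, 300, replace = TRUE))
)

# Hypothetical helper: one dodged histogram for the column named in `col`
plot_by_cluster <- function(data, col) {
  ggplot(data, aes(x = .data[[col]], fill = cluster)) +
    geom_histogram(alpha = 0.7, position = "dodge", bins = 30) +
    labs(x = col)
}

# Every numeric column except the grouping variable
num_cols <- setdiff(names(df_demo)[sapply(df_demo, is.numeric)], "cluster")

# Build one plot per column and keep them in a named list
plots <- lapply(num_cols, function(col) plot_by_cluster(df_demo, col))
names(plots) <- num_cols

# Display a single plot, e.g. print(plots[["value_a"]])
```

Keeping the plots in a named list makes it possible to print or save them individually later; with `purrr::walk` the same helper would instead be called for its side effect (printing or saving) on each column name.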