
I tried making the following histogram in R (randomly select 10% of all rows and color them red):

a = rnorm(100000,60000,1000)
b = a

c = data.frame(a,b)
color <- c("black", "red")     
color_1 <- sample(color, nrow(c), replace=TRUE, prob=c(0.9, 0.1))
c$color_1 = as.factor(color_1)


hist(c$a, col = c$color_1, 100000, main = "title")

legend("topleft", legend=c("group a", "group b"),
       col=c("red", "black"), lty = 1, cex=0.8)
title(
      sub = "some title")

Problem: for some reason, the colors are not showing up in the resulting histogram.
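One likely cause, offered as an assumption rather than a confirmed diagnosis: hist() colors bars, not observations, so `col` is matched to the bars (one value per rectangle). A vector with one color per row is therefore not tied to the rows that fall into each bar. With `breaks = 100000` there are also many thousands of bars, each narrower than a pixel, so the default black border drawn around every bar can hide the fill. The sketch below checks the bar count and redraws with `border = NA`; the seed is arbitrary and `color_1` mirrors the question's variable.

```r
set.seed(2021)  # arbitrary seed, only for reproducibility
a <- rnorm(100000, 60000, 1000)
color_1 <- sample(c("black", "red"), length(a), replace = TRUE, prob = c(0.9, 0.1))

# Compute the bins without drawing them
h <- hist(a, breaks = 100000, plot = FALSE)
n_bars <- length(h$counts)   # one rectangle per bar
n_rows <- length(a)          # one value per observation
cat("rows:", n_rows, " bars:", n_bars, "\n")

# `col` is applied bar by bar, so pass exactly one color per bar and
# drop the black border, which can cover the fill of very thin bars
plot(h, col = color_1[seq_len(n_bars)], border = NA, main = "title")
```

Even when the red shows up this way, it marks random bars, not the randomly selected rows.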

I tried to see if other commands might get the colors to show up:

hist(c$a, col = color_1, 100000, main = "title")

I also tried not converting the color variable to a "factor":

a = rnorm(100000,60000,1000)
b = a

c = data.frame(a,b)
color <- c("black", "red")     
color_1 <- sample(color, nrow(c), replace=TRUE, prob=c(0.9, 0.1))
c$color_1 = color_1


hist(c$a, col = c$color_1, 100000, main = "title")

legend("topleft", legend=c("group a", "group b"),
       col=c("red", "black"), lty = 1, cex=0.8)
title(
      sub = "some title")
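Whether `color_1` is a factor or a character vector changes little here. A minimal sketch of the difference: used as `col`, a factor is handled through its integer codes, which index the current palette(), while a character vector is read as color names. The vector `f` is illustrative only.

```r
f <- factor(c("black", "red", "red", "black"))

# Levels are sorted alphabetically: "black" = 1, "red" = 2
codes <- as.integer(f)
print(codes)

# As `col`, a factor behaves like these integer codes,
# i.e. it picks colors from the current palette
print(palette()[codes])

# A character vector is used directly as color names
print(as.character(f))
```

In both cases hist() still receives one color per bar, so dropping the factor does not address the underlying problem.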

I also tried to follow the advice from this question (Partially color histogram in R):

h = hist(c$a, col = c$color_1, breaks = 100000, main = "title")

legend("topleft", legend=c("group a", "group b"),
       col=c("red", "black"), lty = 1, cex=0.8)
title(
      sub = "some title")



cuts <- cut(h$breaks, c(-Inf,Inf))
plot(h, col=cuts)

But this did not work either. I suspect I am not using the "cut" function correctly.
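On the cut() question: `breaks = c(-Inf, Inf)` defines a single interval, so every value lands in the same level and every bar gets the same color. cut() only separates bars when it is given interior cut points. A minimal sketch, assuming the goal is to color bars by x-range; the threshold 61000 is a hypothetical choice, and cutting the bar midpoints `h$mids` (exactly one per bar) is a substitution for cutting `h$breaks` (one more value than there are bars).

```r
set.seed(2021)  # arbitrary seed
a <- rnorm(100000, 60000, 1000)
h <- hist(a, breaks = "FD", plot = FALSE)

# c(-Inf, Inf) is one interval, hence one level
print(nlevels(cut(h$breaks, c(-Inf, Inf))))

# An interior cut point splits the bars into two groups
# (61000 is a hypothetical threshold)
grp <- cut(h$mids, breaks = c(-Inf, 61000, Inf), labels = c("below", "above"))
bar_cols <- c("black", "red")[as.integer(grp)]

plot(h, col = bar_cols, main = "title")
```

This colors bars by position on the x-axis, which is what the linked question does; it does not select bars at random.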

Can someone please show me how to fix this?

Thanks

stats_noob
  • 1) If you are selecting histogram bars at random, you don't have a cut point; 2) What is `b` meant for? It's not used in the rest of the code; 3) Do you really want 100K bars for 100K data points? – Rui Barradas Oct 10 '21 at 05:42
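To put the third point in numbers, a sketch comparing how many bars hist() produces with the question's `breaks = 100000` and with the Freedman-Diaconis rule (`breaks = "FD"`), which the answer below uses. The seed is arbitrary.

```r
set.seed(2021)  # arbitrary seed
a <- rnorm(100000, 60000, 1000)

# A single number for breaks is only a suggestion passed to pretty()
n_100k <- length(hist(a, breaks = 100000, plot = FALSE)$counts)
# "FD" bases the number of bins on the IQR and the sample size
n_fd <- length(hist(a, breaks = "FD", plot = FALSE)$counts)

cat("bars with breaks = 100000:", n_100k, "\n")
cat("bars with breaks = \"FD\":", n_fd, "\n")
```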

1 Answer


Here is what I understand of the question:

  1. Plot a vector's histogram;
  2. randomly select 10% of the bars;
  3. give those bars a different color.
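The steps above color bars. If the question's wording is taken literally, i.e. 10% of the rows (observations) should stand out, one alternative (not what this answer does) is to draw the histogram of all rows and overlay the histogram of the selected rows on the same breaks, so the red part of each bar is the selected rows' share. The `selected` vector, the seed and the legend labels are hypothetical.

```r
set.seed(2021)  # arbitrary seed
a <- rnorm(100000, 60000, 1000)

# Hypothetical selection: each row is chosen with probability 0.1
selected <- sample(c(FALSE, TRUE), length(a), replace = TRUE, prob = c(0.9, 0.1))

# Shared breaks so both histograms use identical bins
brks <- hist(a, breaks = "FD", plot = FALSE)$breaks
h_all <- hist(a, breaks = brks, plot = FALSE)
h_sel <- hist(a[selected], breaks = brks, plot = FALSE)

# All rows in black, then the selected rows drawn over them in red
plot(h_all, col = "black", border = "white", main = "title")
plot(h_sel, col = "red", border = "white", add = TRUE)
legend("topleft", legend = c("selected rows", "other rows"),
       fill = c("red", "black"), cex = 0.8)
```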

First, recreate the example data set. There appears to be no need for the second vector b. The RNG seed is set to make the results reproducible.

set.seed(2021)                    # make the random draws reproducible
a <- rnorm(100000, 60000, 1000)
c <- data.frame(a)                # a data frame named `c`; c() still works as a function
color <- c("black", "red")
n_colors <- length(color)

Now compute the histogram data without plotting it. Then draw one color index (from 1 to n_colors) per bar, i.e. as many indices as there are counts, and plot the histogram.

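h <- hist(c$a, breaks = "FD", plot = FALSE)   # compute the bins without plotting
# one color index per bar: 1 = "black", 2 = "red" (about 10% red)
i_col <- sample(n_colors, length(h$counts), replace = TRUE, prob = c(0.9, 0.1))
plot(h, main = "title", col = color[i_col])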

legend("topleft", legend=c("group a", "group b"),
       col=c("red", "black"), lty = 1, cex=0.8)
title(sub = "some title")
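Because each bar's color is drawn independently here, the share of red bars only approximates the target proportion. If exactly 10% of the bars (rounded) should be red, a small variation is to sample that many bar indices without replacement. This is a sketch of an alternative, not part of the answer's code; the seed and the use of `fill =` in legend() (filled boxes that match histogram bars) are choices made for this sketch.

```r
set.seed(2021)  # arbitrary seed
a <- rnorm(100000, 60000, 1000)
h <- hist(a, breaks = "FD", plot = FALSE)
n_bars <- length(h$counts)

# Pick exactly round(10%) of the bars, without replacement
red_bars <- sample(n_bars, round(0.1 * n_bars))
bar_cols <- rep("black", n_bars)
bar_cols[red_bars] <- "red"

plot(h, col = bar_cols, main = "title")
# fill = draws filled boxes in the legend
legend("topleft", legend = c("group a", "group b"),
       fill = c("red", "black"), cex = 0.8)
title(sub = "some title")
```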


Rui Barradas
  • Thank you for your answer! I was able to figure it out another way - would you like to see my answer? (After a few minutes, I can accept your answer.) – stats_noob Oct 10 '21 at 05:52
  • (can you take a look at this question if you have time later: https://stackoverflow.com/questions/69512664/creating-gif-animations-in-r thank you!) – stats_noob Oct 10 '21 at 05:56