I am trying to produce a histogram with two main properties. I have managed to generate each one individually, but I don't know how to combine the two methods to get what I want.

I am aiming for a stacked histogram with Age (in 5-year bins) along the x-axis. Each bin should be split into two stacked segments (Admitted=1, Admitted=0), shown as the proportion/percentage of that bin admitted and not admitted.

dataset:

> dput(head(example_data))
structure(list(GAPS = c(26L, 16L, 21L, 15L, 17L, 13L), Age = c(62L, 
62L, 62L, 58L, 70L, 70L), Admitted = c(0L, 1L, 1L, 0L, 0L, 0L
)), row.names = c(NA, 6L), class = "data.frame")

I am working in R, but the dataset originates from a pandas DataFrame, and if an easier solution exists in Python (matplotlib etc.), I am happy to use that instead.

So far, I can generate a bar chart with one bar per age and the proportions on the y-axis, as desired, using the code below:

myTable<-table(dataset$Admitted, dataset$Age)
myTable
myTable2<-prop.table(myTable, 2)

barplot(myTable2)
barplot(myTable2,legend=rownames(myTable2), xlab="Age", col=c(7, 4))

I can also easily create a simple binned histogram using

hist(dataset$Age)

My question is how to adapt the barplot method to use histogram-style bins, since a separate bar for every individual age makes the result too busy.

  • You need to provide data, please add output of `dput(dataset)` as an edit to your question. – jay.sf Nov 23 '19 at 14:35
  • It is an extremely large dataset that can't be displayed by that function, would it be acceptable to include head(dataset) instead perhaps? – purpleeggshells Nov 23 '19 at 14:39
  • Yes but `dput` is **important**, you could use `dput(head(dataset))`. You could make a smaller example though, no need to provide all the data. Read: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – jay.sf Nov 23 '19 at 14:48
  • Ok, I've modified the dataset so the important variables are included in dput(head(dataset)), I hope that's helpful – purpleeggshells Nov 23 '19 at 14:51

1 Answer

You could create bins, e.g. one every ten years, using the cut() function.

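# Bin ages into (0,10], (10,20], ..., (90,100]; each bin is labelled by its upper bound
dat$age.bins <- as.numeric(as.character(cut(dat$age, breaks=(0:10)*10, labels=(1:10)*10)))
# Column-wise proportions: within each age bin, the shares of adm=0 and adm=1 sum to 1
myTable3 <- with(dat, prop.table(table(adm, age.bins), 2))
barplot(myTable3, legend=rownames(myTable3), xlab="Age", col=c(7, 4))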
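
Since the question asks for 5-year bins, the same cut() approach can probably be adapted by changing the breaks. The following is only a minimal sketch: it regenerates the simulated `dat` from the Data section so that it runs on its own, and the names `bin_width`, `upper`, `brks`, `age.bins5` and `tab5` are made up for illustration. It also assumes left-closed intervals and drops empty bins, which may not be what you want for real data.

```r
# Simulated data, same shape as `dat` in the Data section (age, adm)
set.seed(42)
dat <- data.frame(age = rnbinom(1e4, 5, mu = 30),
                  adm = rbinom(1e4, 1, 2/6))

# Assumption: 5-year bins starting at 0; adjust bin_width as needed
bin_width <- 5
# First multiple of bin_width strictly above the maximum age
upper <- bin_width * (floor(max(dat$age) / bin_width) + 1)
brks <- seq(0, upper, by = bin_width)

# right = FALSE gives left-closed bins [0,5), [5,10), ..., so age 0 is kept.
# droplevels() removes bins with no observations, which would otherwise
# produce NaN columns in prop.table() (0/0).
dat$age.bins5 <- droplevels(cut(dat$age, breaks = brks, right = FALSE))

# Column-wise proportions: each age bin sums to 1
tab5 <- with(dat, prop.table(table(adm, age.bins5), 2))

# Stacked bars as percentages; las = 2 rotates the bin labels
barplot(100 * tab5, legend.text = rownames(tab5),
        xlab = "Age (5-year bins)", ylab = "% of bin",
        col = c(7, 4), las = 2)
```

With the asker's data, the equivalent would presumably be to substitute `dataset$Age` and `dataset$Admitted` for `dat$age` and `dat$adm`.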

Data

set.seed(42)
dat <- data.frame(age=rnbinom(1e4, 5, mu=30),
                  adm=rbinom(1e4, 1, 2/6))
jay.sf