ggplot geom_col: automatically defining y from data?

Question

I have a dataframe that looks like this:

A           B           C
0,868385346 0,628248588 0,468926554
0,074626866 0,277966102 0,271186441
0,024423338 0,057627119 0,203389831
0,017639077 0,007909605 0,011299435
0,004070556 0,007909605 0,011299435
0,004070556 0,005649718 0,011299435
0,002713704 0,003389831 0,005649718
0,001356852 0,001129944 0,005649718
0,001356852 0,001129944 0,005649718
0,001356852 0,001129944 0,005649718
            0,001129944 
            0,001129944 
            0,001129944 
            0,001129944 
            0,001129944 
            0,001129944 
            0,001129944

These are the proportions that make up the compositions A, B and C (each column adds up to 1, with the highest figure at the top).

I want to make a bar chart with A, B, C on the x axis (or faceted, but I'll look at that later) and, for each, bars that show the actual data (so for A, ten bars showing the proportions, the first being 0.86, the second 0.07, etc.), in order to compare the distributions within the compositions.

ggplot documentation states: "If you want the heights of the bars to represent values in the data, use geom_col instead" which is exactly what I want.

I run the following with na.omit since the different columns have a different number of rows

ggplot(na.omit(data)) + geom_col()

I get the following error: Error in pmin(y, 0) : object 'y' not found

I see that I have to assign a y (in the geom_bar documentation since it seems geom_col has no documentation of its own). I tried various things to get a scale from 0 to 1, such as y=c(0:1), but nothing seems to be working.

I still don't understand how to assign a y axis while the function geom_col says it makes the height of the bar from the data...

I am obviously missing something basic here, so any pointers would be appreciated.

score 2 · Accepted Answer · answered Jun 17 '18 at 05:43

You have to convert your data from wide format to long format, like dat2 in my example, so that there is a single column of values to map to y. You will also need to create an ID column to map to x. After that, you can use geom_col to plot the bar chart. In the code below, I also show how to set the limits on the y axis and how to use facet_grid. The dat data frame is defined in the DATA section at the end.

library(tidyverse)

dat2 <- dat %>% 
  mutate(ID = 1:n()) %>%
  gather(Column, Value, -ID)

ggplot(dat2, aes(x = ID, y = Value)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 1)) +
  facet_grid(Column ~ .) +
  theme_bw()

DATA

dat <- read.table(text = "A           B           C
0.868385346 0.628248588 0.468926554
0.074626866 0.277966102 0.271186441
0.024423338 0.057627119 0.203389831
0.017639077 0.007909605 0.011299435
0.004070556 0.007909605 0.011299435
0.004070556 0.005649718 0.011299435
0.002713704 0.003389831 0.005649718
0.001356852 0.001129944 0.005649718
0.001356852 0.001129944 0.005649718
0.001356852 0.001129944 0.005649718
NA          0.001129944 NA 
NA          0.001129944 NA
NA          0.001129944 NA
NA          0.001129944 NA
NA          0.001129944 NA
NA          0.001129944 NA
NA          0.001129944 NA"
                  , header = TRUE)
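
For anyone reading this with a recent tidyr: gather() has since been superseded by pivot_longer(), so the same reshape can be sketched as below. This is only a sketch under a few assumptions: it uses a four-row subset of the question's data (rows 1 to 3 and row 11) to stay short, it keeps the comma decimals from the question and parses them with dec = ",", and the names dat_long and p are just illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Subset of the question's data (rows 1-3 and row 11), comma decimals kept.
# dec = "," makes read.table parse "0,868385346" as the number 0.868385346.
dat <- read.table(text = "A B C
0,868385346 0,628248588 0,468926554
0,074626866 0,277966102 0,271186441
0,024423338 0,057627119 0,203389831
NA 0,001129944 NA", header = TRUE, dec = ",")

# Wide to long: one row per (ID, Column) pair, dropping the NA padding.
dat_long <- dat %>%
  mutate(ID = row_number()) %>%
  pivot_longer(cols = -ID, names_to = "Column", values_to = "Value",
               values_drop_na = TRUE)

# Same plot as above: y is mapped explicitly to the Value column.
p <- ggplot(dat_long, aes(x = ID, y = Value)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 1)) +
  facet_grid(Column ~ .) +
  theme_bw()
```

Printing p draws the chart. Dropping the NA rows during the reshape should also avoid the missing-values warning that ggplot2 gives when NA rows reach geom_col.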

score 1 · Answer 2 · answered Jun 17 '18 at 05:33

I wrangled your data into tidy format, and then used geom_col(). I had to convert the y axis to a factor variable in order for the barplot to show the actual identity of the values. You could also use geom_bar(stat = "identity").

library(tidyverse)

# double check that these values are correct, I wrote this quickly
A <- c(0.868385346
       ,0.074626866
       ,0.024423338
       ,0.017639077
       ,0.004070556
       ,0.004070556
       ,0.002713704
       ,0.001356852
       ,0.001356852
       ,0.001356852
       ,NA
       ,NA
       ,NA
       ,NA
       ,NA
       ,NA
       ,NA)


B <- c(0.628248588
       ,0.277966102
       ,0.057627119
       ,0.007909605
       ,0.007909605
       ,0.005649718
       ,0.003389831
       ,0.001129944
       ,0.001129944
       ,0.001129944
       ,0.001129944
       ,0.001129944
       ,0.001129944
       ,0.001129944
       ,0.001129944
       ,0.001129944
       ,0.001129944)


C <- c(0.468926554
       ,0.271186441
       ,0.203389831
       ,0.011299435
       ,0.011299435
       ,0.011299435
       ,0.005649718
       ,0.005649718
       ,0.005649718
       ,0.005649718
       ,NA
       ,NA
       ,NA
       ,NA
       ,NA
       ,NA
       ,NA)


# combine all three vectors into a dataframe
df_wide <- data.frame(A,B,C)

# convert to tidy format
df <- gather(df_wide, id, value) %>% na.omit()


# create our plot
ggplot(df, aes(x = as.factor(id), y = as.factor(value), fill = id)) + 
  geom_bar(position = "dodge", stat = "identity")
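
A possible variant that keeps y numeric, in case the bar heights need to be proportional to the values: as far as I can tell, with y as a factor the bars are placed by factor level rather than by magnitude. The sketch below gives each row a position within its column and uses that as the group aesthetic, so position = "dodge" draws the bars side by side instead of stacking them. The names rank, df_num and p are made up for illustration, and only four rows of the data (rows 1 to 3 and row 11) are used to keep it short.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Four rows of the question's data (rows 1-3 and row 11).
df_wide <- data.frame(
  A = c(0.868385346, 0.074626866, 0.024423338, NA),
  B = c(0.628248588, 0.277966102, 0.057627119, 0.001129944),
  C = c(0.468926554, 0.271186441, 0.203389831, NA)
)

# rank records each value's position within its column (1 = largest).
df_num <- df_wide %>%
  mutate(rank = row_number()) %>%
  pivot_longer(cols = -rank, names_to = "id", values_to = "value",
               values_drop_na = TRUE)

# Numeric y on a continuous axis; group = rank lets "dodge" separate the
# bars within A, B and C instead of stacking them into one bar per column.
p <- ggplot(df_num, aes(x = id, y = value, group = rank, fill = id)) +
  geom_col(position = "dodge")
```

This is close to the layout described in the question (A, B, C on the x axis with one bar per proportion), but treat it as a sketch rather than a drop-in replacement for the code above.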

If we leave this column as a numeric, `geom_col()` will plot either the count of observations or their sum within each group by default. Converting them to a factor variable tells `geom_col` to plot them at face value instead. I like your solution better though. — philiporlando, Jun 17 '18 at 05:51
There definitely seems to be a time component to these data, so creating the sequential ID variable for the x axis was a smart idea. — philiporlando, Jun 17 '18 at 05:57
You got my upvote, but somehow I don't think this is what the OP wants. — www, Jun 17 '18 at 06:00
I'd agree. Your figure is much more intuitive and I learned something new! — philiporlando, Jun 17 '18 at 06:07
Thanks for the solution using a factor. This can work well for me as the facet grid from www (which is what I had asked so I selected his answer). FYI, these data have no time component, but were organized by composition in decreasing importance. — Gemini1096, Jun 17 '18 at 08:02
