
I have a dataset that looks something like this:

    typestudy   dloop cytb coi  other microsat  SNP
    methods     no  no  no  no  yes no
    methods     yes no  no  no  no  yes
    methods     no  no  no  no  yes no
    methods     no  no  no  no  yes no
    wildcrime   no  no  no  yes no  no
    taxonomy    no  no  no  no  yes no
    methods     yes no  no  no  no  no
    methods     no  no  no  no  yes no
    taxonomy    no  no  no  no  yes no
    wildcrime   yes no  no  no  no  no
    methods     yes no  no  no  no  no
    taxonomy    no  no  no  no  yes yes
    taxonomy    no  no  no  no  yes no

Except it has 10 yes/no columns corresponding to further genetic elements, and there are over 200 rows.

In Excel, the graphical summary option gives me a great stacked bar plot, but I need to be able to recreate it in R to meet university standards for my report.

    > summary(dframe1$type.of.study)
             methods development                        other
                              49                            5
    population genetic structure                     taxonomy
                              91                           86
                  wildlife crime
                               6
    > barplot(as.matrix(dframe1))
    There were 11 warnings (use warnings() to see them)
    > warnings()
    Warning messages:
    1: In apply(height, 2L, cumsum) : NAs introduced by coercion
    2: In apply(height, 2L, cumsum) : NAs introduced by coercion
    3: In apply(height, 2L, cumsum) : NAs introduced by coercion
    4: In apply(height, 2L, cumsum) : NAs introduced by coercion
    5: In apply(height, 2L, cumsum) : NAs introduced by coercion
    6: In apply(height, 2L, cumsum) : NAs introduced by coercion
    7: In apply(height, 2L, cumsum) : NAs introduced by coercion
    8: In apply(height, 2L, cumsum) : NAs introduced by coercion
    9: In apply(height, 2L, cumsum) : NAs introduced by coercion
    10: In apply(height, 2L, cumsum) : NAs introduced by coercion
    11: In apply(height, 2L, cumsum) : NAs introduced by coercion

which gives me this

[Image: the resulting bar plot]

and I've also managed to produce this, but I can't find the script I used for it:

[Image: a second plot from an earlier attempt]

My aim is something similar to this:

[Image: the target stacked bar plot from Excel]

It's pretty pathetic, but it has taken me almost a week of troubleshooting, based on online resources and other questions on here, to get to this point. I can't figure out how to count the study types so that they are tallied up to give the height of the bar for each genetic marker. I know this is far too vague for Stack Overflow's standards, but I'm desperate, so I'm leaving this up in case anyone has any suggestions.


(I'm going to be as concise as I can, but as you'll see, my fluency in R is atrocious. I wouldn't ask for help, but I've spent DAYS grappling with this data, I'm petrified I'll never find a solution, and there's nobody to ask for help on my research placement.)
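The `NAs introduced by coercion` warnings are most likely because `as.matrix()` on a data frame whose columns are text returns a character matrix, and `barplot()` cannot add up strings. A minimal base-R sketch of the tallying step, assuming the yes/no columns are stored as character strings as in the sample; the data frame name `dat` is hypothetical (in the question it is `dframe1`, and the study-type column there is `type.of.study` rather than `typestudy`):

```r
# Sample rows from the question; the real data has more marker columns and rows.
# `dat` is a hypothetical name standing in for dframe1.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
typestudy dloop cytb coi other microsat SNP
methods   no  no  no  no  yes no
methods   yes no  no  no  no  yes
methods   no  no  no  no  yes no
methods   no  no  no  no  yes no
wildcrime no  no  no  yes no  no
taxonomy  no  no  no  no  yes no
methods   yes no  no  no  no  no
methods   no  no  no  no  yes no
taxonomy  no  no  no  no  yes no
wildcrime yes no  no  no  no  no
methods   yes no  no  no  no  no
taxonomy  no  no  no  no  yes yes
taxonomy  no  no  no  no  yes no
")

# Turn every "yes"/"no" into 1/0 so the values can be summed
markers <- dat[-1]
yes01 <- (markers == "yes") * 1

# Sum the 1s within each study type:
# rows = study types (sorted alphabetically), columns = markers
counts <- rowsum(yes01, group = dat$typestudy)

# One bar per marker, split into stacked segments by study type
barplot(counts, legend.text = rownames(counts),
        args.legend = list(x = "topleft"))
```

Each bar's total height is then the number of studies that used that marker, which is the tally the Excel chart shows.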

zx8754
zoonia
  • Do you need `barplot(sapply(df1[-1], function(x) table(factor(x, levels = c("yes", "no")))))`? – akrun Apr 27 '18 at 11:14
  • @akrun you beautiful soul thank you so much for your time! I don't want to ask too much and you've helped me massively already but I'm wondering how to make the bar plots proportional to each other rather than as a percentage? – zoonia Apr 27 '18 at 11:18
  • Is that what you wanted or different? – akrun Apr 27 '18 at 11:19
  • @akrun oh wow, I was so excited, but I've realised this still leaves me with some questions about how to incorporate the `typestudy` column into the stacks to make them look like the Excel bar plot – zoonia Apr 27 '18 at 11:27
  • Are you just counting the number of `yes`es? – hpesoj626 Apr 27 '18 at 11:39

3 Answers


We can get a yes/no table for each marker column by looping over the columns with `sapply()`, and then pass the resulting matrix to `barplot()`:

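# Count "yes" and "no" in every marker column (df1[-1] drops typestudy);
# factor() with fixed levels keeps a zero count for an answer that never occurs
barplot(sapply(df1[-1], function(x) table(factor(x, 
     levels = c("yes", "no")))), col = c("red", "blue"))
legend("topright", legend = c("yes", "no"), fill = c("red", "blue"))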
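To see what `barplot()` actually receives, the `sapply()` result can be printed first. This sketch only re-runs the expression above on the question's sample rows: every marker gets a "yes" count and a "no" count, and `typestudy` plays no part, which is why the bars are not split by study type.

```r
# Sample rows from the question
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
typestudy dloop cytb coi other microsat SNP
methods   no  no  no  no  yes no
methods   yes no  no  no  no  yes
methods   no  no  no  no  yes no
methods   no  no  no  no  yes no
wildcrime no  no  no  yes no  no
taxonomy  no  no  no  no  yes no
methods   yes no  no  no  no  no
methods   no  no  no  no  yes no
taxonomy  no  no  no  no  yes no
wildcrime yes no  no  no  no  no
methods   yes no  no  no  no  no
taxonomy  no  no  no  no  yes yes
taxonomy  no  no  no  no  yes no
")

# A 2 x 6 matrix: rows "yes"/"no", one column per marker
yes_no <- sapply(df1[-1], function(x) table(factor(x, levels = c("yes", "no"))))
print(yes_no)
# "yes" row for these rows: dloop 4, cytb 0, coi 0, other 1, microsat 8, SNP 2
```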
akrun

Just in case you want something that looks like the example (kind of):

df <- read.table(text = "typestudy   dloop cytb coi  other microsat  SNP
    methods     no  no  no  no  yes no
                 methods     yes no  no  no  no  yes
                 methods     no  no  no  no  yes no
                 methods     no  no  no  no  yes no
                 wildcrime   no  no  no  yes no  no
                 taxonomy    no  no  no  no  yes no
                 methods     yes no  no  no  no  no
                 methods     no  no  no  no  yes no
                 taxonomy    no  no  no  no  yes no
                 wildcrime   yes no  no  no  no  no
                 methods     yes no  no  no  no  no
                 taxonomy    no  no  no  no  yes yes
                 taxonomy    no  no  no  no  yes no", 
                 header = TRUE, stringsAsFactors = FALSE)

library(tidyr)
library(ggplot2)
library(dplyr)
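# Reshape to long format (one row per study/marker pair) and keep only "yes"
df %>% gather(key = key, value = value, -typestudy) %>% 
  filter(value == "yes") %>% 
  # geom_bar() counts rows, so each bar is the number of "yes" per marker,
  # stacked by study type
  ggplot(aes(x = key, fill = typestudy)) +
  geom_bar() + 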
  coord_flip() + 
  theme_minimal() +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank()) +
  xlab(NULL) +
  ylab(NULL)
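In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. A sketch of the same chart with the newer verb, assuming current versions of tidyr, dplyr and ggplot2; the column names `marker` and `used` are illustrative choices, not from the answer:

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Sample rows from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
typestudy dloop cytb coi other microsat SNP
methods   no  no  no  no  yes no
methods   yes no  no  no  no  yes
methods   no  no  no  no  yes no
methods   no  no  no  no  yes no
wildcrime no  no  no  yes no  no
taxonomy  no  no  no  no  yes no
methods   yes no  no  no  no  no
methods   no  no  no  no  yes no
taxonomy  no  no  no  no  yes no
wildcrime yes no  no  no  no  no
methods   yes no  no  no  no  no
taxonomy  no  no  no  no  yes yes
taxonomy  no  no  no  no  yes no
")

# One row per (study, marker) pair, keeping only the markers that were used
long_yes <- df %>%
  pivot_longer(-typestudy, names_to = "marker", values_to = "used") %>%
  filter(used == "yes")

# geom_bar() counts the remaining rows per marker, stacked by study type
ggplot(long_yes, aes(x = marker, fill = typestudy)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(x = NULL, y = NULL)
```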
Tino

I don't know if you are only after the yeses, but here is an approach that would also let you use the no's, in case you want bar plots by type of response (yes/no).

library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  gather(var, value, -typestudy) %>%
  group_by(typestudy, var, value) %>%
  count() %>%
  filter(value == "yes") %>%
  ggplot(aes(var, n, group = typestudy, fill = typestudy)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette = "Dark2", direction = -1) +
  coord_flip() +
  theme(
    axis.title.x=element_blank(),
    axis.title.y=element_blank(),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.title=element_blank())

[Image: horizontal stacked bar chart of "yes" counts per marker, filled by study type]

Data

df <- structure(list(typestudy = c("methods", "methods", "methods", 
"methods", "wildcrime", "taxonomy", "methods", "methods", "taxonomy", 
"wildcrime", "methods", "taxonomy", "taxonomy"), dloop = c("no", 
"yes", "no", "no", "no", "no", "yes", "no", "no", "yes", "yes", 
"no", "no"), cytb = c("no", "no", "no", "no", "no", "no", "no", 
"no", "no", "no", "no", "no", "no"), coi = c("no", "no", "no", 
"no", "no", "no", "no", "no", "no", "no", "no", "no", "no"), 
    other = c("no", "no", "no", "no", "yes", "no", "no", "no", 
    "no", "no", "no", "no", "no"), microsat = c("yes", "no", 
    "yes", "yes", "no", "yes", "no", "yes", "yes", "no", "no", 
    "yes", "yes"), SNP = c("no", "yes", "no", "no", "no", "no", 
    "no", "no", "no", "no", "no", "yes", "no")), .Names = c("typestudy", 
"dloop", "cytb", "coi", "other", "microsat", "SNP"), class = "data.frame", row.names = c(NA, 
-13L))
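Following the idea of keeping the no's, a sketch that counts every (study type, marker, response) combination and draws one panel per response with `facet_wrap()`. It assumes tidyr 1.0.0 or later for `pivot_longer()`; `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`. The `var`/`value` names mirror the answer above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample rows from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
typestudy dloop cytb coi other microsat SNP
methods   no  no  no  no  yes no
methods   yes no  no  no  no  yes
methods   no  no  no  no  yes no
methods   no  no  no  no  yes no
wildcrime no  no  no  yes no  no
taxonomy  no  no  no  no  yes no
methods   yes no  no  no  no  no
methods   no  no  no  no  yes no
taxonomy  no  no  no  no  yes no
wildcrime yes no  no  no  no  no
methods   yes no  no  no  no  no
taxonomy  no  no  no  no  yes yes
taxonomy  no  no  no  no  yes no
")

# Count each (study type, marker, response) combination; the "no"s are kept
counts <- df %>%
  pivot_longer(-typestudy, names_to = "var", values_to = "value") %>%
  count(typestudy, var, value)

# geom_col() uses n as the bar height; one panel for "no", one for "yes"
ggplot(counts, aes(var, n, fill = typestudy)) +
  geom_col() +
  facet_wrap(~ value) +
  coord_flip() +
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        axis.title = element_blank())
```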
hpesoj626
  • you absolute dream of a human being thank you so much!! this is wonderful – zoonia Apr 27 '18 at 12:32
  • It's a shame that my vote on your answer doesn't show up because I'm new to the site; you well and truly saved my bacon, I'm so grateful – zoonia Apr 27 '18 at 15:37