Plot binary frequencies across multiple levels in R

Question

I have a dataset comprising participants' binary answers to certain questions. These questions can have 3 different base conditions and one 0/1 variation; that is, questions can be designated as 1.0, 1.1, 2.0, ..., 3.1. My dataset holds each answer in a different row, including a column for the base condition and one for the modifier (plus an interaction column giving the combinations; see the example below).

What I would like to plot are the proportions of answers for each question, preferably grouped by base condition: i.e., three groups of two bars each, showing the frequency of a certain outcome.

Here's a reproducible example dataset to work on, where Base_con, Var, and Dec represent the base condition, the variation, and the decision (answer), respectively:

# load example dataset with relevant columns
require(RCurl)
my_csv = getURL(
  "https://docs.google.com/spreadsheets/d/1x9PUZwPGmye6QDk7_4M_HslrmbgEC3DZ-v-VMvFkE6U/pub?output=csv")
df1 = read.csv(textConnection(my_csv))
# set columns as factors because they are numerically coded
df1$Base_con = as.factor(df1$Base_con)
df1$Var = as.factor(df1$Var)
df1$Dec = as.factor(df1$Dec)
df1$Int = interaction(df1$Base_con, df1$Var)

I have seen that the cdplot function does something very close to what I am looking for, but it only accepts one continuous independent variable. I hope someone can help with this. It does not look like something very difficult to do, but I haven't found an answer here or elsewhere. I know I could build the graph in other software, but I would prefer to learn to do it in R; moreover, it would help me check the data alongside the statistical analysis.

@Hack-R, sorry. I am editing the post with the info. `Var` is the code for the variation and `Dec` is the decision or answer. — Lea_Casiraghi, Aug 23 '16 at 18:09
Now we're talking :). I'd like a single plot showing all levels of `Base_con` (or, as you propose, three plots, one for each), with the values for `Var` in side-by-side columns. — Lea_Casiraghi, Aug 23 '16 at 18:51
Great, thanks for the clarification. I updated my answer just now, please let me know how close it is to what you need. — Hack-R, Aug 23 '16 at 19:03

Hack-R · Accepted Answer · 2016-08-23T21:24:19.653


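# One bar plot per base condition, showing Dec counts for Var == 1 and Var == 0
for (i in unique(df1$Base_con)) {
  barplot(c(table(df1$Dec[df1$Base_con == i & df1$Var == 1]),
            table(df1$Dec[df1$Base_con == i & df1$Var == 0])),
          # bar order follows the two table() calls above (Dec is a 0/1 factor)
          names.arg = c("Var1:Dec0", "Var1:Dec1", "Var0:Dec0", "Var0:Dec1"),
          main = paste("Your title goes here", i),
          xlab = "Your label goes here")
}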
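If all three `Base_con` levels should appear in one figure, a minimal sketch along the same lines could build a three-way table and draw one panel per level. The data frame `df_demo` below is a simulated, hypothetical stand-in for `df1` (same column names, made-up values), used only so the example runs without the spreadsheet.

```r
# Hypothetical stand-in for df1 (simulated values, not the asker's data)
set.seed(1)
n <- 300
df_demo <- data.frame(
  Base_con = factor(sample(1:3, n, replace = TRUE)),
  Var      = factor(sample(0:1, n, replace = TRUE))
)
# Simulated decisions: mostly 1 for Base_con 1, mostly 0 for Base_con 3
p_one <- c(0.8, 0.5, 0.1)[as.integer(df_demo$Base_con)]
df_demo$Dec <- factor(rbinom(n, 1, p_one), levels = 0:1)

# Three-way table of counts: Dec x Var x Base_con
counts <- table(Dec = df_demo$Dec, Var = df_demo$Var,
                Base_con = df_demo$Base_con)

old_par <- par(mfrow = c(1, 3))    # one panel per Base_con level
for (i in dimnames(counts)$Base_con) {
  barplot(counts[, , i],
          beside = TRUE,            # Dec bars side by side within each Var group
          legend.text = TRUE,       # legend entries taken from the Dec row names
          args.legend = list(title = "Dec"),
          main = paste("Base_con", i),
          xlab = "Var", ylab = "Count")
}
par(old_par)                       # restore the previous layout
```

Replacing `df_demo` with `df1` (after the factor conversions in the question) should give the same layout for the real data.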

Example plot for Base_con == 2:

edited Aug 23 '16 at 21:24

answered Aug 23 '16 at 18:13


@Lea_Casiraghi I changed it to a `barplot` and changed the plotted variable to `Var`, relabeling the title and X-axis. Use the factor version of `Var` with this version of the answer. – Hack-R Aug 23 '16 at 19:06
Thanks. This looks pretty much like what I am looking for, but I think there is something wrong. The frequencies shown seem to represent the occurrence of `Var` values within `Base_con`, and not the frequency of `Dec` 1 (or 0, it's the same) according to `Var` levels within each `Base_con` level. Am I right? Just as a guide, `Dec` should show mainly 0s for `Base_con == 3` (for any `Var` value), and mainly 1s for `Base_con == 1`. – Lea_Casiraghi Aug 23 '16 at 19:12
@Lea_Casiraghi Could you run this code and tell me what results you get to help me figure out the problem? `table(df1$Var[df1$Base_con==1]);table(df1$Var[df1$Base_con==3])` – Hack-R Aug 23 '16 at 19:20
Of course: `> table(df1$Var[df1$Base_con==1]);table(df1$Var[df1$Base_con==3])` `0 1` `54 53` `0 1` `48 52` – Lea_Casiraghi Aug 23 '16 at 19:25
@Lea_Casiraghi Thanks, so it looks like the 0's and 1's should be about equal based on that output right? Or was there another transformation to the data? – Hack-R Aug 23 '16 at 20:18
0s and 1s are pretty balanced for `Var`, so as to have a balanced experiment. `Dec` values (i.e. participants' answers, what I want to plot) should be very different depending on `Base_con` (lots of 1s for `Base_con == 1` and few for `Base_con == 3`). The `Var` factor is expected to have an effect on `Dec` within each `Base_con` level. See: `table(df1$Dec[df1$Base_con==1]);table(df1$Dec[df1$Base_con==3])` `0 1` `20 87` `0 1` `94 6`. – Lea_Casiraghi Aug 23 '16 at 20:28
@Lea_Casiraghi Oh, ok got it! I actually started out with `Dec` but your 2nd comment under the question confused me and I thought you wanted `Var` instead of `Dec`. Just a second and I will update it. – Hack-R Aug 23 '16 at 20:35
Almost there! Now we need to include the effect of `Var` in the plots. Right now we have two bars (one for `Dec == 0` and one for `Dec == 1`, although we actually only need one) for each `Base_con` level. The next step would be to get four bars for each `Base_con` level, two for `Var == 0` and two for `Var == 1` (although, again, we only need one per `Var` level, as one `Dec` value frequency is enough to determine the remaining proportion). Thanks a lot for helping! – Lea_Casiraghi Aug 23 '16 at 20:46
@Lea_Casiraghi Oh, ok I think I understand now. I just updated my answer again, please have a look. – Hack-R Aug 23 '16 at 21:24
Yes, it's pretty much what I was looking for! The variations I'd suggest are only accessory, and I believe I can try to include them myself: 1) plot % or proportions instead of total cases (each pair of bars for every `Var` level should add up to 100%), and 2) remove either the `Dec == 1` or the `Dec == 0` bar, as with % you only need to look at one. Thanks a lot! This has been a great experience. – Lea_Casiraghi Aug 24 '16 at 10:49
@Lea_Casiraghi Great, glad it helped. To get the % instead of frequency we just wrap the `table()` statement in `prop.table`. You may have seen me do this in several iterations (it's visible in the edit history if you click that link). I will try to come back to this and update this with the other things you requested later today when I get a few free minutes. – Hack-R Aug 24 '16 at 13:11
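As a rough illustration of the `prop.table` suggestion, the sketch below turns counts into percentages of `Dec == 1` and draws three groups of two bars, one bar per `Var` level. It is not the code from the answer's edit history: `df_demo` is a simulated, hypothetical stand-in for `df1`, and the `margin = c(2, 3)` choice assumes the proportions should add up to 100% within each `Var` by `Base_con` cell.

```r
# Hypothetical stand-in for df1 (simulated values, not the asker's data)
set.seed(42)
n <- 300
df_demo <- data.frame(
  Base_con = factor(sample(1:3, n, replace = TRUE)),
  Var      = factor(sample(0:1, n, replace = TRUE))
)
p_one <- c(0.8, 0.5, 0.1)[as.integer(df_demo$Base_con)]
df_demo$Dec <- factor(rbinom(n, 1, p_one), levels = 0:1)

# Counts of Dec within every Var x Base_con cell
counts <- table(Dec = df_demo$Dec, Var = df_demo$Var,
                Base_con = df_demo$Base_con)

# Proportions of Dec (dimension 1) within each Var x Base_con cell
props <- prop.table(counts, margin = c(2, 3))

# Keep only Dec == 1: a Var x Base_con matrix (the Dec == 0 share is 1 minus this)
p1 <- props["1", , ]

barplot(100 * p1,
        beside = TRUE,             # two Var bars side by side per Base_con group
        ylim = c(0, 100),
        names.arg = paste("Base_con", colnames(p1)),
        legend.text = paste("Var =", rownames(p1)),
        ylab = "% of answers with Dec == 1")
```

Plotting only the `Dec == 1` row reflects the point made in the comments that one value is enough, since the two percentages in each cell are complementary.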
