Before I get to my question, I should point out that I am new to R, so this may be trivial for an experienced user. I want to use ggplot2 to take full advantage of its functionality, but I have run into a problem I have not been able to solve. Suppose I have a data frame like this:

df = as.data.frame(cbind(rnorm(100,35:65),rnorm(100,25:35),rnorm(100,15:20),rnorm(100,5:10),rnorm(100,0:5)))
header = c("A","B","C","D","E")
names(df) = make.names(header)

Plotting the data with the columns along the x-axis and each column's values on the y-axis is straightforward in base R, e.g.:

par(mfrow=c(2,1))  # two rows, one column: stripchart on top, boxplot below
stripchart(df, vertical = TRUE, method = 'jitter')
boxplot(df)

The picture shows the stripchart & boxplot of the data

However, the same cannot readily be done in ggplot2, because explicit x and y inputs are required. All the examples I have found either plot one column against another or first reshape the data into a long, single-column format. I want the columns of my df on the x-axis and their row values on the y-axis. How can this be accomplished?

Rnewbie
  • You're trying to make boxplots? – C-x C-c May 07 '18 at 18:36
  • I am aiming to make overlays of jitter + violin + box plots, but my data are generally in this shape, and I would like to be able to do this for any applicable plot type. – Rnewbie May 07 '18 at 18:39

1 Answer

You'll need to reshape your data in order to get those graphs. I think this is what you're looking for:

> library(ggplot2)
> library(reshape2) 
> df = as.data.frame(cbind(rnorm(100,35:65),rnorm(100,25:35),rnorm(100,15:20),rnorm(100,5:10),rnorm(100,0:5)))
> header = c("A","B","C","D","E")
> names(df) = make.names(header)
> df = melt(df)
 No id variables; using all as measure variables
> head(df)
  variable    value
1        A 36.75505
2        A 35.68714
3        A 36.44952
4        A 38.77236
5        A 39.79136
6        A 39.39672

> ggplot(df, aes(x = variable, y = value))
> ggplot(df, aes(x = variable, y = value)) + geom_boxplot()
> ggplot(df, aes(x = variable, y = value)) + geom_point(shape = 0, size = 20)

Here is the box plot: [image]

Here is the strip chart: [image]

You can change the settings in aes() options. See here for more info.
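
Since the comments mention overlaying jitter, violin, and box plots, here is a minimal sketch of how the layers might be stacked once the data are in long form. It is not the code from the answer above: it assumes a small made-up data frame (`wide`) and swaps reshape2's melt() for base R's stack(), which names its output columns `values` and `ind` rather than `value` and `variable`. The widths, sizes, and colours are arbitrary choices.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical wide data: one column per group
wide <- data.frame(A = rnorm(100, 50), B = rnorm(100, 30), C = rnorm(100, 17))

# Base R reshape from wide to long; result has columns `values` and `ind`
long <- stack(wide)

# Layers are drawn in the order they are added: violin, then box, then points
p <- ggplot(long, aes(x = ind, y = values)) +
  geom_violin(fill = "grey90") +
  geom_boxplot(width = 0.15, outlier.shape = NA) +  # hide outliers; the jitter layer already shows every point
  geom_jitter(width = 0.1, height = 0, size = 0.8, alpha = 0.5) +
  labs(x = "variable", y = "value")

print(p)
```

Setting height = 0 in geom_jitter() keeps the y values exact and only spreads the points horizontally, which is usually what a strip chart is meant to show.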

svenkatesh
  • Read this post for some insight on when to use long vs. wide data (yours is wide): https://stackoverflow.com/questions/34590173/long-and-wide-data-when-to-use-what?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – C-x C-c May 07 '18 at 18:38
  • I can see how that would work for small data sets (like the random one I included). However, this solution moves all the data into one column and adds a new column holding the previous column names. When datasets are larger, say 100+ variables and hundreds of entries, it seems it would be a lot harder to work with an approach like this... Notwithstanding, your solution works, so thank you for that. – Rnewbie May 07 '18 at 18:47
  • I've edited the post to include code to make the strip chart. If this solution works for you, please accept my answer. – svenkatesh May 07 '18 at 19:07
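
On the concern about wider data sets: the reshape itself is one call however many columns there are, and the original wide data frame can be kept alongside the long copy. The sketch below assumes a made-up data frame with 120 variables and 300 rows (the names `V001` ... `V120` are hypothetical) and again uses base R's stack() rather than melt(). It then plots only a chosen subset of variables.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

n_vars <- 120
n_obs  <- 300

# Hypothetical wide data: 300 rows, 120 numeric columns
wide <- as.data.frame(matrix(rnorm(n_obs * n_vars), nrow = n_obs))
names(wide) <- sprintf("V%03d", seq_len(n_vars))

# One call reshapes every column; `wide` itself is left unchanged
long <- stack(wide)
dim(long)  # 36000 rows, 2 columns (`values`, `ind`)

# Plot only the variables of interest by filtering the long data
keep <- c("V001", "V002", "V003")
subset_long <- droplevels(long[long$ind %in% keep, ])

p <- ggplot(subset_long, aes(x = ind, y = values)) +
  geom_boxplot()
print(p)
```

The long copy is only needed for plotting, so any analysis that is easier in the wide layout can carry on using `wide`.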