I'm trying to create a set of charts for reports in R using ggplot2 but I'm still struggling to get used to ggplot2.

The chart data is coming from SQL tables that are laid out like this:

|  Name | Version | Category | Value |  Date  | Number |   Build   | Error |
|:-----:|:-------:|:--------:|:-----:|:------:|:------:|:---------:|:-----:|
| File1 | 0.01    | Time     | 123   | 1-1-12 | 1      | Iteration | None  |
| File1 | 0.01    | Size     | 456   | 1-1-12 | 1      | Iteration | None  |
| File1 | 0.01    | Final    | 789   | 1-1-12 | 1      | Iteration | None  |
| File2 | 0.01    | Time     | 312   | 1-1-12 | 1      | Iteration | None  |
| File2 | 0.01    | Size     | 645   | 1-1-12 | 1      | Iteration | None  |
| File2 | 0.01    | Final    | 978   | 1-1-12 | 1      | Iteration | None  |
| File3 | 0.01    | Time     | 741   | 1-1-12 | 1      | Iteration | None  |
| File3 | 0.01    | Size     | 852   | 1-1-12 | 1      | Iteration | None  |
| File3 | 0.01    | Final    | 963   | 1-1-12 | 1      | Iteration | None  |
| File1 | 0.02    | Time     | 369   | 1-1-12 | 2      | Iteration | None  |
| File1 | 0.02    | Size     | 258   | 1-1-12 | 2      | Iteration | None  |
| File1 | 0.02    | Final    | 147   | 1-1-12 | 2      | Iteration | None  |
| File2 | 0.02    | Time     | 753   | 1-1-12 | 2      | Iteration | None  |
| File2 | 0.02    | Size     | 498   | 1-1-12 | 2      | Iteration | None  |

This table holds data from running various files and recording the output. The Name column is the file name, and the Version column is the software version. The Category column is the important one here: it identifies the type of data that was recorded. The Value column is the recorded value, the Date column is the date the data was recorded, and the Number column is the number of "runs" where this data was recorded. The Build column indicates whether the software is a development build or a release, and the Error column records errors during runs.

The Category column is what I want to look at. I want to generate a series of charts in ggplot2 that plot the value (y) against the version number (x), with one chart for each category for each file. So the y-axis will show the value of the data for a specific category and the x-axis will show the build version. Here's the code I've got so far.

library(RODBC)    # sqlQuery()
library(stringr)  # str_trim()
library(ggplot2)

# SQLConn_remote() is not defined in this post; it opens the database connection
dbhandle <- SQLConn_remote(DBName = "DBName", ServerName = "SERVER")
Tabledf <- sqlQuery(dbhandle, 'select * from ExampleTable', stringsAsFactors = FALSE)

# Strip stray whitespace from the text columns and make Value numeric
Tabledf$Name <- str_trim(Tabledf$Name)
Tabledf$Version <- str_trim(Tabledf$Version)
Tabledf$Category <- str_trim(Tabledf$Category)
Tabledf$Value <- as.numeric(Tabledf$Value)

scenarios <- unique(Tabledf$Name)

# One plot per file name, faceted by Category
for (i in 1:3) {
  p <- ggplot(subset(Tabledf, Name == scenarios[i]), aes(x = Number, y = Value, group = Name))
  p <- p + geom_line(aes(color = Date)) +
    geom_point(size = 1.2, shape = 19, colour = 'red') +
    facet_grid(Category ~ ., scales = "free", space = "free")
  print(p)
}

I would like to know how I could add a label showing which file is being plotted. Also, this is only generating one set of charts for one file, and I'm not sure what's wrong. I'm guessing that my loop isn't set up correctly. I'm still fairly new to R coding and this was pulled from an example. I've got upwards of 30 files that I want to display charts for. Ideally, the output would be a series of pages: the first page would have one group of 5-8 charts, faceted by Category, all for the first file. The next page would have the same thing for the second file, and so on.

EDIT: Even if someone could simply point me in the direction of resources to figure this out on my own I would be grateful. I've been looking and I'm only finding resources that are lengthy on qplot.

EDIT2: Okay so I'm feeling pretty dumb and can't figure out how to create a dataset to upload using dput but here's the code I've used to create a usable dataframe.

rw1 <- c("File1", "File1", "File1", "File1", "File1", "File1", "File1", "File1", "File1", "File2", "File2", "File2", "File2", "File2", "File2", "File2", "File2", "File2")
rw2 <- c("0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02")
rw3 <- c("Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final")
rw4 <- c(123, 456, 789, 312, 645, 978, 741, 852, 963, 369, 258, 147, 753, 498, 951, 753, 915, 438)
rw5 <- c("01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12")
rw6 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2)
rw7 <- c("Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration")
rw8 <- c("None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None")


df = data.frame(rw1, rw2, rw3, rw4, rw5, rw6, rw7, rw8)
colnames(df) <- c("Name", "Version", "Category", "Value", "Date", "Number", "Build", "Error")

That will generate a chart that is nearly identical to the chart shown above. Ideally I'd like to create a set of charts from this data set. The format would be a group or set of charts grouped together by Filename. So for example:

[Image: Example File1 Charts]

Might be the group of charts only for File1. Each of the chart facets would be a single "Category" from the dataframe/table. There would be another group like this for File2 and so on and so forth.

I'd like to clean this up though. There are category labels on the right side of each facet; I've just elected to remove those in the picture. However, I'd like to add a label that shows "Name" as well as "Category" and "Value". The code I've provided above also only generates these charts for the first file. Well, I believe it's the first file. Without any Name label, there's no way of knowing.
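On the `dput` difficulty mentioned in EDIT2, here is a minimal sketch of how `dput` might be used to share a data frame. The three-row `small_df` is a hypothetical stand-in for the real table, not the actual data. `dput` prints an R expression that rebuilds the object, and that printed text is what would be pasted into a question.

```r
# Hypothetical three-row stand-in for the real table
small_df <- data.frame(
  Name = c("File1", "File1", "File2"),
  Category = c("Time", "Size", "Time"),
  Value = c(123, 456, 312),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object;
# this printed text is what gets pasted into a post
dput(small_df)

# Round trip: capture the printed text, then parse and evaluate it
dput_text <- paste(capture.output(dput(small_df)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))

# Compare the rebuilt copy with the original
all.equal(small_df, rebuilt)
```

For a large table, something like `dput(head(Tabledf, 20))` would keep the pasted output short.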

JohnN
  • You can easily make this question reproducible by removing everything that's SQL-related and adding `dput(Tabledf)`. – tonytonov Jun 26 '15 at 13:07
  • What is the expected output? , and also please provide a reproducible example – Matias Andina Jun 26 '15 at 15:05
  • I just added more information in an edit to the last paragraph explaining the expected output more thoroughly. I'm not sure how I could create a reproducible example. It's fundamentally dependent on how the data is set up in my SQL table. Any suggestions for how I could are welcome. – JohnN Jun 26 '15 at 15:07
  • Thanks! I will work on creating a reproducible example. – JohnN Jun 26 '15 at 16:22
  • I've created a dataframe that contains all the data in the table above. I didn't quite follow how I could use dput to make that into something you guys could work with though. – JohnN Jun 26 '15 at 20:38
  • I'm getting three plots when I run your code with the example dataset, one plot for each `Name`. To get a title for each file, you can add `+ ggtitle(scenarios[i])` to the end of your graphic code. Rather than printing each plot, consider saving them to a list. Examining the list might give you a better idea of which graphic you are getting out of your loop (if you're only getting one). – aosmith Jun 29 '15 at 15:34
  • Noob question: How do I save to a list? – JohnN Jun 29 '15 at 16:11
  • Initialize a named list object outside of the loop, e.g., `list1 = list()`. Assign each plot to the *ith* list element inside your loop, `list1[[i]] = p`, replacing the `print(p)` line. After running the loop you can extract each element/plot separately to examine it and see if they're all the same. For example, the first plot will be `list1[[1]]`. – aosmith Jun 29 '15 at 20:28
  • when I do that I get: `Error in list[[i]] : object of type 'builtin' is not subsettable.` – JohnN Jun 30 '15 at 15:29
  • @aosmith Should I have `list[[i]]` after I have `for (i in 1:3)`? – JohnN Jun 30 '15 at 17:06
  • [This question](http://stackoverflow.com/questions/28596979/creating-a-list-of-ggplots-in-a-loop-using-the-index-of-the-loop-as-an-argument) has an example of using a list in this kind of scenario. – aosmith Jun 30 '15 at 17:31
  • @aosmith I tried formatting exactly as that question did and I still only get what appears to be 3 facets for each Category containing data points for all Files. In this case 1 and 2. – JohnN Jun 30 '15 at 19:50
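The suggestions in the comments above (a title per file via `ggtitle`, and storing plots in a list instead of printing them) could be sketched roughly as follows. This is only an illustration under assumptions: the data frame is a small made-up stand-in built with `expand.grid`, and `plot_list` and `file_data` are hypothetical names. The loop runs over `seq_along(scenarios)` rather than a hard-coded `1:3`, so it covers however many file names are present.

```r
library(ggplot2)

# Made-up stand-in data: two files, three categories, two runs each
df <- expand.grid(
  Name = c("File1", "File2"),
  Category = c("Time", "Size", "Final"),
  Number = 1:2,
  stringsAsFactors = FALSE
)
df$Value <- seq_len(nrow(df)) * 10

scenarios <- unique(df$Name)

# Use a variable name other than `list`: indexing the builtin function
# itself (list[[i]]) is what raises
# "object of type 'builtin' is not subsettable"
plot_list <- vector("list", length(scenarios))
names(plot_list) <- scenarios

for (i in seq_along(scenarios)) {
  # Keep only the rows for the current file
  file_data <- df[df$Name == scenarios[i], ]
  plot_list[[i]] <- ggplot(file_data, aes(x = Number, y = Value, group = Name)) +
    geom_line() +
    geom_point(size = 1.2, shape = 19, colour = "red") +
    facet_grid(Category ~ ., scales = "free") +
    ggtitle(scenarios[i])  # label the whole plot with the file name
}
```

Printing `plot_list[["File1"]]` would then show only the first file's facets. Calling `print()` on each element between `pdf("charts.pdf")` and `dev.off()` is one way to get one page per file; the file name `charts.pdf` is just an example.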
