I am working on a script that reads Windows Perfmon data and plots graphs from it, because I have found the PAL tool far too slow.

This is my first pass and is quite basic at the moment.

I am struggling with the scaling of the y axis. I am currently getting horrible graphs like this: [image: R Data Plot - Perfmon]

How can I scale the y axis so that it has reasonable breaks when the data lie between 0 and 1 (e.g. 0.0000123, 0.12, 0.98, 0.00000024)?

I was hoping for something dynamic like the following, but it fails with the error shown underneath it:

scale_y_continuous(breaks = c(min(d[,i]), 0, max(d[,i])))

Error in Summary.factor(c(1L, 105L, 181L, 125L, 699L, 55L, 270L, 226L,  : 
min not meaningful for factors

Any help appreciated.

require(lattice)
require(ggplot2)
require(reshape2)

# Read in Perfmon -- MUST BE CSV
d <- read.table("~/R/RPerfmon.csv",header=TRUE,sep=",",dec=".",check.names=FALSE)
# Rename First Column to Time as this is standard in all Perfmon CSVs
colnames(d)[1]="Time"
# Convert Time Column into proper format
d$Time<-as.POSIXct(d$Time, format='%m/%d/%Y %H:%M:%S')
# Strip out The computer name from all Column Headers (Perfmon Counters)
# The regex matches a-zA-Z, underscores and dashes, may need to be expanded
colnames(d) <- sub("^\\\\\\\\[a-zA-Z_-]*\\\\", "", colnames(d))
colnames(d) <- sub("\\\\", "|", colnames(d))
colnames(d)
warnings()

pdf(paste("PerfmonPlot_",Sys.Date(),".pdf",sep=""))
for (i in 2:ncol(d)) {

  p <- qplot(d[,"Time"],y=d[,i], data=d, xlab="Time",ylab="", main=colnames(d[i]))
  p <- p + geom_hline()
  p <- p + scale_y_continuous(breaks = c(min(d[,i]), 0, max(d[,i])))
  print(p)

}
dev.off()
Jaap
  • The data you are trying to plot as the y-variable are factors. You have to convert them with `as.numeric` first. – Jaap May 30 '14 at 19:34

2 Answers

In order to get reasonable breaks between 0 and 1, you can for example use:

scale_y_continuous(breaks=c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0))

A rewritten plot-part of your code:

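# Assumes column i has already been converted to numeric
ggplot(d, aes(x=Time, y=d[,i])) +
  geom_point() +               # draw the data points, as qplot did by default
  geom_hline(yintercept=0) +   # reference line at zero
  scale_y_continuous(breaks=c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0)) +
  labs(title=colnames(d[i]), x="Time",y="")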

And a more dynamic way of setting the breaks:

scale_y_continuous(breaks=seq(from=round(min(d[,i]),1), to=round(max(d[,i]),1), by=0.1))

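However, the error message shows that the y-variables are factors, so you have to convert them to numeric first. Note that calling as.numeric directly on a factor returns the internal level codes rather than the values, so use as.numeric(as.character(d[,i])), or read the file with stringsAsFactors=FALSE so that the columns are not factors in the first place.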
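
A small illustration of the factor conversion issue may help here. This is only a sketch: the vector `f` and its values are made up for the example and merely stand in for a Perfmon column that was read in as a factor.

```r
# Hypothetical sample values, stored as a factor the way read.table() may have done
f <- factor(c("0.12", "0.98", "0.0000123"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(f)

# Converting through character first recovers the actual numbers
values <- as.numeric(as.character(f))

print(codes)
print(values)

# min() and max() now work, because values is a plain numeric vector
print(range(values))
```

With the converted vector, calls such as min(values) and max(values) no longer raise the "not meaningful for factors" error.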

Jaap
  • I was hoping for something dynamic like perhaps: scale_y_continuous(breaks = c(min(d[,i]), 0, max(d[,i]))) However I get Error in Summary.factor(c(1L, 105L, 181L, 125L, 699L, 55L, 270L, 226L, : min not meaningful for factors – ASPNETMVC-Newbie May 30 '14 at 19:22

Here is the code I ended up with after a bit of experimenting, in case anyone wants to do the same.

The key to making it dynamic was the following (note the as.numeric, which, together with stringsAsFactors=FALSE in read.table, avoids the factor error):

 ynumeric <- as.numeric(d[,i])
 ymin <- min(ynumeric,na.rm = TRUE)
 ymax <- max(ynumeric,na.rm = TRUE)

 # generate 10 evenly spaced breaks between the minimum and maximum
 ybreaks <- seq(ymin, ymax, length.out = 10)

 # then pass the breaks to scale_y_continuous
 p <- p + scale_y_continuous(breaks=ybreaks)

I hope to expand this in the future to something approaching PAL's complexity, while using R for efficiency.

require(lattice)
require(ggplot2)
require(reshape2)

# Read in Perfmon -- MUST BE CSV
d <- read.table("~/R/RPerfmon.csv",header=TRUE,sep=",",dec=".",check.names=FALSE,stringsAsFactors=FALSE)
# Rename First Column to Time as this is standard in all Perfmon CSVs
colnames(d)[1]="Time"
# Convert Time Column into proper format
d$Time<-as.POSIXct(d$Time, format='%m/%d/%Y %H:%M:%S')
# Strip out The computer name from all Column Headers (Perfmon Counters)
# The regex matches a-zA-Z, underscores and dashes, may need to be expanded
colnames(d) <- sub("^\\\\\\\\[a-zA-Z_-]*\\\\", "", colnames(d))
colnames(d) <- sub("\\\\", "|", colnames(d))
colnames(d)
warnings()

pdf(paste("PerfmonPlotData_",Sys.Date(),".pdf",sep=""))

for (i in 2:ncol(d)) {

  ynumeric <- as.numeric(d[,i])
  ymin <- min(ynumeric,na.rm = TRUE)
  ymax <- max(ynumeric,na.rm = TRUE)

  #generate sequence of 10
  ybreaks <- seq(ymin, ymax, length.out = 10)
  print(ybreaks)

  print(paste(ymin,ymax))

  p <- qplot(d[,"Time"],y=ynumeric, data=d, xlab="Time",ylab="", main=colnames(d[i]))
  p <- p + geom_smooth(size=3,se=TRUE) + theme_bw()
  p <- p + scale_y_continuous(breaks=c(ybreaks))
  print(p)

}
dev.off()
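
As a possible variation on the seq() breaks in the script above, base R's pretty() can pick round-number breaks that cover the data range, which may give tidier axis labels than ten evenly spaced values. This is only a sketch: the helper name `nice_breaks` is an assumption made for the example and is not part of ggplot2 or of the original script.

```r
# Hypothetical helper: round-number breaks that cover the range of y
nice_breaks <- function(y, n = 10) {
  # pretty() returns about n + 1 equally spaced "round" values covering the range
  pretty(range(y, na.rm = TRUE), n = n)
}

# Sample values taken from the question, plus an NA to show it is ignored
y <- c(0.0000123, 0.12, 0.98, 0.00000024, NA)

ybreaks <- nice_breaks(y)
print(ybreaks)
```

Inside the loop, this would stand in for the seq() call, for example p + scale_y_continuous(breaks = nice_breaks(ynumeric)).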