
I have a huge collection of data with date, client and its NFS usage. I'm using the lattice R package for plotting, as advised on Super User. Stack Overflow also helped me convert the date string to an actual date object.

Now, my code is this:

require(lattice)

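# stringsAsFactors=TRUE is needed on R >= 4.0, where read.table no longer
# creates factors by default; the levels() calls below rely on factors.
logfile <- read.table(file="nfsclients-2d.log", stringsAsFactors=TRUE)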
names(logfile) <- c("Date","Client","Operations")

allcol <- c("blue","chocolate4","cornflowerblue","chartreuse4","brown3","darkorange3","darkorchid3","red","deeppink4","lightsalmon3","yellow","mistyrose4","seagreen3","green","violet","palegreen4","grey","slateblue3","tomato2","darkgoldenrod2","chartreuse","orange","black","yellowgreen","slategray3","navy","firebrick1","darkslategray3","bisque3","goldenrod4","antiquewhite2","coral","blue4","cyan4","darkred","orangered","purple4","royalblue4","salmon")
col <- allcol[seq_along(levels(logfile$Client))]

svg(filename="/tmp/nfsclients-2d.svg",width=14,height=7)

times <- as.POSIXct(strptime(levels(logfile$Date), format="%m/%d-%H:%M"))
logfile$Date <- times[logfile$Date]
xyplot(Operations~Date,group=Client,data=logfile,jitter.x=T,jitter.y=T,
 aspect = 0.5, type = "l",
 par.settings=list(superpose.line=list(col=col,lwd=3)),
 xlab="Time", ylab="Operations", main="NFS Operations (last 2 days, only clients with >40 operations/sec)",
 key=list( text=list(levels(logfile$Client)), space='right',
           lines=list(col=col),columns=1,lwd=3,cex=0.75))

dev.off()

And the output file is this (stripped out the legend):

[Image: the resulting plot of NFS operations over time, legend stripped]

The X axis labels are not very useful here: "tue" "tue" "wed" "wed". It seems that only the most significant part of each value is used as the label. More labels (maybe 6 or 7) would also be more useful.

When plotting 2 weeks it's even worse. Only 2 values are displayed on the X axis: "2012" "2013". Not even repeated, only 2 values!

The data I'm plotting.

Jorge Suárez de Lis
  • This isn't an answer to your question, but an alternative approach still within R might be ggplot2 (you need to install it from CRAN first), which gives similar functionality to lattice but is based on a consistent "grammar of graphics". For your purposes you might prefer the defaults. The basic command for you would be something like ggplot(logfile, aes(x=Date, y=Operations, color=Client)) + geom_line(). You can add scale_color_manual() commands to use your preferred colours. – Peter Ellis Jan 09 '13 at 18:43

2 Answers

This is not a direct answer to your lattice question, but I would really use the scales package with ggplot2 here. You can format the axis however you like.

library(ggplot2)

# One line per client, coloured by client
p <- ggplot(data = logfile, aes(x = Date,
                                y = Operations,
                                group = Client,
                                color = Client)) + geom_line()

You only gave us 2 days of data, so I put a break every 10 hours to illustrate the idea:

library(scales) # to access breaks/formatting functions
p + scale_x_datetime(breaks = date_breaks("10 hour"),
                     minor_breaks = date_breaks("2 hour"))
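If the label format matters as well as the tick spacing, here is a minimal sketch of one way to set both. It uses the `date_breaks`, `date_minor_breaks` and `date_labels` arguments that `scale_x_datetime()` accepts directly. The `demo` data frame, the client names and the `"%a %H:%M"` format are made up for illustration and stand in for the real `logfile`.

```r
library(ggplot2)

# Hypothetical stand-in data: two clients sampled every 30 minutes for two days.
set.seed(42)
stamps <- seq(as.POSIXct("2013-01-07 00:00:00", tz = "UTC"),
              by = "30 min", length.out = 97)
demo <- data.frame(
  Date       = rep(stamps, times = 2),
  Client     = rep(c("clientA", "clientB"), each = length(stamps)),
  Operations = round(runif(2 * length(stamps), min = 40, max = 200))
)

# Major ticks every 10 hours, minor ticks every 2 hours,
# labels formatted as abbreviated weekday plus hour:minute.
p <- ggplot(demo, aes(x = Date, y = Operations, colour = Client)) +
  geom_line() +
  scale_x_datetime(date_breaks = "10 hours",
                   date_minor_breaks = "2 hours",
                   date_labels = "%a %H:%M")
print(p)
```

For a two-week plot, something like `date_breaks = "2 days"` with `date_labels = "%d %b"` would probably be a more readable choice.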

[Image: the resulting plot with x-axis breaks every 10 hours]

agstudy
  • Thanks! I like this approach, very simple! It gives a reasonable plot without too much pain. I'll explore this when I find the time. For now, I'm sticking with the already-working lattice approach. – Jorge Suárez de Lis Jan 09 '13 at 21:36

You will need to construct a proper interval for this axis. If this is really the prior two days then perhaps something like:

  interval <- as.POSIXct( Sys.Date() - c(1,3) )

Then you need to construct a scales argument for the x-axis:

 xyplot(Operations~Date,group=Client,data=logfile,jitter.x=T,jitter.y=T,
         aspect = 0.5, type = "l",
         scales=list(x=list(at= .......  , 
                     labels=format( ......, "%H:%M") ),
          #rest of code
         )

What you put in for the ..... value will be something along the lines of:

   seq( interval[2], interval[1], by="4 hour")

This is what is returned from a format.POSIXt call:

> format( seq( interval[2], interval[1], by="4 hour") , "%H:%M")
[1] "16:00" "20:00" "00:00" "04:00" "08:00" "12:00" "16:00" "20:00" "00:00" "04:00" "08:00" "12:00"
[13] "16:00"
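Putting the pieces together, a minimal end-to-end sketch might look like the following. The same sequence of tick positions is passed as `at`, and its formatted version as `labels`. The `demo` data frame and client names are invented stand-ins for the real log, and the ticks are derived from the data range rather than from `Sys.Date()`, which is an assumption made here so the example is reproducible.

```r
library(lattice)

# Hypothetical stand-in data: two clients sampled every 30 minutes for two days.
set.seed(42)
stamps <- seq(as.POSIXct("2013-01-07 00:00:00", tz = "UTC"),
              by = "30 min", length.out = 97)
demo <- data.frame(
  Date       = rep(stamps, times = 2),
  Client     = factor(rep(c("clientA", "clientB"), each = length(stamps))),
  Operations = round(runif(2 * length(stamps), min = 40, max = 200))
)

# One tick every 4 hours across the observed time range.
ticks <- seq(min(demo$Date), max(demo$Date), by = "4 hours")

# 'at' receives the POSIXct tick positions; 'labels' receives the same
# positions formatted as strings, so both always stay in sync.
p <- xyplot(Operations ~ Date, groups = Client, data = demo,
            type = "l", aspect = 0.5,
            scales = list(x = list(at = ticks,
                                   labels = format(ticks, "%a %H:%M"),
                                   rot = 45)),
            xlab = "Time", ylab = "Operations",
            auto.key = list(space = "right", lines = TRUE, points = FALSE))
print(p)
```

Computing `ticks` once and reusing it for both arguments avoids a mismatch between the number of tick positions and the number of labels.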
IRTFM
  • Thanks, but that doesn't seem to work. The `at` parameter should receive a list of dates, not a list of strings. Passing a list of strings actually throws an error. And converting these strings to dates again solves the problem only partially: I can now control the number of _ticks_, but not the date display format. – Jorge Suárez de Lis Jan 09 '13 at 20:17
  • Right. I said I was thinking you would pass `seq(interval[2], interval[1], by="4 hour")` as `at` and then give the same thing within a format call as `labels`. I showed you how to use `format`. – IRTFM Jan 09 '13 at 20:20
  • Oh, right. I didn't understand it at first. Took a while :) but everything works now as expected. Maybe I should add the final code to your answer to make it clearer for others? Thank you :) – Jorge Suárez de Lis Jan 09 '13 at 21:32
  • I doubt you have enough rep to edit my answer but if you post it in a comment, I can edit my own. (Or you can edit your own question.) – IRTFM Jan 09 '13 at 21:35