
Hi R users and programmers, I have a data set consisting of the 4563 amino acids of a protein. Amino acids in this protein were oxidized using three different treatments and two different oxidants. I would like to plot the positions of those oxidations in a chart, grouped by treatment. Line size will represent the oxidant concentration, and line type (dashed or solid) will represent the type of oxidant. I would like to break the axis every 1000 amino acids. I have created a similar template with Excel and GIMP (which is rather time-consuming and possibly inappropriate!). The 0.33 in the template is the line height. http://dl.dropbox.com/u/58221687/Chakraborty_Figure1.png. Here is the dataset: http://dl.dropbox.com/u/58221687/AA-Position-template.xls

Thanks in advance. Sourav

  • It might be useful if you could provide a dummy dataset, including the oxidant concentration values you'd like to use to calculate line size (line thickness/width?), the oxidant type, and the treatment. It's a little unclear what you mean by plotting the position of oxidations based on treatment (what are the corresponding varying 'positions' in the figure you attached? Are you referring to upward vs downward lines?) – jbaums Feb 08 '12 at 23:27
  • It's also unclear what you mean by "sizes". Line height or line width would both be called "line sizes". – IRTFM Feb 09 '12 at 00:07
  • @Jbaums: You are right; I thought about a dummy dataset only after I posted the question. The concentrations of oxidants A and B used in the experiment were 10 microM, 100 microM and 1000 microM. "Line size" was not the right term; it should be replaced by line height. I would not change the width, as it may interfere with the data representation (for example, AA503 might be oxidized by 100 microM and AA504 by 1000 microM). Downward lines (indirect oxidation) are no longer a major concern for me, as oxidants A and B only follow direct oxidation. – S.Chakraborty Feb 09 '12 at 20:32
  • @Dwin: Sorry about the non-specific jargon. I hope it is clearer now. – S.Chakraborty Feb 09 '12 at 20:34

1 Answer


I'll do this in base graphics, though I'm sure others could do the same or better in lattice or ggplot2. The main thing you need to do to make this kind of plot easily is to reshape your data into a format that is amenable to plotting. I would have used your data directly if 1) it were in long format and 2) the variables on which you base color, line type, width, etc. were available as extra columns. With data in that form, you could reduce it to only the amino acids for which line segments need to be drawn.

I've simulated a dataset similar to yours, and you should be able to modify this code to fit your case. First the dataset:

    set.seed(1)
    # make data.frame just with info for the lines you'll actually draw
    # your data was mostly zeros, no need for those lines
    position <- sort(sample(1:4563,45,replace = FALSE))
    # but the x position needs to be wrapped into a single row:
    # the remainders modulo 600 are the actual x positions on the plot
    xpos <- position %% 600
    # line direction appeared in your example but not in your text
    posorneg <- sample(c(-1,1),45,replace = TRUE,prob=c(.05,.95))
    # oxidant concentration for line width- just rescale the oxidant concentration
    # values you have to fall between say .5 and 3, or whatever is nice and visible
    oxconc   <- (.5+runif(45))^2
    # oxidant type determines line type- you mention 2
    # just assign these types to lines types (integers in R)
    oxitype  <- sample(c(1,2),45,replace = TRUE) 
    # let's say there's another dimension you want to map color to
    # in your example png, but not in your description.
    color <- sample(c("green","black","blue"),45,replace=TRUE)

    # and finally, which level does each segment need to belong to?
    # you have 8 line levels in your example png. This works, might take
    # some staring though:
    level <- numeric(length(position))
    for (i in 0:7){
      # positions i*600 .. i*600+599 go on row 8-i (first residues on top)
      level[position %in% ((i*600):(i*600+599))] <- 8-i
    }

    # now stick into data.frame:
    AminoData <-data.frame(position = position, xpos = xpos, posorneg = posorneg, 
         oxconc = oxconc, oxitype = oxitype, level = level, color = color)
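
The `xpos` and `level` steps above can also be written with plain integer arithmetic, which makes it easier to change the row width (for example to the 1000 residues per row asked for in the question). This is only a sketch: `row_layout`, `row_width` and `n_rows` are hypothetical names introduced here, and the defaults simply match the simulated data.

```r
# Hypothetical helper: split residue positions into stacked rows.
# row_width = 600 and n_rows = 8 match the simulated data above.
row_layout <- function(position, row_width = 600, n_rows = 8) {
  row_index <- position %/% row_width      # 0 for the first block of residues
  data.frame(
    xpos  = position %% row_width,         # horizontal offset within the row
    level = n_rows - row_index             # first block is drawn on the top row
  )
}

row_layout(c(0, 599, 600, 4563))
```

For positions 1 to 4563 this gives the same `xpos` and `level` values as the modulo and the `for` loop, without having to enumerate each range.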

OK, so imagine you can reduce your data to something this simple. Your main tool in plotting (in base) will be segments(). It is vectorized, so there's no need for looping or fanciness:

    # now we draw the base plot:
    par(mar=c(3,3,3,3))
    plot(NULL, type = "n", axes = FALSE, xlab = "", ylab = "", 
         ylim =  c(0,9), xlim = c(-10,609))
    # horizontal segments:
    segments(0, 1:8, 599, 1:8, col = gray(.5))
    # some ticks: (also not pretty)
    segments(rep(c((0:5)*100,599),8), rep(1:8,each=7)-.05, rep(c((0:5)*100,599),8), 
       rep(1:8,each=7)+.05, col=gray(.5))
    # label endpoints:
    text(rep(10,8)+.2,1:8-.2,(7:0)*600,pos=2,cex=.8)
    text(rep(589,8)+.2,1:8-.2,(7:0)*600+599,pos=4,cex=.8)
    # now the amino line segments, remember segments() is vectorized
    segments(AminoData$xpos, AminoData$level, AminoData$xpos, 
       AminoData$level + .5 * AminoData$posorneg, lty = AminoData$oxitype, 
       lwd = AminoData$oxconc, col = as.character(AminoData$color))
    title("mostly you just need to reshape and prepare\nyour data to do this easily in base")
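
The `oxconc` values passed to `lwd` above are simulated. With real concentrations such as the 10, 100 and 1000 microM mentioned in the comments, one possible way to "rescale to something nice and visible" is a linear map on a log10 scale. This is a sketch under that assumption; `rescale_log`, `lo` and `hi` are hypothetical names, and the log scale is a choice made here, not something the data dictate.

```r
# Hypothetical helper: map concentrations onto [lo, hi] on a log10 scale,
# so that 10, 100 and 1000 end up evenly spaced.
rescale_log <- function(conc, lo = 0.5, hi = 3) {
  z   <- log10(conc)
  rng <- range(z)
  # all concentrations equal: fall back to the middle of the target range
  if (diff(rng) == 0) return(rep((lo + hi) / 2, length(conc)))
  lo + (z - rng[1]) / diff(rng) * (hi - lo)
}

rescale_log(c(10, 100, 1000))   # 0.5 1.75 3.0
```

The result can be used for `lwd`, or, with a range such as `lo = 0.1, hi = 0.5`, in place of the fixed `.5` segment height if height rather than width should encode concentration.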

[Figure: PNG output from the plotting code above]

This might be too artisanal for the tastes of some, but it's the way I go about special plotting.
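
As for the reshaping mentioned at the start, here is a minimal sketch of going from wide to long format and dropping the rows that need no line. The `wide` table is hypothetical (one row per residue, one 0/1 column per treatment); the actual spreadsheet layout may differ, so the column names would need adapting.

```r
# Hypothetical wide table: one row per residue, one 0/1 column per treatment
wide <- data.frame(
  position = 1:6,
  treat1   = c(0, 1, 0, 0, 1, 0),
  treat2   = c(0, 0, 0, 1, 1, 0)
)

treat_cols <- setdiff(names(wide), "position")

# stack() puts all treatment columns into one 'values' column plus an
# 'ind' factor naming the source column; positions are repeated to match
long <- data.frame(
  position = rep(wide$position, times = length(treat_cols)),
  stack(wide[treat_cols])
)
names(long) <- c("position", "oxidized", "treatment")

# keep only the residues that actually need a line segment
long <- long[long$oxidized != 0, ]
long
```

Each remaining row then corresponds to one call-worth of arguments for `segments()`, and extra columns (oxidant type, concentration) can be mapped to `lty` and `lwd` as above.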

tim riffe
  • I really appreciate your prompt reply. This is exactly what I was looking for. Thanks a bunch!! Thanks for reorganizing the data, code and detailed explanation. I will modify it as needed and get back to you if I run into some major issues (which I don't think will happen). – S.Chakraborty Feb 09 '12 at 20:35