
I have some data (named result.df) that looks like this:

    orgaName                  abundance          pVal         score        
     A                        3          9.998622e-01     1.795338e-04
     B                        2          9.999790e-01     1.823428e-05
     C                        1          2.225074e-308    3.076527e+02
     D                        1          3.510957e-01     4.545745e-01

and so on...

This is what I am currently plotting:

    library(ggplot2)

    # case.count.matrix and i are defined elsewhere in my script (not shown)
    p1 <- ggplot(result.df, aes(log2(abundance), (1-pVal), label=orgaName)) +
       ylab("1 - P-Value")+
       xlab("log2(abundance)")+
       geom_point(aes(size=score))+
       ggtitle(colnames(case.count.matrix)[i])+
       geom_text(data=subset(result.df, pVal < 0.05),hjust=.65, vjust=-1.2,size=2.5)+
       geom_hline(aes(yintercept=.95), colour="blue", linetype="dashed")+
       theme_classic()

Everything works and the plot looks fine. However, I would like the point size introduced through

    geom_point(aes(size=score))+

to be scaled against fixed values. The legend should use a base-10 logarithmic scale while the scores themselves stay the same, so that low scores nearly disappear and large scores produce point sizes that are comparable between different result.df data frames.
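A minimal sketch of one way to get a fixed, logarithmic size mapping, assuming a ggplot2 version where `scale_size()` accepts `trans = "log10"` (ggplot2 3.5.0 and later rename this argument to `transform` and deprecate `trans`). The idea is to give the scale explicit `limits`, so that a score maps to the same point size no matter which result.df is plotted. The helper names `fixed_score_size()` and `plot_scores()`, the limits `c(1e-5, 1e6)`, the size range and the two data frames are illustrative assumptions, not values from the thread. Scores must be positive for the log transform, and scores outside the limits would be dropped with a warning.

```r
library(ggplot2)

# Hypothetical helper: the same size scale is reused for every plot.
# Because the limits are fixed (not learned from the data), a given score
# always maps to the same point size.
fixed_score_size <- function() {
  scale_size(
    name   = "Score",
    trans  = "log10",               # sizes follow log10(score); legend shows raw scores
    limits = c(1e-5, 1e6),          # assumed global score range, shared by all plots
    range  = c(0.5, 12),            # smallest and largest point size
    breaks = 10^c(0, 2, 4, 6),
    labels = c("1", "100", "10000", "1000000")
  )
}

# Two made-up data frames with very different score ranges.
df1 <- data.frame(abundance = c(1, 2, 4), pVal = c(0.5, 0.2, 0.01),
                  score = c(1, 100, 300))
df2 <- data.frame(abundance = c(1, 2, 4), pVal = c(0.5, 0.2, 0.01),
                  score = c(100, 3000, 1e6))

# Hypothetical plotting helper using the question's aesthetics.
plot_scores <- function(df) {
  ggplot(df, aes(log2(abundance), 1 - pVal)) +
    geom_point(aes(size = score)) +
    fixed_score_size() +
    theme_classic()
}

# layer_data() returns the computed aesthetics, including the mapped sizes.
s1 <- layer_data(plot_scores(df1))$size
s2 <- layer_data(plot_scores(df2))$size
```

Here the score 100 appears in both data frames and should get the same size in both plots, whereas the default scale would rescale each plot to its own minimum and maximum.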

EDIT

After reading the comments from @Roman and @vrajs5, I was able to produce a new plot using the following code:

    ggplot(result.df, aes(log2(abundance), (1-pVal), label=orgaName)) +
       ylab("1 - P-Value")+
       xlab("log2(abundance)")+
       geom_point(aes(size=score))+
       ggtitle(colnames(case.count.matrix)[i])+
       #geom_text(data=subset(result.df, pVal < 0.05 & log2(abundance) > xInt),hjust=.65, vjust=-1.2,size=2.5)+
       geom_text(data=subset(result.df, pVal < 0.05),hjust=.65, vjust=-1.2,size=2.5)+
       geom_hline(aes(yintercept=.95), colour="blue", linetype="dashed")+
       #geom_vline(aes(xintercept=xInt), colour="blue", linetype="dashed")+
       #geom_text(data=subset(result.df, pVal > 0.05 & log2(abundance) > xInt),alpha=.5,hjust=.65, vjust=-1.2,size=2)+
       #geom_text(data=subset(result.df, pVal < 0.05 & log2(abundance) < xInt),alpha=.5,hjust=.65, vjust=-1.2,size=2)+
       theme_classic() +
       scale_size(range=c(2,12), expand=c(2,0),
                  breaks=c(0,1,10,100,1000,1000000),
                  labels=c(">=0",">=1",">=10",">=100",">=1000",">=1000000"),
                  guide="legend")

As you can see, the breaks are introduced and labeled as intended. However, the point sizes in the legend do not reflect the point sizes in the plot. Any idea how to fix this?
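One possible contributor, offered as an assumption rather than a diagnosis from the thread: ggplot2 drops legend breaks that fall outside the scale's limits, and by default those limits are the range of the plotted scores. The sketch below inspects the scale objects directly through the ggproto methods `train()` and `get_breaks()`, which are internal rather than a stable public API and may behave differently between ggplot2 versions. With the example scores (roughly 1.8e-05 to 307), the breaks 0, 1000 and 1000000 should fall outside the default limits, while explicit `limits` that cover every break keep all six.

```r
library(ggplot2)

# Breaks taken from the question's scale_size() call.
asked_breaks <- c(0, 1, 10, 100, 1000, 1e6)

# Default behaviour: the limits are learned from the data during training.
sc_default <- scale_size(range = c(2, 12), breaks = asked_breaks)
sc_default$train(c(1.823428e-05, 3.076527e+02))  # smallest and largest example score
default_breaks <- sc_default$get_breaks()        # out-of-range breaks are flagged (NA)

# Explicit limits that cover every break; these could be shared by all plots.
sc_fixed <- scale_size(range = c(2, 12), breaks = asked_breaks, limits = c(0, 1e6))
fixed_breaks <- sc_fixed$get_breaks()

print(default_breaks)
print(fixed_breaks)
```

Fixing the limits also means the size mapping no longer depends on the current data, so the legend keys and the points use the same reference across plots.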

Phil S.

1 Answer


As @Roman mentioned, if you use scale_size you can specify the minimum and maximum point size.

The following example shows how to control the size of the points:

    result.df = read.table(text = 'orgaName                  abundance          pVal         score
    A                        3          9.998622e-01     1.795338e-04
    B                        2          9.999790e-01     1.823428e-05
    C                        1          2.225074e-308    3.076527e+02
    D                        1          3.510957e-01     4.545745e-01
    E                        3          2.510957e-01     2.545745e+00
    F                        3          1.510957e-02     2.006527e+02
    G                        2          5.510957e-01     3.545745e-02', header = TRUE)

    library(ggplot2)
    ggplot(result.df, aes(log2(abundance), (1-pVal), label=orgaName)) +
      ylab("1 - P-Value")+
      xlab("log2(abundance)")+
      geom_point(aes(size=score))+
      #ggtitle(colnames(case.count.matrix)[i])+
      geom_text(data=subset(result.df, pVal < 0.05),hjust=.65, vjust=-1.2,size=2.5)+
      geom_hline(aes(yintercept=.95), colour="blue", linetype="dashed")+
      theme_classic() +
      # range sets the smallest and largest point size drawn
      scale_size(range = c(2,12))

Output graph: (image not available)
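A small sketch of what `range` does and does not fix, using two made-up data frames whose largest scores are 300 and 3000. `range` only sets the smallest and largest drawn size; each plot's scores are still rescaled to that plot's own minimum and maximum. So the largest score in every plot should get size 12, whether it is 300 or 3000, which is the concern raised in the comments. The helper name `size_plot()` is hypothetical.

```r
library(ggplot2)

# Two hypothetical data frames whose largest scores differ by a factor of 10.
df_a <- data.frame(abundance = c(1, 2, 4), pVal = c(0.5, 0.2, 0.01),
                   score = c(1, 30, 300))
df_b <- data.frame(abundance = c(1, 2, 4), pVal = c(0.5, 0.2, 0.01),
                   score = c(1, 300, 3000))

# Same plot as in the answer, reduced to the size mapping.
size_plot <- function(df) {
  ggplot(df, aes(log2(abundance), 1 - pVal)) +
    geom_point(aes(size = score)) +
    scale_size(range = c(2, 12))
}

# Mapped point sizes for each plot, in the row order of the data.
sizes_a <- layer_data(size_plot(df_a))$size
sizes_b <- layer_data(size_plot(df_b))$size
```

The score 300 ends up as the largest point in the first plot but as a much smaller point in the second, so `range` alone does not make sizes comparable across data sets; that needs fixed `limits` as well.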

vrajs5
  • Thanks for the fast answers, but maybe I did not state my problem clearly enough. If I now want to plot another result.df (containing different values), the largest point might not indicate a score of 300 but rather one of 3000, say. What I want is a fixed legend for the scores that makes the dot sizes (at least somewhat) comparable across different plots. Is that also possible with range? At the moment I am trying this by adding a manual scale like this: `continuous_scale(breaks=c("0","1","10","100","1000"),labels=c("0","1","10","100","1000"),scale_name="Score")` but it does not work yet. – Phil S. Mar 31 '15 at 09:07
  • Yes, using the range and breaks arguments of scale_size can make sure that the points are plotted well. But you have to set those parameters according to the input data set. – vrajs5 Mar 31 '15 at 09:12
  • Hmm, I don't quite get it yet... I edited the original question but still can't fix the dot size problem in the legend... – Phil S. Mar 31 '15 at 11:08
  • That may be the big problem: the values of the score parameter vary from one result.df to the next. Sometimes all the scores range from 1 to 100; sometimes they range from 1 to over 1 million. – Phil S. Mar 31 '15 at 12:12
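Because the scores can span anything from 1 to over 1 million, one alternative (a different technique from the continuous scale used in the answer) is to bin the scores into the fixed classes that the question's labels already describe, and give each class a fixed point size with `scale_size_manual()`. With `drop = FALSE`, the legend should keep every bin even when a plot has no scores in some of them, so it looks the same across plots. The bin sizes, the extra upper edge `Inf`, and the helper names `bin_score()` and `plot_binned()` are hypothetical choices for illustration.

```r
library(ggplot2)

# Bin edges and labels follow the question's scale_size() attempt; the final
# Inf edge is an added assumption so that very large scores still get a bin.
score_edges  <- c(0, 1, 10, 100, 1000, 1e6, Inf)
score_labels <- c(">=0", ">=1", ">=10", ">=100", ">=1000", ">=1000000")
# One fixed point size per bin (illustrative values).
bin_sizes <- setNames(c(1, 3, 5, 7, 9, 12), score_labels)

# Hypothetical helper: right = FALSE gives the intervals [0,1), [1,10), ...,
# and include.lowest = TRUE closes the last one as [1e6, Inf].
bin_score <- function(score) {
  cut(score, breaks = score_edges, labels = score_labels,
      right = FALSE, include.lowest = TRUE)
}

# Hypothetical plotting helper using the question's aesthetics.
plot_binned <- function(df) {
  df$score_bin <- bin_score(df$score)
  ggplot(df, aes(log2(abundance), 1 - pVal)) +
    geom_point(aes(size = score_bin)) +
    # drop = FALSE keeps all six bins as scale limits, even unused ones
    scale_size_manual(name = "Score", values = bin_sizes, drop = FALSE) +
    theme_classic()
}

# The four example rows from the question.
example_df <- data.frame(
  orgaName  = c("A", "B", "C", "D"),
  abundance = c(3, 2, 1, 1),
  pVal      = c(9.998622e-01, 9.999790e-01, 2.225074e-308, 3.510957e-01),
  score     = c(1.795338e-04, 1.823428e-05, 3.076527e+02, 4.545745e-01)
)
p <- plot_binned(example_df)
```

The trade-off is that sizes only distinguish classes, not individual scores within a class, but a score of 3000 gets the same point size in every plot.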