In my dataset

comp=structure(list(MYCT = c(125L, 29L, 29L, 29L, 29L, 26L, 23L, 23L, 
23L, 23L, 400L, 400L), MMIN = c(256L, 8000L, 8000L, 8000L, 8000L, 
8000L, 16000L, 16000L, 16000L, 32000L, 1000L, 512L), MMAX = c(6000L, 
32000L, 32000L, 32000L, 16000L, 32000L, 32000L, 32000L, 64000L, 
64000L, 3000L, 3500L), CACH = c(256L, 32L, 32L, 32L, 32L, 64L, 
64L, 64L, 64L, 128L, 0L, 4L), CHMIN = c(16L, 8L, 8L, 8L, 8L, 
8L, 16L, 16L, 16L, 32L, 1L, 1L), CHMAX = c(128L, 32L, 32L, 32L, 
16L, 32L, 32L, 32L, 32L, 64L, 2L, 6L), PRP = c(198L, 269L, 220L, 
172L, 132L, 318L, 367L, 489L, 636L, 1144L, 38L, 40L), ERP = c(199L, 
253L, 253L, 253L, 132L, 290L, 381L, 381L, 749L, 1238L, 23L, 24L
)), .Names = c("MYCT", "MMIN", "MMAX", "CACH", "CHMIN", "CHMAX", 
"PRP", "ERP"), class = "data.frame", row.names = c(NA, -12L))

I have 8 variables. I need a boxplot where outliers are shown as red circles and the axis is scaled in percentiles. Right now I simply write

boxplot(comp$MMIN)

but this plot shows no outliers. I expect something like this: [image: expected plot]

For example, in this picture I see two outliers above the 75th percentile. I need a plot like this for each of the 8 variables. How can I do this?

psysky
  • If it plots without outliers maybe there are no outliers in `comp$MMIN`. As for the boxplot with 8 variables, try `boxplot(comp)`. – Rui Barradas May 28 '18 at 12:49
  • @RuiBarradas, I know this trick, but how can I visualize the outliers the way I want, with a percentile scale? – psysky May 28 '18 at 13:06
  • Your drawing is wrong. 50% is the median (big black line). 75% is the top edge of the box. – Andre Elrico May 28 '18 at 13:16
  • @AndreElrico, yes it is. I just wanted to show what I need. The scale was drawn in Paint, so of course it is wrong. – psysky May 28 '18 at 13:17
  • Why do you think there are outliers? See https://www.rdocumentation.org/packages/grDevices/versions/3.5.0/topics/boxplot.stats for how the box, whiskers, and outliers are calculated. There are hence no outliers for your vector `MMIN`. – Weihuang Wong May 28 '18 at 13:24
  • Use `ggplot2` `geom_boxplot`. [little example](http://www.r-graph-gallery.com/263-ggplot2-boxplot-parameters/). The red outliers are no problem, but your scale will be a little fiddly. You will need to draw the lines and ticks and annotate the numbers by hand. You will also need to calculate the y positions beforehand. – Andre Elrico May 28 '18 at 13:25
  • Little tip: I believe the boxplots are positioned at x = 1, so x = 0.5 is a good x for your vertical percentile line (trial and error on these things, of course). – Andre Elrico May 28 '18 at 13:37
  • @AndreElrico thank you for the example, but with ggplot(mpg, aes(x=class, y=hwy)), how can I do it without class on x? I have only metric variables, unlike in this mpg example. – psysky May 28 '18 at 13:37
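
The rule referred to in the comments can be checked directly with `boxplot.stats()`. The following is a minimal sketch using only the `MMIN` values from the question. The tighter `coef = 1.1` is an assumption borrowed from the first answer's `range = 1.1`, not a recommended threshold.

```r
# MMIN column from the question's data
mmin <- c(256, 8000, 8000, 8000, 8000, 8000,
          16000, 16000, 16000, 32000, 1000, 512)

# Default rule (coef = 1.5): a point is an outlier if it lies more than
# 1.5 box lengths beyond the lower or upper hinge
default_out <- boxplot.stats(mmin)$out

# Tighter rule, matching boxplot(..., range = 1.1)
tight_out <- boxplot.stats(mmin, coef = 1.1)$out

print(default_out)  # values flagged under the default rule
print(tight_out)    # values flagged under the tighter rule
```

With the default rule nothing in `MMIN` is flagged, which is why `boxplot(comp$MMIN)` draws no outlier points; lowering `coef` flags the largest value.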

2 Answers

Here is a possible solution using base graphics. The key is to suppress the y axis and then add the tick marks based on the summary statistics.

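# build the box plot and suppress the y-axis labels;
# range = 1.1 shortens the whiskers so that more points count as outliers
b <- boxplot(comp$MMIN, yaxt = "n", range = 1.1)
# overlay every observation as an open circle
points(x = rep(1, nrow(comp)), y = comp$MMIN)
# highlight the outliers in red
points(x = rep(1, length(b$out)), y = b$out, col = "red", pch = 19)

# get candidate points for the y axis
# (not used below; the axis() calls use b$stats instead)
myscale <- summary(comp$MMIN)
# remove the median
myscale <- myscale[-3]
# add the y axis: ticks at the whisker ends, the hinges and the median
axis(2, at = b$stats, labels = c(0, 25, 50, 75, 100))

# use this option for labels on both the right and left side
b <- boxplot(comp$MMIN, outline = FALSE)
axis(4, at = b$stats, labels = c(0, 25, 50, 75, 100))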
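
The same idea can be extended to all 8 variables by drawing one panel per column with `par(mfrow = c(2, 4))`. The following is a rough sketch, not part of the original answer. The helper name `percentile_boxplot` is hypothetical, `coef = 1.1` is carried over from the code above as an assumption, and the 0 and 100 ticks are placed at the data minimum and maximum rather than at the whisker ends.

```r
# Data from the question, rebuilt with data.frame()
comp <- data.frame(
  MYCT  = c(125, 29, 29, 29, 29, 26, 23, 23, 23, 23, 400, 400),
  MMIN  = c(256, 8000, 8000, 8000, 8000, 8000, 16000, 16000, 16000, 32000, 1000, 512),
  MMAX  = c(6000, 32000, 32000, 32000, 16000, 32000, 32000, 32000, 64000, 64000, 3000, 3500),
  CACH  = c(256, 32, 32, 32, 32, 64, 64, 64, 64, 128, 0, 4),
  CHMIN = c(16, 8, 8, 8, 8, 8, 16, 16, 16, 32, 1, 1),
  CHMAX = c(128, 32, 32, 32, 16, 32, 32, 32, 32, 64, 2, 6),
  PRP   = c(198, 269, 220, 172, 132, 318, 367, 489, 636, 1144, 38, 40),
  ERP   = c(199, 253, 253, 253, 132, 290, 381, 381, 749, 1238, 23, 24)
)

# Hypothetical helper: one box plot with red outliers and a percentile axis.
# Returns the outlier values invisibly.
percentile_boxplot <- function(x, name, coef = 1.1) {
  b <- boxplot(x, yaxt = "n", range = coef, main = name)
  # redraw the outliers as filled red circles
  points(rep(1, length(b$out)), b$out, col = "red", pch = 19)
  # ticks: data minimum, lower hinge, median, upper hinge, data maximum
  ticks <- c(min(x), b$stats[2:4], max(x))
  axis(2, at = ticks, labels = c(0, 25, 50, 75, 100))
  invisible(b$out)
}

# One panel per variable in a 2 x 4 grid, then restore the old layout
old_par <- par(mfrow = c(2, 4))
outliers <- lapply(names(comp), function(v) percentile_boxplot(comp[[v]], v))
names(outliers) <- names(comp)
par(old_par)
```

Each panel gets its own tick positions because the quartiles differ from column to column; `outliers` holds the flagged values per variable.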


Dave2e
  • Andre, maybe this solution makes sense. I think I asked a really difficult question. Is it possible to delete the outlier points shown in the graph from the dataset? What does Dave2e think about this possibility? – psysky May 28 '18 at 13:54
  • @varimax This does what you want, but how can you do this for 8 columns? The quantiles will be different from column to column. Unless you use `par(mfrow = c(2, 4))` and plot one at a time. – Rui Barradas May 28 '18 at 13:58
  • I need the plot with the outliers so that I can delete them. – psysky May 28 '18 at 14:01
  • I did my best to interpret your desired solution based on the posted data and example. If this is not correct, please clarify the question. What is your definition of an outlier? – Dave2e May 28 '18 at 14:02
  • Dave2e, it is simple. Now we see 5 outlier circles (right?). How can I delete them from my dataset? – psysky May 28 '18 at 14:14
  • @Dave2e not sure either what you want. Good luck. – Andre Elrico May 28 '18 at 14:18
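
On the follow-up about deleting the flagged points: in the code above, the open circles come from the first `points()` call and are all of the observations; only the red points are outliers. A minimal sketch of dropping them follows, assuming the same `coef = 1.1` rule as the answer and using only two of the question's columns for brevity. The choice of rule is the asker's to make.

```r
# Two columns from the question's data, for brevity
comp <- data.frame(
  MMIN = c(256, 8000, 8000, 8000, 8000, 8000, 16000, 16000, 16000, 32000, 1000, 512),
  PRP  = c(198, 269, 220, 172, 132, 318, 367, 489, 636, 1144, 38, 40)
)

# Values of MMIN flagged as outliers under the coef = 1.1 rule
out_vals <- boxplot.stats(comp$MMIN, coef = 1.1)$out

# Keep only the rows whose MMIN is not among the flagged values
comp_clean <- comp[!comp$MMIN %in% out_vals, , drop = FALSE]

print(nrow(comp))        # rows before filtering
print(nrow(comp_clean))  # rows after filtering
```

This removes whole rows, so the other variables for that observation are dropped as well.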

This is by no means a ready solution, but it should get you on your way.

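library(ggplot2)

# x position of the vertical scale line (the box itself sits at x = 1)
off <- 0.55
ggplot() +
    geom_boxplot(data = comp,
        aes(x = "", y = MMIN),
        # custom outliers
        outlier.colour = "red",
        outlier.fill = "red",
        outlier.size = 3
    ) +
    # placeholder scale line and labels; the y values still need to be calculated
    geom_line(aes(x = c(off, off), y = c(5000, 20000))) +
    geom_text(aes(x = c(off, off), y = c(5000, 20000), label = c("needs to", "be calculated")))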
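
One way to fill in the positions marked "needs to be calculated" is to take them from `quantile()`. The following is a sketch under assumptions, not a finished solution: the data frame name `scale_df` and the `hjust` value are illustrative, and only `MMIN` from the question is used.

```r
library(ggplot2)

# MMIN column from the question's data
comp <- data.frame(
  MMIN = c(256, 8000, 8000, 8000, 8000, 8000,
           16000, 16000, 16000, 32000, 1000, 512)
)

off   <- 0.55                       # x position of the scale line
probs <- c(0, 0.25, 0.5, 0.75, 1)   # percentiles to mark

# y positions and labels for the percentile scale
scale_df <- data.frame(
  x     = off,
  y     = as.numeric(quantile(comp$MMIN, probs)),
  label = paste0(probs * 100, "%")
)

p <- ggplot() +
  geom_boxplot(data = comp, aes(x = "", y = MMIN),
               outlier.colour = "red", outlier.size = 3) +
  # vertical line through the computed percentile positions
  geom_line(data = scale_df, aes(x = x, y = y, group = 1)) +
  # percentile labels to the left of the line
  geom_text(data = scale_df, aes(x = x, y = y, label = label), hjust = 1.2)

print(p)
```

The values of `off` and `hjust` are a matter of trial and error, and labels may need nudging if they are clipped at the panel edge.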
Andre Elrico