
I am using ggplot to create a boxplot. The code is the following:

library(ggplot2)

ggplot(my_data, aes(x = as.factor(viotiko), y = pd_1year, fill = as.factor(viotiko))) +
  geom_boxplot() +
  labs(title = "Does the PD differ significantly by 'Viotiko' group?",
       x = "Viotiko Group", y = "PD (pd_1year)")

This outputs the following graph:

[Figure: boxplot without limits on the y-axis]

Next, I wanted to focus on a range of y values, [0, 0.05], so I ran the code again with a y-axis limit added. I did not mean to exclude data or alter the mean and the distribution, only to zoom in on a particular range of y values. The code was:

ggplot(my_data, aes(x = as.factor(viotiko), y = pd_1year, fill = as.factor(viotiko))) +
  geom_boxplot() +
  labs(title = "Does the PD differ significantly by 'Viotiko' group?",
       x = "Viotiko Group", y = "PD (pd_1year)") +
  scale_y_continuous(breaks = seq(0, 0.05, 0.01), limits = c(0, 0.05))

This returned the warning "Removed 173664 rows containing non-finite values (stat_boxplot)." and produced the following graph:

[Figure: boxplot after setting a limit on the y-axis]

Apparently, ggplot alters the input data on which the boxplot is based. My intention, however, is simply to zoom in on a segment of the boxplot so that I can examine the differences between the groups more closely. How can I do this with ggplot?

Your advice will be appreciated.

ak7

1 Answer


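Instead of your scale_y_continuous() code, use coord_cartesian(). Setting limits on a scale converts every value outside those limits to NA before the boxplot statistics are computed, which is why rows were removed and the boxes changed. coord_cartesian() only zooms the plotting area and leaves the data untouched.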

Replace this:

scale_y_continuous(breaks = seq(0, 0.05, 0.01), limits = c(0, 0.05))

with this:

coord_cartesian(ylim = c(0, 0.05))
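A minimal self-contained sketch of the difference, assuming simulated data: `my_data` below is a hypothetical stand-in generated with `rexp()`, since the original data isn't available. It builds the plot three ways and pulls the computed medians out with `layer_data()`. With `coord_cartesian()` they should match the unzoomed plot, while with scale limits they are computed only from the values inside [0, 0.05]. The custom breaks can be kept by calling `scale_y_continuous()` without `limits`.

```r
library(ggplot2)

# Hypothetical stand-in for my_data: three groups, right-skewed PD values
set.seed(1)
my_data <- data.frame(
  viotiko  = sample(1:3, 3000, replace = TRUE),
  pd_1year = rexp(3000, rate = 40)
)

p_full <- ggplot(my_data, aes(x = as.factor(viotiko), y = pd_1year,
                              fill = as.factor(viotiko))) +
  geom_boxplot() +
  labs(title = "Does the PD differ significantly by 'Viotiko' group?",
       x = "Viotiko Group", y = "PD (pd_1year)")

# Zoom only: breaks are kept, boxplot statistics still use all rows
p_zoom <- p_full +
  scale_y_continuous(breaks = seq(0, 0.05, 0.01)) +
  coord_cartesian(ylim = c(0, 0.05))

# Scale limits: values outside [0, 0.05] become NA and are dropped
# before stat_boxplot runs (this is what triggers the warning)
p_limits <- p_full +
  scale_y_continuous(breaks = seq(0, 0.05, 0.01), limits = c(0, 0.05))

full_medians   <- layer_data(p_full)$middle
zoom_medians   <- layer_data(p_zoom)$middle
limits_medians <- suppressWarnings(layer_data(p_limits)$middle)

print(rbind(full_medians, zoom_medians, limits_medians))
```

The first two rows of the printed matrix should be identical; the third is lower because the largest values were thrown away before the medians were computed.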

I also noticed that you are trying to present the mean. Note that a boxplot shows the median, not the mean; keep that in mind when presenting your data. Boxplots are usually preferred over other options because they show the data distribution (for example, outliers) and other important statistics for comparison. Cropping the plot so that only the medians are visible may therefore not be a good idea; instead, you could show only the medians using geom_point().
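One way to draw only the group medians as points is sketched below. It uses `stat_summary()` with the point geom, which computes the median per group from all rows; that choice is an assumption on my part rather than the only way to do it. `my_data` is again hypothetical simulated data.

```r
library(ggplot2)

# Hypothetical stand-in for my_data
set.seed(1)
my_data <- data.frame(
  viotiko  = sample(1:3, 3000, replace = TRUE),
  pd_1year = rexp(3000, rate = 40)
)

# One point per group, placed at the median of that group's full data
p_median <- ggplot(my_data, aes(x = as.factor(viotiko), y = pd_1year,
                                colour = as.factor(viotiko))) +
  stat_summary(fun = median, geom = "point", size = 3) +
  labs(x = "Viotiko Group", y = "Median PD (pd_1year)")
```

Display it with `print(p_median)`. The `fun` argument requires ggplot2 3.3.0 or later; older versions used `fun.y`.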

M_Shimal
  • Thank you. Is there a way to show the mean in the boxplot? – ak7 Mar 24 '17 at 16:53
  • @ak7 A boxplot itself can't show the mean. But you can always plot the mean as a point over the boxplot, so that the plot shows the mean, median, and IQR. To do that, add `+ geom_point()` with your mean as the `y` aesthetic. – M_Shimal Mar 24 '17 at 17:08
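A sketch of the approach described in the last comment, assuming hypothetical simulated data: the per-group means are computed with `aggregate()` into a helper table (`group_means` is an illustrative name) and drawn with `geom_point()` on top of the boxes, combined with the `coord_cartesian()` zoom so nothing is dropped.

```r
library(ggplot2)

# Hypothetical stand-in for my_data
set.seed(1)
my_data <- data.frame(
  viotiko  = sample(1:3, 3000, replace = TRUE),
  pd_1year = rexp(3000, rate = 40)
)

# Hypothetical helper table: one mean per group, from the full data
group_means <- aggregate(pd_1year ~ viotiko, data = my_data, FUN = mean)

p_mean <- ggplot(my_data, aes(x = as.factor(viotiko), y = pd_1year,
                              fill = as.factor(viotiko))) +
  geom_boxplot() +
  # Mean shown as a white diamond over each box; inherit.aes = FALSE
  # keeps the fill mapping of the boxplot from applying to this layer
  geom_point(data = group_means,
             aes(x = as.factor(viotiko), y = pd_1year),
             inherit.aes = FALSE, shape = 23, size = 3, fill = "white") +
  coord_cartesian(ylim = c(0, 0.05)) +
  labs(x = "Viotiko Group", y = "PD (pd_1year)")
```

Display it with `print(p_mean)`. `stat_summary(fun = mean, geom = "point")` would achieve the same without the helper table.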