
I'm trying to make some simple box plots, but I've noticed that the points in my data frame appear to be plotted incorrectly by ggplot, in every plot type I've tried.

My data is

structure(list(rownum = 1:74, Device = c("Dexcom", "Dexcom", 
"Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", 
"Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Libreview", 
"Libreview", "Libreview", "Libreview", "Libreview", "Libreview", 
"Libreview", "Libreview", "Libreview", "Libreview", "Libreview", 
"Libreview", "Libreview", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend CGM", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend Manual", 
"Diasend CGM", "Diasend CGM", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend CGM", "Diasend Manual", 
"Diasend Manual", "Diasend CGM", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend CGM", 
"Diasend Manual", "Diasend CGM", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend CGM", "Diasend Manual", "Diasend CGM", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend CGM", "Diasend Manual"
), PREMean = c(10.0484850182022, 7.84715557883709, 7.28766699205132, 
8.47011442894507, 10.7497970736388, 8.6565711351755, 12.2666572965045, 
12.8489327534292, 9.38152123552124, 9.82593283758822, 9.25191807020791, 
10.590004260355, 10.1991015796402, 8.11500023112837, 9.3887371146612, 
9.05289979902383, 16.3938994229184, 11.2269812823576, 8.46589333710567, 
9.45301483336544, 9.654521175124, 9.17169712793734, 5.90663637838715, 
15.1026720647773, 8.73502786461873, 12.515518913676, 10.2021609195402, 
8.88323924469535, 9.138, 10.5977853492334, 14.7827906976744, 
10.9643874643875, 8.04525252525253, 9.2234693877551, 9.2234693877551, 
13.4109826589595, 8.65916169339799, 9.07101449275362, 10.7026923076923, 
17.9097799511002, 6.05655339805825, 7.24913151364764, 7.84826142795985, 
11.6334796926454, 10.0795389048991, 9.63545878693624, 11.7388888888889, 
11.3917218543046, 8.11740335319385, 9.41461318051576, 12.9295681063123, 
10.2035994083164, 7.68975155279503, 10.249885583524, 5.79714285714286, 
10.0638826185102, 8.44704049844237, 10.6952513150205, 9.36492957746479, 
9.83008799318762, 9.6688654353562, 8.00041753653445, 9.26, 9.38389756944444, 
8.55568181818182, 8.63457241816674, 8.12372881355932, 9.84208494208494, 
11.28828125, 9.04013157894737, 11.6740659340659, 9.61797752808989, 
13.8315843798383, 10.1719101123596), POSTMean = c(8.19190208049315, 
7.61158509359437, 7.20120148352596, 8.57923580164976, 10.6268789167925, 
8.37193152150653, 12.3593220150292, 13.9380512091038, 9.30225121492054, 
8.19597861420017, 8.73307014253563, 8.23531795760565, 10.4691064145347, 
8.78835006435006, 9.48096681373489, 9.12521085925145, 13.1253985706432, 
10.2115876974231, 7.65094314018184, 11.1021567021567, 12.3527429320352, 
8.74159058145123, 6.82408707865169, 9.2207729468599, 8.33679846938776, 
11.2045885361817, 12.2492643845594, 8.41001977587343, 8.24191419141914, 
10.7707317073171, 12.2390334572491, 8.28022598870056, 7.67814207650273, 
9.48614130434783, 9.48614130434783, 11.0455128205128, 8.36162310181728, 
10.2825581395349, 10.1807407407407, 16.3283333333333, 7.56851851851852, 
6.80612244897959, 7.6510029661656, 12.1434984833165, 12.2157894736842, 
11.2797101449275, 19.1619047619048, 13.2472361809045, 8.87069342340552, 
8.40763888888889, 13.5286956521739, 10.4632632632633, 8.76877470355731, 
10.6271903323263, 8.2667701863354, 8.61640378548896, 6.96209386281588, 
8.29738799201886, 8.51794871794872, 8.10574666733237, 8.43217993079585, 
7.7244635193133, 13.9224137931034, 9.19426699426699, 8.15335753176044, 
8.30695218383485, 5.89611231101512, 9.45526315789474, 9.406875, 
9.78860759493671, 9.33200934579439, 9.406875, 11.2342145015106, 
11.2984126984127)), row.names = c(NA, -74L), na.action = structure(c(`19` = 19L, 
`30` = 30L, `38` = 38L, `39` = 39L, `42` = 42L, `44` = 44L, `51` = 51L, 
`62` = 62L, `79` = 79L, `84` = 84L), class = "omit"), class = c("tbl_df", 
"tbl", "data.frame"))

Then

ggplot(data, aes(x=PREMean, y=POSTMean)) + geom_point()

Plots some points that are clearly too low - less than 5. None of the numbers are less than 5.

Plotting with ggboxplot and ggpaired also gives me points that are far too low.

I'm tearing my hair out. I just don't understand why the points are so clearly plotted in the wrong place. Please help, thanks.

Richard Telford

1 Answer


As @RichardTelford states, your plot is as expected.

I've added both plots to the answer to demonstrate the difference between ggplot's default axis scales and user-defined scales.

ggplot does not know how you will interpret the axes: by default it takes the minimum and maximum values on each axis, fits them to the space available, and does the best job it can of labelling the tick marks. It relies on the reader to work out the rest. In the default plot of your data, the minor grid lines on the x axis mark steps of 2.5, so the x origin is somewhat greater than 5 rather than 0.
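As a rough illustration of where the default axis ends come from, here is a minimal base R sketch. It assumes ggplot2's default expansion for continuous scales (5% of the data range on each side) and uses only the smallest and largest PREMean values from the question; the variable names are made up for this example.

```r
# Smallest and largest PREMean values in the question's data
pre_range <- c(5.79714285714286, 17.9097799511002)

# Assumption: default continuous scales pad the data range by 5% on each side
pad <- 0.05 * diff(pre_range)
axis_limits <- c(pre_range[1] - pad, pre_range[2] + pad)

axis_limits  # roughly 5.19 and 18.52
```

Under that assumption the left edge of the x axis sits at about 5.19, so a point at 5.8 is drawn close to the edge of the panel even though its value is well above 5.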

If you want to be explicit about the axis values and breaks, you will have to tell ggplot what to print. You have lots of flexibility: you can set limits, breaks and scale...

If you want the same limits and breaks across a series of graphs, you may be better off writing a function that sets them for you. That's the topic of another question, though. You could look at this answer, which sets the scales from 0 to the limits of the data: Setting y axis breaks in ggplot
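One possible shape for such a function is sketched below. The helper name `fixed_axes` and its arguments `upper` and `step` are hypothetical, and the small `data` frame holds only five rows copied from the question so that the example runs on its own.

```r
library(ggplot2)

# Five rows copied from the question's data, for illustration only
data <- data.frame(
  PREMean  = c(10.0484850182022, 17.9097799511002, 11.7388888888889,
               5.79714285714286, 8.12372881355932),
  POSTMean = c(8.19190208049315, 16.3283333333333, 19.1619047619048,
               8.2667701863354, 5.89611231101512)
)

# Hypothetical helper: the same limits and breaks on both axes
fixed_axes <- function(upper = 20, step = 5) {
  list(
    scale_x_continuous(limits = c(0, upper), breaks = seq(0, upper, by = step)),
    scale_y_continuous(limits = c(0, upper), breaks = seq(0, upper, by = step))
  )
}

# A list of scales can be added to a plot like any other component
p <- ggplot(data, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  fixed_axes()
```

Bear in mind that limits set on a scale discard any values outside them, so `upper` needs to be at least as large as the largest value you expect to plot.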


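library(ggplot2)
library(patchwork)  # provides the `/` operator for stacking plots

# `data` is the data frame from the question

# Default scales: limits and breaks are derived from the data
p1 <- ggplot(data, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  ggtitle("Default axis scales")

# User-defined scales: both axes run from 0 to 20
p2 <- ggplot(data, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  scale_x_continuous(limits = c(0, 20)) +
  scale_y_continuous(limits = c(0, 20)) +
  ggtitle("Defined axis scales")

# Stack the two plots vertically for comparison
p1 / p2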

Created on 2020-06-27 by the reprex package (v0.3.0)
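If you would rather not hard-code both ends of each axis, one alternative is ggplot2's `expand_limits()`, which only forces the axes to include the values you name and leaves the upper end to follow the data. This is a sketch rather than part of the reprex above; the small `data` frame holds five rows copied from the question, and the plot name `p3` is made up for this example.

```r
library(ggplot2)

# Five rows copied from the question's data, for illustration only
data <- data.frame(
  PREMean  = c(10.0484850182022, 17.9097799511002, 11.7388888888889,
               5.79714285714286, 8.12372881355932),
  POSTMean = c(8.19190208049315, 16.3283333333333, 19.1619047619048,
               8.2667701863354, 5.89611231101512)
)

# Force both axes to include 0; the upper limits still come from the data
p3 <- ggplot(data, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  expand_limits(x = 0, y = 0) +
  ggtitle("Axes extended to include zero")
```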

Peter
  • This is weird - I'm getting the points to plot from the summarise function - is that causing the issue? It definitely doesn't plot as expected in my original data frame! Has anyone had this problem with data produced by summarise()? – Neil Lawrence Jun 26 '20 at 22:38
  • Thanks for the help everyone, I really appreciate it. When I add the scales to my plot, the data behaves. But why on earth is that necessary? I don't want to specify the scales in a plot that I want to stay flexible for certain entries. Not ideal, and really worrying for plots that I produce in R. Why has ggplot changed the position of the points? – Neil Lawrence Jun 26 '20 at 22:44
  • Thank you so much for explaining that, I feel like a bit of an idiot to say the least! Your explanation was incredibly helpful, thanks, I really appreciate it. – Neil Lawrence Jun 28 '20 at 10:21