I am trying to encircle the data points of a scatterplot (using ggplot2) with two hulls, so that (1) 100% of my data points lie inside the first and (2) 80% of my data points lie inside the second. (See 1: like in this sketch; please excuse the lazy execution with the Snipping Tool.)
Here is my dummy-dataset:
x y
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
4 1
4 2
4 3
5 1
5 2
5 3
5 4
5 9
5 10
6 1
6 2
6 3
6 4
6 5
6 6
6 8
6 9
6 10
7 1
7 2
7 3
7 4
7 5
7 6
7 7
7 8
7 9
7 10
8 2
8 3
8 4
8 5
8 6
8 7
8 8
8 9
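To make the table above easier to reproduce, here is a compact sketch that builds the same 49 points as a data frame. The name myData matches the ggplot() calls used below; the rep()-based construction is only shorthand for the table, not a different dataset.

```r
# Rebuild the dummy dataset from the table above.
# x runs from 1 to 8; `times` is the number of y values listed for each x.
myData <- data.frame(
  x = rep(1:8, times = c(4, 5, 4, 3, 6, 9, 10, 8)),
  y = c(
    2:5,            # x = 1
    1:5,            # x = 2
    1:4,            # x = 3
    1:3,            # x = 4
    1:4, 9:10,      # x = 5
    1:6, 8:10,      # x = 6
    1:10,           # x = 7
    2:9             # x = 8
  )
)

str(myData)  # inspect the structure of the resulting data frame
```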
I have tried multiple approaches to achieve this, but nothing really satisfies what I want to accomplish.
My first approach was geom_density2d(). However, I have trouble interpreting the results, as I don't really know what the levels mean.
I tried the following:
ggplot(myData, aes(x,y)) + geom_point() + geom_density2d(bins=4, aes(colour=..level..))
Which results in this plot 2:
This is good in that the contours follow the dent in the data. However, I don't know how I would get one hull that encircles 100% of my data and a second hull that encircles 80% of my data.
My second approach was to use the geom_encircle() function of the ggalt package. This results in the following plot 3:
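For reference, a minimal call of this kind looks like the sketch below. Default arguments to geom_encircle() are assumed here, and myData is rebuilt from the dummy table so that the snippet runs on its own.

```r
library(ggplot2)
library(ggalt)  # provides geom_encircle()

# Dummy dataset from the table at the top of the question
myData <- data.frame(
  x = rep(1:8, times = c(4, 5, 4, 3, 6, 9, 10, 8)),
  y = c(2:5, 1:5, 1:4, 1:3, 1:4, 9:10, 1:6, 8:10, 1:10, 2:9)
)

# One outline drawn around all points (default arguments are an assumption)
p <- ggplot(myData, aes(x, y)) +
  geom_point() +
  geom_encircle()

# print(p) draws the plot
```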
This time, all of my data points are encircled, so far so good. But the "dent" seen in the geom_density2d() plot is not present, and I don't know how to add an "encirclement" that covers only 80% of my data points.
My third approach was using the geom_bag() function (described here).
ggplot(myData, aes(x,y)) + geom_point() + geom_bag(prop=0.9) + geom_bag(prop=0.8)
(With geom_bag() I cannot use prop=1.0 to cover all data points; however, setting it to 0.9 is sufficient.)
This yields the following plot 5:
This time, again, the dent is not present. Another problem is that setting prop=0.7 and prop=0.7 yields the exact same outcome. A further problem is that the hull is not smooth like the contours from geom_density2d().
How can I produce a plot (with ggplot2) that looks like my sketch in 1?
Thanks in advance!
____________________________________________
EDIT:
The actual dataset to show the real distribution of my datapoints:
x y
1 -19.397412 47.544324
2 -8.213419 69.892953
3 -29.926849 39.743923
4 -75.377447 79.817208
5 -9.215048 40.705533
6 -42.868995 45.721222
7 -85.590572 84.058463
8 -62.544121 69.371364
9 -60.209205 64.546267
10 3.598963 20.109707
11 -4.552074 61.3339
12 -197.619021 52.225312
13 -147.133639 56.96088
14 -59.402414 56.487012
15 -68.361091 46.811878
16 -105.556485 57.603839
17 -94.354948 32.706933
18 -107.26281 28.477637
19 -155.692967 35.106937
20 -80.819257 30.664812
21 -142.055086 33.728788
22 -118.353934 27.362929
23 -114.634413 31.501665
24 -113.470642 29.136781
25 -181.380891 41.046883
26 -171.106218 23.359443
27 -156.720415 35.450407
28 -165.042839 29.349575
29 -92.869955 25.478965
30 -114.78719 23.860353
31 -134.115204 25.491367
32 -109.430656 19.105614
33 -120.451655 25.97992
34 -87.570713 21.111895
35 -91.222139 22.484895
36 -208.979695 38.311266
37 -98.814223 16.121487
38 -201.812263 49.547512
39 -168.948464 39.583593
40 -112.44335 20.979357
41 -174.138029 28.470047
42 -220.936718 33.452972
43 -169.687859 33.173458
44 -157.119306 38.573987
45 -150.682075 41.66627
46 -77.397116 27.220171
47 -177.559527 53.278523
48 -61.212396 6.796908
49 -94.602774 24.669706
50 -204.333869 37.002679
51 -124.442364 31.519392
52 -165.722504 39.464188
53 -57.849212 23.973774
54 -106.643382 38.560785
55 -90.679094 29.863184
56 -132.476054 31.988021
57 -188.33621 29.658416
58 -136.247184 38.870171
59 -59.929772 20.626164
60 -121.020003 33.862312
61 -82.968422 33.033312
62 -79.130004 32.800121
63 -51.463395 23.452366
64 -63.819269 27.257994
65 -64.02259 27.711516
66 -66.876407 18.156063
67 -68.175454 22.996369
68 -108.640035 29.915306
69 -21.512647 16.930815
70 -66.902542 17.177093
71 -160.262625 33.061052
72 -41.672641 30.510433
73 -83.31784 28.965415
74 -132.410284 22.843924
75 -54.724716 10.642682
76 -69.688094 30.798878
77 -120.775133 24.597096
78 -78.655551 30.368373
79 -68.299767 35.937048
80 -45.037891 21.636422
81 -49.679704 19.508719
82 -62.018393 76.199247
83 -113.777141 27.730892
84 -74.630501 49.062317
85 -95.154793 37.279829
86 -65.229569 46.26744
87 -42.139223 16.38709
88 -94.186408 28.708069
89 -100.920471 27.533579
90 -66.332707 22.573064
91 -26.419725 13.948061
92 -152.704377 34.165409
93 -50.309209 22.032052
94 -125.896489 34.411915
95 -119.304969 28.786249
96 -41.689412 37.314049
97 -99.936438 31.363461
98 -74.807901 24.259652
This yields the following plot 6:
I would like to show that most of my data points are in the lower part, but still encircle all of the data, something like in 7:
____________________________________________
EDIT2:
The "ultimate goal" would be to compare both of those contours, without the corresponding data points, against another dataset to see whether they overlap, without overcrowding the resulting plots with too many data points.