Problem definition
I need to produce a number of specific graphs, and on these graphs, highlight subsets of vertices (nodes) by drawing a contour/polygon/range around or over them (see image below).
- A graph may have multiple of these contours/ranges, and they may overlap, iff one or more vertices belong to multiple subsets.
- Given a graph of N vertices, any subset may be of size 1..N.
- However, vertices not belonging to a subset must not be inside the contour (as that would be misleading, so that's priority no. 1). This is gist of my problem.
- All these graphs happen to have the property that the ranges are continuous, as the data they represent covers only directly connected subsets of vertices.
- All graphs will be undirected and connected (no unconnected vertices will ever be plotted).
Reproducible attempts
I am using R and the igraph
package. I have already tried some solutions, but none of them work well enough.
First attempt, mark.groups
in plot.igraph
:
library(igraph)
g = make_graph("Frucht")
l = layout.reingold.tilford(g,1)
plot(g, layout=l, mark.groups = c(1,3,6,12,5), mark.shape=1)
# bad, vertex 11 should not be inside the contour
plot(g, layout=l, mark.groups = c(1,6,12,5,11), mark.shape=1)
# 3 should not be in; image below
# just choosing another layout here is not a generalizable solution
The plot.igraph calls igraph.polygon, which calls convex_hull (also igraph), which calls xspline. The results is, from what I understand, something called a convex hull (which otherwise looks very nice!), but for my purposes that is not precise enough, covering vertices that should not be covered.
Second attempt with contour
. So I tried implementing my own version, based on the solution suggested here:
library(MASS)
xx <- runif(5, 0, 1);yy <- abs(xx)+rnorm(5,0,0.2)
plot(xx,yy, xlim=c( min(xx)-sd(xx),max(xx)+sd(xx)), ylim =c( min(yy)-sd(yy), max(yy)+sd(yy)))
dens2 <- kde2d(xx, yy, lims=c(min(xx)-sd(xx), max(xx)+sd(xx), min(yy)- sd(yy), max(yy)+sd(yy) ),h=c(bandwidth.nrd(xx)/1.5, bandwidth.nrd(xx)/ 1.5), n=50 )
contour(dens2, level=0.001, col="red", add=TRUE, drawlabels=F)
The contour plot looks in principle like something I could use, given enough tweaking of the bandwidth and level values (to make the contour snug enough so it doesn't cover any points outside the group). However, this solution has the drawback that when the level value is too small, the contour breaks (doesn't produce a continuous area) - so if I would go that way, controlling for continuity (and determining good bandwidth/level values on the fly) automatically should be implemented. Another problem is, I cannot quite see how could I plot the contour over the plots produced by igraph
: the layout.*
commands produce what looks like a coordinate matrix, but the coordinates do not match the axis coordinates on the plot:
# compare:
layout.reingold.tilford(g,1)
plot(g, layout=l, axes=T)
The question:
What would be a better way to achieve the plotting of such ranges on graphs (ideally igraphs) in R that would meet the criteria outlined above - ranges that include only the vertices that belong to their subset and exclude all else - while being continous ranges?
The solution I am looking for should be scalable to graphs of different sizes and layouts that I may need to create (so hand-tweaking each graph by hand using e.g. tkplot is not a good solution). I am aware that on some graphs with some vertex groups, meeting both the criteria will indeed be impossible in practise, but intuitively it should be possible to implement something that still works most of the time with smallish (10..20 vertices) and not-too-complex graphs (ideally it would be possible to detect and give a warning if a perfectly fitting range could not be plotted). Either an improvement of the mark.groups
approach (not necessarily within the package, but using the hull-idea mentioned above), or something with contour
or a similar suitable function, or suggesting something else entirely would be welcome, as long as it works (most of the time).
Update stemming from the discussion: a solution that only utilizes functions of core R or CRAN packages (not external software) is desirable, since I will eventually want to incorporate this functionality in a package.
Edit: specified the last paragraph as per the comments.