4

Your comments, suggestions, or solutions are/will be greatly appreciated, thank you.

I'm using the fpc package in R to do a dbscan analysis of some very dense data (3 sets of 40,000 points, with values in the range -3 to 6).

I've found some clusters, and I need to graph just the significant ones. The problem is that I have a single cluster (the first) with about 39,000 points in it. I need to graph all other clusters but this one.

dbscan() returns a special data type that stores all of this cluster data. It's not indexed like a data frame would be (but maybe there is a way to represent it as such?).

I can graph the dbscan type using a basic plot() call. But, like I said, this will graph the irrelevant 39,000 points.

tl;dr: how do I graph only specific clusters of a dbscan data type?

Has QUIT--Anony-Mousse
droops

3 Answers

6

If you look at the help page (?dbscan) it is organized like all others into sections labeled Description, Usage, Arguments, Details and Value. The Value section describes what the function dbscan returns. In this case it is simply a list (a standard R data type) with a few components.

The cluster component is simply an integer vector, with length equal to the number of rows in your data, that indicates which cluster each observation is a member of. So you can use this vector to subset your data, extracting only the clusters you'd like, and then plot just those data points.
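To address the data frame question directly, here is a minimal sketch of putting the labels next to the coordinates and counting points per cluster. The small `x` and `cluster` objects are made-up stand-ins so the snippet runs on its own; with real output you would use your data matrix and `ds$cluster`. It assumes fpc's coding, in which noise points are labeled 0.

```r
# Stand-in data (hypothetical): in practice x is your data matrix and
# cluster is ds$cluster from fpc::dbscan().
x <- cbind(c(0.1, 0.2, 0.3, 5.0, 5.1, 9.0),
           c(0.1, 0.3, 0.2, 5.0, 5.2, 1.0))
cluster <- c(1L, 1L, 1L, 2L, 2L, 0L)  # 0 marks noise in fpc's coding

# One row per observation, with its cluster label alongside the coordinates
df <- data.frame(x = x[, 1], y = x[, 2], cluster = cluster)

# Number of points carrying each cluster label
sizes <- table(df$cluster)
print(sizes)
```

The table makes it easy to spot an oversized cluster before deciding what to plot.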

For example, if we use the first example from the help page:

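library(fpc)  # provides dbscan()

set.seed(665544)
n <- 600
# 600 noisy points scattered around 10 random centers (runif values are recycled)
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
ds <- dbscan(x, 0.2)  # second argument is eps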

we can then use the result, ds, to plot only the points in clusters 1-3:

#Plot only clusters 1, 2 and 3
plot(x[ds$cluster %in% 1:3,])
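For the case in the question, where one huge cluster should be left out, the same subsetting idea can be turned around to exclude a label rather than select some. The following is only a sketch: `x` and `cluster` are made-up stand-ins for your data matrix and `ds$cluster`, and it assumes noise is coded 0 as in fpc.

```r
set.seed(1)
# Stand-in data (hypothetical): one big blob plus two small groups.
x <- rbind(matrix(rnorm(200, mean = 0, sd = 1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.2), ncol = 2),
           matrix(rnorm(20, mean = -5, sd = 0.2), ncol = 2))
cluster <- rep(c(1L, 2L, 3L), times = c(100L, 10L, 10L))  # stand-in for ds$cluster

# Label of the largest cluster, ignoring noise (coded 0 by fpc)
sizes <- table(cluster[cluster != 0])
biggest <- as.integer(names(sizes)[which.max(sizes)])

# Keep everything that is neither noise nor in the largest cluster
keep <- cluster != 0 & cluster != biggest

# Plot the remaining points, colored by cluster label
plot(x[keep, , drop = FALSE], col = cluster[keep] + 1, pch = 19,
     xlab = "x1", ylab = "x2")
```

Drop the `cluster != 0` condition if you want the noise points shown as well.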
joran
  • @droops - Glad to hear it! If this really solved your problem, consider clicking the check mark next to it. That will help people who find this question in the future know that this answer was useful and generally adds to the value of StackOverflow. – joran Jul 28 '11 at 23:09
1

Without knowing the specifics of dbscan, I can recommend that you look at the function smoothScatter. It is very useful for examining the main patterns in a scatterplot when you would otherwise have too many points to make sense of the data.
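As a rough illustration, a call might look like the sketch below. The data are random stand-ins of about the size mentioned in the question, not the asker's data, and smoothScatter relies on the KernSmooth package, which ships with most R installations.

```r
set.seed(1)
# Stand-in data (hypothetical): 40,000 points in two dimensions
x <- cbind(rnorm(40000), rnorm(40000))

# Draw a smoothed density image instead of 40,000 overlapping symbols;
# nrpoints = 0 turns off the extra points drawn in sparse regions
smoothScatter(x, nrpoints = 0, xlab = "x1", ylab = "x2")
```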

nullglob
0

Probably the most sensible way of plotting DBSCAN results is to use alpha shapes, with the radius set to the epsilon value. Alpha shapes are closely related to convex hulls, but they are not necessarily convex. The alpha radius controls the amount of non-convexity allowed.

This is quite closely related to the DBSCAN cluster model of density connected objects, and as such will give you a useful interpretation of the set.

As I'm not using R, I don't know about its alpha shape capabilities. A quick check on Google suggests there is a package called alphahull.

Has QUIT--Anony-Mousse