
I am running an HDBSCAN analysis, using the hdbscan Python module from within R (via reticulate). I have the following code:

library(reticulate)
hdb <- import("hdbscan") # Import hdbscan Python library
# Create dummy data. My actual data set is an already cleaned data frame.
dat <- data.frame(id=1:100, a = rbinom(100, 1, .4), b = rbinom(100, 1, .8), c = rbinom(100, 1, .6), d = rbinom(100, 1, .2))
datMat <- as.matrix(dat) # Convert to matrix so it correctly converts to a 2d array

clusterer = hdb$HDBSCAN(metric='jaccard') # Start clusterer with Jaccard distance metric
clusterer$fit(datMat) # Fit the data

Next, I want to have a look at the condensed tree plot. The Python code for this would be:

clusterer.condensed_tree_.plot()

Translated to R:

clusterer$condensed_tree_$plot()

The output of this command is:

AxesSubplot(0.125,0.11;0.31744x0.77)
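The printed value suggests that the call returns a matplotlib Axes object rather than drawing to an R graphics device. Below is a minimal Python sketch, assuming matplotlib is installed, of writing the figure behind such an object to an image file. The helper name `save_axes` and the file name are hypothetical, and a plain line plot stands in for the condensed tree so the snippet does not depend on hdbscan:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt


def save_axes(ax, path, dpi=150):
    """Write the figure that owns `ax` to `path` and return the path.

    `save_axes` is a hypothetical helper, not part of hdbscan or matplotlib.
    """
    ax.figure.savefig(path, dpi=dpi, bbox_inches="tight")
    return path


# Stand-in for clusterer.condensed_tree_.plot(), which also returns an Axes.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Hypothetical output location; any writable path would do.
out_path = save_axes(ax, os.path.join(tempfile.gettempdir(), "condensed_tree.png"))
plt.close(fig)  # release the figure once it has been written
print(out_path)
```

From R via reticulate, the analogous call would presumably be `ax$figure$savefig("condensed_tree.png")` on the object returned by `clusterer$condensed_tree_$plot()`, after which the PNG can be opened in any image viewer. This is an untested assumption about how the objects convert, not a confirmed solution.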

I can put all of this in an RMarkdown file with a raw Python chunk, and I get the plot I want. However, this only works when knitting the entire file, which can take some time. Since I am currently trying out different HDBSCAN parameter settings, it would be great to have a way to plot the condensed tree (and other diagnostic plots) without knitting an entire RMarkdown file.

Does anybody know whether, and how, I can display a Python-generated plot in R without using RMarkdown?

Example of RMarkdown file:

```{r}
dat <- data.frame(id=1:100, a = rbinom(100, 1, .4), b = rbinom(100, 1, .8), c = rbinom(100, 1, .6), d = rbinom(100, 1, .2))
datMat <- as.matrix(dat)
```

```{python}
import hdbscan

clusterer = hdbscan.HDBSCAN(metric='jaccard') # Start clusterer with Jaccard distance metric
clusterer.fit(r.datMat) # Fit the data
clusterer.condensed_tree_.plot()
```
kneijenhuijs
