It's hard to determine the relationship between these variables, so I'd like to bin them. I've found advice explaining how to bin two variables, but not seven. I'm also not sure how to tailor it to my dataset. Is there a way to alter this to bin multiple variables?

My code (note that the data in my plot is a transformed version of the dataset below):

diabetes = read.csv('https://github.com/bandcar/Examples/raw/main/diabetes.csv')
pairs(diabetes)

Advice:

set.seed(42)
x <- runif(1e4)
y <- x^2 + x + 4 * rnorm(1e4)
df <- data.frame(x=x, y=y)

library(ggplot2)
# fun.y is deprecated since ggplot2 3.3.0; use fun instead
(ggplot(df, aes(x=x,y=y)) +
  geom_point(alpha = 0.4) +
  stat_summary_bin(fun='mean', bins=20,
                   color='orange', size=2, geom='point'))

scatter plot

bandcar

The easiest way to avoid massive overplotting here would be to use a random subsample of the data. – shs Aug 05 '22 at 20:30
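
A minimal sketch of the subsampling idea from this comment, in base R. The helper name subsample_rows and its seed argument are illustrative assumptions, not something from the thread, and it assumes a plain data.frame rather than a data.table.

```r
# Hypothetical helper: keep at most n randomly chosen rows of a data frame.
subsample_rows <- function(df, n, seed = 42) {
  set.seed(seed)                      # make the random draw reproducible
  n <- min(n, nrow(df))               # never ask for more rows than exist
  idx <- sample.int(nrow(df), n)      # row indices, drawn without replacement
  df[idx, , drop = FALSE]
}
```

With the question's data this could be used as pairs(subsample_rows(diabetes, 200)), where 200 is an arbitrary size to tune until the panels are readable.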

1 Answer


The comment from shs may actually be the right answer.
Programmatically speaking, though, what about drawing one plot per pair of variables of interest?

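library(GGally)      # attaches ggplot2 as well, so aes() is available
library(data.table)
diabetes = read.csv('https://github.com/bandcar/Examples/raw/main/diabetes.csv')
setDT(diabetes)
# Assumption: Outcome is a 0/1 class label; as a factor it gets discrete colours
diabetes[, Outcome := factor(Outcome)]

# All variables at once, coloured by Outcome in the lower triangle
ggpairs(
  diabetes,
  lower = list(
    continuous = "smooth",
    combo = "facetdensity",
    mapping = aes(color = Outcome)
  )
)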

# Every unordered pair of variables, excluding Age and Outcome
vars <- colnames(diabetes)
vars <- vars[! vars %in% c('Age', 'Outcome')]
colpairs <- t(combn(vars, 2))

# One ggpairs plot per pair, with Outcome kept for colouring
for (r in seq_len(nrow(colpairs))) {
  v1 <- colpairs[r, 1]
  v2 <- colpairs[r, 2]
  ggp <- ggpairs(
    diabetes[, .SD, .SDcols = c(v1, v2, "Outcome")],
    lower = list(
      continuous = "smooth",
      combo = "facetdensity",
      mapping = aes(color = Outcome)
    )
  )
  print(ggp)
}
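
To stay closer to the binning idea in the question, here is a rough sketch that applies stat_summary_bin to several variables at once: pick one x variable, stack the others into long format, and facet. The helper name binned_means_plot and its arguments are my own illustrative choices, and it assumes ggplot2 3.3.0 or later (for the fun argument).

```r
library(ggplot2)

# Hypothetical helper: for one x variable, overlay binned means of several
# y variables on the raw points, one facet per y variable.
binned_means_plot <- function(df, x, ys, bins = 20) {
  # Stack the chosen y columns into long format using base R only
  long <- do.call(rbind, lapply(ys, function(v) {
    data.frame(x = df[[x]], y = df[[v]], variable = v)
  }))
  ggplot(long, aes(x = x, y = y)) +
    geom_point(alpha = 0.2) +
    stat_summary_bin(fun = "mean", bins = bins,
                     colour = "orange", size = 2, geom = "point") +
    facet_wrap(~ variable, scales = "free_y") +  # each y keeps its own scale
    labs(x = x, y = NULL)
}
```

For the data above, something like binned_means_plot(diabetes, "Age", setdiff(names(diabetes), c("Age", "Outcome"))) would show how the binned mean of each remaining variable changes with Age. Repeating the call with a different x covers the other pairs.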

Will