How to bin multiple variables for a scatterplot

Question

The scatterplot matrix makes it hard to see the relationships between these variables, so I'd like to bin them. I've found advice showing how to bin two variables, but not seven, and I'm not sure how to adapt it to my dataset. Is there a way to extend it to bin multiple variables?

My code (note that the data in my screenshot is a transformed version of the dataset below):

diabetes = read.csv('https://github.com/bandcar/Examples/raw/main/diabetes.csv')
pairs(diabetes)

Advice:

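set.seed(42)
x <- runif(1e4)
y <- x^2 + x + 4 * rnorm(1e4)
df <- data.frame(x = x, y = y)

library(ggplot2)
# Raw points, plus the mean of y within each of 20 bins along x.
# `fun` replaces the deprecated `fun.y` argument in current ggplot2.
ggplot(df, aes(x = x, y = y)) +
  geom_point(alpha = 0.4) +
  stat_summary_bin(fun = mean, bins = 20,
                   color = 'orange', size = 2, geom = 'point')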
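One way to extend the two-variable advice to many variables at once, sketched in base R: `pairs()` accepts a custom `panel` function, so the binned means can be drawn in every panel of the scatterplot matrix. The helper names `bin_means` and `panel_binned` are hypothetical, the bins are equal-width along x, and the simulated data frame `dat` only stands in for `diabetes` (with the real data the call would be `pairs(diabetes, panel = panel_binned)`).

```r
# Hypothetical helper: mean of x and y within equal-width bins of x,
# returning one row per non-empty bin
bin_means <- function(x, y, bins = 20) {
  bin <- cut(x, breaks = bins)
  out <- data.frame(x = as.vector(tapply(x, bin, mean)),
                    y = as.vector(tapply(y, bin, mean)))
  out[!is.na(out$x), ]  # empty bins come back as NA from tapply()
}

# Hypothetical panel function for pairs(): faint raw points, orange bin means
panel_binned <- function(x, y, ...) {
  points(x, y, pch = 20, col = rgb(0, 0, 0, 0.15))
  bm <- bin_means(x, y)
  points(bm$x, bm$y, pch = 19, col = "orange")
}

# Simulated stand-in for the diabetes data (three numeric columns)
set.seed(42)
n <- 1e4
dat <- data.frame(a = runif(n))
dat$b <- dat$a^2 + dat$a + 4 * rnorm(n)
dat$c <- rnorm(n)

# Every off-diagonal panel gets its own binned means
pairs(dat, panel = panel_binned)
```

Because the panel function receives each pair of columns in turn, the number of variables does not matter; seven columns simply produce a 7 x 7 matrix.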

The easiest way to avoid massive overplotting here would be to use a random subsample of the data — shs, Aug 05 '22 at 20:30
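The subsampling suggestion can be sketched in base R as follows. The data frame `dat` is simulated and stands in for the real data, and the 10% fraction is an arbitrary assumption to tune to the data size.

```r
# Simulated stand-in for a large data set with several numeric columns
set.seed(1)
n <- 5000
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = runif(n))

# Keep a random 10% of the rows (sample() draws without replacement)
keep <- sample(nrow(dat), size = round(0.1 * nrow(dat)))
sub <- dat[keep, ]

# Semi-transparent points make any remaining overlap visible
pairs(sub, pch = 20, col = rgb(0, 0, 0, 0.3))
```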

Will · Answer 1 · 2022-08-05T23:50:05.607

The comment from shs may actually be the right answer.
Programmatically speaking, though, what about one plot per pair of variables of interest?

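library(GGally)      # also attaches ggplot2, which provides aes()
library(data.table)
diabetes = read.csv('https://github.com/bandcar/Examples/raw/main/diabetes.csv')
setDT(diabetes)
# Treat Outcome (0/1) as a factor so it gets a discrete colour scale
# and the "combo" (factor vs. continuous) panels are actually used
diabetes[, Outcome := factor(Outcome)]

# Scatterplot matrix of all variables, coloured by Outcome
ggpairs(
  diabetes, 
  lower = list(
    continuous = "smooth",
    combo = "facetdensity",    
    mapping = aes(color = Outcome)
  )
)

# One plot per pair of the remaining seven variables (choose(7, 2) = 21 pairs)
vars <- colnames(diabetes)
vars <- vars[! vars %in% c('Age', 'Outcome')]
colpairs <- t(combn(vars, 2))

for (r in seq_len(nrow(colpairs))) {
  v1 <- colpairs[r, 1]
  v2 <- colpairs[r, 2]
  # Keep only the current pair plus Outcome for colouring
  ggp <- ggpairs(
    diabetes[, .SD, .SDcols = c(v1, v2, "Outcome")], 
    lower = list(
      continuous = "smooth",
      combo = "facetdensity",
      mapping = aes(color = Outcome)
    )
  )
  print(ggp)  # explicit print() is needed inside a loop
}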
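The loop above draws plots but does not bin anything. A base-R sketch that does bin every pair, collecting the results into one long data frame, is below. It deliberately uses equal-count (quantile) bins rather than the equal-width bins of `stat_summary_bin()`, which keeps roughly the same number of points per bin when a variable is skewed. The helper `quantile_bin_means` and the simulated `dat` are hypothetical stand-ins; with the real data you would build `colpairs` from the answer's `vars` and index `diabetes` instead.

```r
# Hypothetical helper: mean of x and y within equal-count bins of x
quantile_bin_means <- function(x, y, bins = 10) {
  probs <- seq(0, 1, length.out = bins + 1)
  # unique() guards against repeated breaks when x has many ties
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  data.frame(bin = levels(bin),
             x = as.vector(tapply(x, bin, mean)),
             y = as.vector(tapply(y, bin, mean)))
}

# Simulated stand-in data with four numeric columns
set.seed(7)
n <- 768
dat <- data.frame(a = rnorm(n), b = rexp(n), c = runif(n))
dat$d <- dat$a + rnorm(n)

# Every pair of columns, one row per pair
colpairs <- t(combn(names(dat), 2))

# Bin each pair and stack the results into one long data frame
binned <- do.call(rbind, lapply(seq_len(nrow(colpairs)), function(r) {
  v1 <- colpairs[r, 1]
  v2 <- colpairs[r, 2]
  data.frame(xvar = v1, yvar = v2, quantile_bin_means(dat[[v1]], dat[[v2]]))
}))

head(binned)
```

The long format (`xvar`, `yvar`, `bin`, `x`, `y`) can then be printed, filtered, or passed to any plotting function, one panel per `xvar`/`yvar` combination.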
