I want to make a scatterplot matrix with points in upper pane and r or r2 values in lower pane, as described here: http://www.sthda.com/english/wiki/scatter-plot-matrices-r-base-graphs

When there is no missing data, it works fine. But when there are some missing values, it seems unable to calculate R, even when I use code I thought would account for missing values. See the commented-out lines in the code below, which show what I've tried. Those attempts were based on what I found after searching here on StackOverflow: Dealing with missing values for correlations calculation

Probably something simple, as I'm a pretty simple R user (so I'm hoping for solutions that are more simple than elegant). Talk to me like I'm stupid!

I do not want to remove whole rows just because there is one missing value, as my real dataset (not this example) is rather small.

# --------------------------------------
# Create Dataframes, one with missing values
# --------------------------------------
Alx <- c(13, 9, 5, 17, 2, 8, 11, 4)
Bex <- c(23, 41, 32, 58, 26, 33, 51, 46)
Dex <- c(7,10,6,4,19,6,15,16)
Gax <- c(43,54,31,28,60,30,43,21)

AlxM <- c(NA, 9, 5, 17, 2, 8, 11, 4)
BexM <- c(23, 41, NA, 58, 26, 33, 51, 46)
DexM <- c(7,10,6,4,19,6,15,NA)
GaxM <- c(43,54,31,28,60,30,43,21)

df <- data.frame(Alx,Bex,Dex,Gax) # dataframe that works in scatterplot matrix
df_miss <- data.frame(AlxM,BexM,DexM,GaxM)# dataframe that has missing values

rm(Alx,Bex,Dex,Gax,AlxM,BexM,DexM,GaxM) # removing un-needed garbage
# --------------------------------------

# --------------------------------------
# Scatterplot Matrix - functions for upper and lower 
# panels, it is the line "r <- round(cor(x,y), digits=2)"
# that I've been focusing on. Perhaps the wrong approach?
# see: http://www.sthda.com/english/wiki/scatter-plot-matrices-r-base-graphs
# --------------------------------------
# Upper panel
upper.panel<-function(x, y){
  points(x,y, pch=19)
  r <- round(cor(x,y), digits=2)
  txt <- paste0("R = ", r)
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  text(0.5, 0.9, txt)
}

# Correlation panel
panel.cor <- function(x, y){
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- round(cor(x, y), digits=2)   # gives all NA
  # Neither of these (immediately below) worked for me:
  # see: https://stackoverflow.com/questions/7445639/dealing-with-missing-values-for-correlations-calculation
  # r <- round(cor(na.omit(x, y)), digits=2) # does not work
  # r <- round(cor(x, y), use="pairwise.complete.obs", digits=2) # does not work
  txt <- paste0("R = ", r)
  cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = 0.5)
}

# Scatterplots
pairs(df[,1:4], lower.panel = panel.cor, 
      upper.panel = upper.panel)

pairs(df_miss[,1:4], lower.panel = panel.cor, 
      upper.panel = upper.panel)
# --------------------------------------
Steve T
  • Your statement `r <- round(cor(x, y), use="pairwise.complete.obs", digits=2)` is close except that `use` should be an argument to `cor` not `round`. Try `r <- round(cor(x, y, use="pairwise.complete.obs"), digits=2)` – G5W Jul 24 '22 at 17:34

1 Answer

The `use` argument belongs to `cor()`. In the OP's commented-out line it sits outside the `cor()` call, so it is passed to `round()` instead: r <- round(cor(x, y), use="pairwise.complete.obs", digits=2). Moving it inside `cor()` fixes the panel:

panel.cor <- function(x, y){
  # Switch the panel to a 0-1 coordinate system so the label can be centred
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  # use= goes inside cor(), so pairs with an NA in x or y are skipped
  r <- round(cor(x, y, use = "pairwise.complete.obs"), digits=2)  
  txt <- paste0("R = ", r)
  text(0.5, 0.5, txt, cex = 0.5)
}
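
The `upper.panel` function from the question also calls `cor(x, y)` without `use`, so its label would presumably still read "R = NA" for any pair of columns that has a missing value. A minimal sketch of the same change applied there is below. The helper name `r_label` is hypothetical (it is not in the question) and only exists so the label logic can be checked on its own; `par(usr = usr)` is used here to restore the saved coordinates.

```r
# Hypothetical helper (not in the original post): builds the "R = ..." label,
# skipping any pair where x or y is missing.
r_label <- function(x, y) {
  r <- round(cor(x, y, use = "pairwise.complete.obs"), digits = 2)
  paste0("R = ", r)
}

# Upper panel: points plus the correlation label near the top of the panel.
upper.panel <- function(x, y) {
  points(x, y, pch = 19)            # points with an NA coordinate are not drawn
  txt <- r_label(x, y)
  usr <- par("usr")                 # remember the panel's data coordinates
  on.exit(par(usr = usr))           # restore them when the function exits
  par(usr = c(0, 1, 0, 1))          # 0-1 coordinates for placing the text
  text(0.5, 0.9, txt)
}
```

With this version defined, the `pairs()` call in the testing step uses it for the upper panels without any other change.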

-testing

pairs(df_miss[,1:4], lower.panel = panel.cor, 
       upper.panel = upper.panel)

-output

[Image: resulting scatterplot matrix for df_miss]
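
To see what the `use` argument changes, the three settings can be compared directly on the question's example data. This is only an illustrative sketch: the variable names `r_default`, `r_pairwise`, `r_complete`, `m_complete` and `m_pairwise` are made up for the comparison. Inside a panel function `cor()` only ever receives two vectors, so "complete.obs" and "pairwise.complete.obs" keep the same pairs there; the two options differ when `cor()` is given the whole data frame.

```r
# Example data with missing values, as in the question
df_miss <- data.frame(
  AlxM = c(NA, 9, 5, 17, 2, 8, 11, 4),
  BexM = c(23, 41, NA, 58, 26, 33, 51, 46),
  DexM = c(7, 10, 6, 4, 19, 6, 15, NA),
  GaxM = c(43, 54, 31, 28, 60, 30, 43, 21)
)

x <- df_miss$AlxM
y <- df_miss$BexM

# Default (use = "everything"): an NA in x or y makes the result NA
r_default <- cor(x, y)

# Skip only the pairs where x or y is missing
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")

# With two vectors, "complete.obs" keeps exactly the same pairs
r_complete <- cor(x, y, use = "complete.obs")

# On the whole data frame the options differ:
# "complete.obs" drops every row that has an NA in any column,
# "pairwise.complete.obs" drops rows separately for each pair of columns
m_complete <- cor(df_miss, use = "complete.obs")
m_pairwise <- cor(df_miss, use = "pairwise.complete.obs")
```

Printing `r_default`, `r_pairwise` and the two matrices side by side shows which cells change when rows are dropped per pair rather than for the whole data frame.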

akrun
  • Wouldn't you need `pairwise.complete.obs`? Otherwise one missing value deletes the whole row, which is not what the OP wants, I think. – deschen Jul 24 '22 at 17:32
  • @deschen is correct, I don't want to delete the whole row - I had tried this before but somehow it wasn't working. Now it is working with pairwise.complete.obs – Steve T Jul 24 '22 at 17:35