
I have a dataset that I load with readr that has 2341 rows, 401 columns, and a few NAs peppered in. I'm trying to create scatterplot matrices with a subselection of continuous variables.

Here's a glimpse of some of the variables I use:

Observations: 2,341
Variables: 3
$ Height <dbl> 175.2000, 175.0000, 174.7000, 174.0000, 173.0000, ...
$ gender <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, ...
$ lb_pp  <dbl> 55.00000, 50.00000, 40.00000, 56.00000,  60.00000, ...

The variables Height, lb_pp, and gender have 98, 150, and 0 NAs, respectively.

Most of the variables are well-behaved and appear in the plot as normal. ggpairs throws warnings when I use the variable with 150 NAs (lb_pp; messages below), but I still get the plot without a problem:

Warning message:
“Removed 150 rows containing non-finite values (stat_boxplot).”
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
“Removed 150 rows containing non-finite values (stat_bin).”
Warning message:
“Removed 150 rows containing non-finite values (stat_density).”

Example image using variable with NA that produces expected output

But when I use another variable, height, it chokes big time, with different warning messages and a screwed-up plot:

Warning message:
“Removed 98 rows containing non-finite values (stat_count).”
Warning message:
“Computation failed in `stat_count()`:
arguments imply differing number of rows: 392, 400, 1”
Warning message:
“Computation failed in `stat_count()`:
arguments imply differing number of rows: 542, 574, 1”
Warning message:
“Removed 98 rows containing non-finite values (stat_count).”
Warning message:
“Computation failed in `stat_count()`:
arguments imply differing number of rows: 392, 400, 1”

Example image using variable with NA that produces unexpected output

I am able to work around the issue by filtering out rows that have NAs in the troublesome variables, but why do I have to do this for some variables and not others? Both variables return TRUE when I call is.numeric on them. What's the difference?

Thanks for your help.

blahblahetcetc

1 Answer


Answering my own question here: it turns out that my pesky variable height, the one that triggered the error, was not a plain atomic vector but an n x 1 matrix. Annoyingly, this is what R's built-in scale function had silently returned in my case. is.numeric still returns TRUE for a numeric matrix, which is why that check did not tell the two variables apart. A recent change to the tidyverse's glimpse function now shows a matrix's dimensions in the output rather than a sample of values, which is how I spotted the problem.
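For anyone hitting the same thing, here is a minimal sketch of the behaviour and one possible fix. The data frame and its values are made up for illustration, and I'm assuming the scaled column was created with something like the assignment below; your own code may differ.

```r
# Toy data standing in for the real dataset (values are invented).
df <- data.frame(Height = c(175.2, 175.0, NA, 174.0, 173.0))

# scale() returns a matrix, even when given a single numeric vector.
scaled <- scale(df$Height)

is.numeric(scaled)  # TRUE, so this check does not reveal the problem
is.matrix(scaled)   # TRUE
dim(scaled)         # 5 1

# as.numeric() drops the dim and scaling attributes, leaving a plain vector.
df$height <- as.numeric(scaled)
is.matrix(df$height)  # FALSE
```

Checking is.matrix (or dim) on each column before plotting is a quick way to find any other columns that ended up as one-column matrices.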

blahblahetcetc