I ran ggpairs() on my dataframe and a 'group' variable appeared in the output (image below). My dataframe has five columns and there is definitely no column in the dataframe called 'group'. Does anyone know what this 'group' variable is and where it came from?

ggpairs() plots

mischva11
WithRegards
    Without a small reproducible example, this is not clear. I tested with the example shown [here](https://www.r-graph-gallery.com/199-correlation-matrix-with-ggally.html) and was not able to reproduce the issue. – akrun Nov 12 '20 at 23:23

1 Answer

This happens in ggpairs when you pass in a grouped tibble:

library(GGally)
library(dplyr)

iris %>% 
  group_by(Species) %>%
  ggpairs()

To get rid of it, simply ungroup your data frame before passing it to ggpairs:

iris %>% 
  group_by(Species) %>%
  ungroup() %>%
  ggpairs()
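
If you are not sure whether your own data frame is grouped, you can check before plotting. The following is a minimal sketch using dplyr's `is_grouped_df()`, `group_vars()` and `ungroup()`; the helper name `drop_groups` is hypothetical and is not part of dplyr or GGally:

```r
library(dplyr)

# Hypothetical helper (an assumption for illustration, not a library function):
# remove any grouping from a data frame and report which variables were grouped.
drop_groups <- function(df) {
  if (dplyr::is_grouped_df(df)) {
    message("Dropping grouping by: ",
            paste(dplyr::group_vars(df), collapse = ", "))
    df <- dplyr::ungroup(df)
  }
  df
}

grouped <- iris %>% group_by(Species)
plain   <- drop_groups(grouped)

is_grouped_df(grouped)  # TRUE
is_grouped_df(plain)    # FALSE
```

The result can then be piped into the plot as usual, e.g. `drop_groups(my_df) %>% GGally::ggpairs()`, where `my_df` stands in for your own data. A plain data frame passes through unchanged.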

The reason for this is that when you pass a grouped tibble to ggplot, it stores the groupings in its main data table as a column called .group:

p <- ggplot(iris %>% group_by(Species))
p$data
#> # A tibble: 150 x 6
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species .group
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    <int>
#>  1          5.1         3.5          1.4         0.2 setosa       1
#>  2          4.9         3            1.4         0.2 setosa       1
#>  3          4.7         3.2          1.3         0.2 setosa       1
#>  4          4.6         3.1          1.5         0.2 setosa       1
#>  5          5           3.6          1.4         0.2 setosa       1
#>  6          5.4         3.9          1.7         0.4 setosa       1
#>  7          4.6         3.4          1.4         0.3 setosa       1
#>  8          5           3.4          1.5         0.2 setosa       1
#>  9          4.4         2.9          1.4         0.2 setosa       1
#> 10          4.9         3.1          1.5         0.1 setosa       1
#> # ... with 140 more rows

This is the data ggpairs uses, which is why the .group variable appears in the plot. This could be flagged as a bug to the authors of GGally. Note that ggplot will not add this column if it is given an ungrouped tibble or a plain data frame.

Allan Cameron