
I am learning to make biplots with the wine data set. How does R know that Barolo, Grignolino and Barbera are the values of `wine.class` when there is no wine class column in the data set?

More details about the wine data set are at the following links:

ggbiplot - how not to use the feature vectors in the plot

https://github.com/vqv/ggbiplot

Thanks very much

Little Bee
  • You can see in the Environment that `wine.class` is a vector which is loaded when you call `data(wine)` – HubertL Mar 02 '16 at 21:00
  • Thanks for answering. Yes, `wine.class` does show up as a vector. Would you elaborate? I exported the data set to a CSV file, but no `wine.class` information was visible – Little Bee Mar 02 '16 at 21:41

1 Answer


The wine dataset contains two objects. One is a data.frame, `wine`, with 178 observations of 13 quantitative variables:

library(ggbiplot)
data(wine)   # loads both `wine` and `wine.class`
str(wine)
'data.frame':   178 obs. of  13 variables:
 $ Alcohol       : num  14.2 13.2 13.2 14.4 13.2 ...
 $ MalicAcid     : num  1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
 $ Ash           : num  2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
 $ AlcAsh        : num  15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
 $ Mg            : int  127 100 101 113 118 112 96 121 97 98 ...
 $ Phenols       : num  2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
 $ Flav          : num  3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
 $ NonFlavPhenols: num  0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
 $ Proa          : num  2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
 $ Color         : num  5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
 $ Hue           : num  1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
 $ OD            : num  3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
 $ Proline       : int  1065 1050 1185 1480 735 1450 1290 1295 1045 1045 ...

There is also a factor, `wine.class`, with 178 values (one per observation) that holds the qualitative wine class:

str(wine.class)
 Factor w/ 3 levels "barolo","grignolino",..: 1 1 1 1 1 1 1 1 1 1 ...
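
To illustrate how a single load call can create two separate objects, here is a minimal sketch in base R. The names `toy` and `toy.class` and their values are hypothetical stand-ins for `wine` and `wine.class`, and the `.rda` file only approximates how a package data file might be stored. It also shows one way to bind the labels to the measurements before exporting to CSV.

```r
# Toy stand-ins (hypothetical names and values) for `wine` and `wine.class`.
toy <- data.frame(Alcohol = c(14.2, 13.2, 12.4),
                  Hue     = c(1.04, 1.05, 0.90))
toy.class <- factor(c("barolo", "barolo", "grignolino"))

# Store both objects in one .rda file; a package data file can do the same.
rda.path <- tempfile(fileext = ".rda")
save(toy, toy.class, file = rda.path)

# Clear them, then restore from the file: one call recreates both objects.
rm(toy, toy.class)
restored <- load(rda.path)
print(restored)

# The factor is not a column of the data frame, so exporting `toy` alone
# would drop the labels. Bind them together before writing a CSV.
toy.full <- cbind(class = toy.class, toy)
csv.path <- tempfile(fileext = ".csv")
write.csv(toy.full, csv.path, row.names = FALSE)
print(readLines(csv.path))
```

Under the same assumption, `cbind(wine.class, wine)` should give a single data frame that keeps the class labels when it is written to a file.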

The 13 quantitative variables are used to compute the PCA:

wine.pca <- prcomp(wine, scale. = TRUE)

while the `wine.class` factor is used only to color the points on the plot.
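
The pairing between PCA scores and a separate grouping factor is purely positional: element i of the factor labels row i of the data. The sketch below shows this in base R, using the built-in `iris` data as a stand-in so that it runs without ggbiplot installed. The variable names are illustrative, and the base `plot()` call only mimics the coloring. It is not how ggbiplot is implemented.

```r
# Stand-in data: numeric columns play the role of `wine`,
# the Species factor plays the role of `wine.class`.
measurements <- iris[, 1:4]
groups <- iris$Species

# The PCA sees only the numeric columns.
pca <- prcomp(measurements, scale. = TRUE)

# The factor must have one label per row for the positional match to work.
stopifnot(nrow(pca$x) == length(groups))

# Attach the labels to the first two principal component scores.
scores <- data.frame(pca$x[, 1:2], group = groups)
head(scores)

# Color each point by its label, as a `groups` argument would.
plot(scores$PC1, scores$PC2,
     col = as.integer(scores$group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(scores$group),
       col = seq_along(levels(scores$group)), pch = 19)
```

Dropping the `col` argument gives the same points in a single color, which is analogous to removing `groups = wine.class` from the `ggbiplot()` call.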

HubertL
  • Thank you for your patience. So `wine` and `wine.class` have to be separate for the PCA to run. In the following code, how does R know where to look for `wine.class` to put them together in the graph? `ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class, ellipse = TRUE, circle = TRUE) + scale_color_discrete(name = '') + theme(legend.direction = 'horizontal', legend.position = 'top')` When I loaded the data, I only had to load `wine`, not `wine.class`. – Little Bee Mar 02 '16 at 23:11
  • `wine.class` is loaded along with the `wine` `data.frame` when you call `data(wine)`, and it is `groups = wine.class` that produces the colors and ellipses (try removing it) – HubertL Mar 02 '16 at 23:48
  • Oh, I got it. So they are linked but not necessarily displayed in one sheet. Thanks very much – Little Bee Mar 03 '16 at 00:20