Correlation coefficient between nominal and cardinal scale variables

Question

I have to describe the relationship between the variable "Average passes completed per game" (cardinal scale) and the variable "Position" (nominal scale) and measure its strength. For that I have to choose a correlation coefficient that suits the scales of both variables. Does anyone know what the best way to do that would be? I am not sure what to use, since the two variables are on different scales. The full dataset consists of the following variables:

PLAYER: Name of the player
COUNTRY: Country of origin
BIRTHDATE: Date of birth
HEIGHT_IN_CM: Height of the player
POSITION: Position of the player
PASSES_COMPLETED: Passes completed by the player
DISTANCE_COVERED: Distance covered by the player in km
MINUTES_PLAYED: Minutes played
AVG_PASSES_COMPLETED: Average passes completed by the player

I would very much appreciate if someone could give me some advice on this.

Thank you!

This should be posted on Cross Validated; Stack Overflow is for *coding*-specific questions. There are some interesting posts on CV that should get you started: [Correlations between continuous and categorical (nominal) variables](https://stats.stackexchange.com/questions/102778/correlations-between-continuous-and-categorical-nominal-variables), [Correlation coefficient for non-dichotomous nominal variable and ordinal or numeric variable](https://stats.stackexchange.com/questions/73065/correlation-coefficient-for-non-dichotomous-nominal-variable-and-ordinal-or-nume). — Maurits Evers, Jan 15 '20 at 02:01

score 0 · Answer 1 · answered Jan 15 '20 at 13:01

OK, so you need to redefine your question somewhat. A correlation coefficient such as Pearson's needs two numeric variables, so it cannot be used to "describe" the relationship the way I guess you are asking. You can, however, test whether there are statistically significant differences in pass rates between positions. As for the questions on the statistics, I agree with Maurits: CV is the best place. As for the code to do the tests, try this:

Firstly, make sure you have the right packages installed. You will definitely need ggplot2 and ggfortify, plus dplyr for the glimpse() call used below, and maybe others if you have to manipulate the data. Then load the libraries:

library(ggplot2)
library(ggfortify)
library(dplyr)  # provides glimpse()

Next, make sure that your data is tidy, i.e. one variable per column and one player per row.

Then import your data into R:

# Pick the CSV file interactively
data.location <- file.choose()
# Import the data
curr.data <- read.csv(data.location)
# Check the import
glimpse(curr.data)
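
It can also help to confirm that POSITION is treated as a categorical variable before plotting and modelling, because since R 4.0.0 read.csv() leaves text columns as character rather than converting them to factors. The sketch below uses a tiny made-up data frame in place of the imported file; the position labels and numbers are invented for illustration only.

```r
# Hypothetical stand-in for the imported CSV (labels and values are invented)
curr.data <- data.frame(
  POSITION = c("Defender", "Midfielder", "Forward", "Midfielder", "Defender"),
  AVG_PASSES_COMPLETED = c(41.2, 55.0, 23.5, 60.1, 38.7),
  stringsAsFactors = FALSE
)

# Make the nominal variable an explicit factor
curr.data$POSITION <- factor(curr.data$POSITION)

# Inspect column types and how many players fall into each position
str(curr.data)
table(curr.data$POSITION)
```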

Then plot using ggplot:

ggplot(curr.data, aes(x = POSITION, y = AVG_PASSES_COMPLETED)) +
  geom_boxplot() +
  theme_bw()

Then fit a linear model with lm() to see whether pass rates differ significantly between positions.

passrate_model <- lm(AVG_PASSES_COMPLETED ~ POSITION, data = curr.data)

Before you test your hypothesis, you need to check that the model is appropriate by looking at the diagnostic plots:

autoplot(passrate_model, smooth.colour = NA)

If the residual plots look fine, then we are ready to test. If not, you will have to use another type of model (and I'm not going into that here).
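
One commonly used fallback when the residuals look clearly non-normal is the rank-based Kruskal-Wallis test from base R. This is only a sketch of one option, not necessarily the alternative the answer had in mind, and the data are simulated with invented position labels so the block runs on its own.

```r
set.seed(2)

# Simulated, deliberately skewed stand-in data (not real player data)
curr.data <- data.frame(
  POSITION = factor(rep(c("Defender", "Midfielder", "Forward"), each = 15)),
  AVG_PASSES_COMPLETED = c(rexp(15, rate = 1 / 40),
                           rexp(15, rate = 1 / 55),
                           rexp(15, rate = 1 / 25))
)

# Rank-based test of whether the distribution of pass rates differs by position
kw <- kruskal.test(AVG_PASSES_COMPLETED ~ POSITION, data = curr.data)
kw
```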

The appropriate test for this (I think) would be a Tukey test, which is run on a fitted ANOVA. The ANOVA summary shows whether position explains a significant share of the variance in pass rates:

passrate_av <- aov(passrate_model)
summary(passrate_av)
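
If you also need a single number for the strength of the association, one option is eta squared: the share of the total variance in pass rates that is explained by position, which can be read off the ANOVA sums of squares. This is a sketch on simulated data with invented position labels, and whether eta squared is acceptable for your assignment is an assumption you would need to check.

```r
set.seed(1)

# Simulated stand-in data (not real player data)
curr.data <- data.frame(
  POSITION = factor(rep(c("Defender", "Midfielder", "Forward"), each = 20)),
  AVG_PASSES_COMPLETED = c(rnorm(20, mean = 40, sd = 8),
                           rnorm(20, mean = 55, sd = 8),
                           rnorm(20, mean = 25, sd = 8))
)

passrate_av <- aov(AVG_PASSES_COMPLETED ~ POSITION, data = curr.data)

# Sums of squares: first entry is POSITION, second is the residuals
ss <- summary(passrate_av)[[1]][["Sum Sq"]]

# Eta squared = SS_between / SS_total; eta is its square root
eta_sq <- ss[1] / sum(ss)
eta <- sqrt(eta_sq)
eta_sq
eta
```

For a one-way model like this, eta squared is the same quantity as the R-squared reported by summary() on the corresponding lm() fit.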

This will perform the Tukey test and give pair-wise comparisons including difference in means, 95% confidence intervals, and adjusted p-values:

tukey.test <- TukeyHSD(passrate_av)
tukey.test

And it can even do a nice plot for you too:

plot(tukey.test)

You should probably read up on how to programme in R. It's quite easy for standard analysis, which this really is. This is a good book: https://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780198787839.001.0001/acprof-9780198787839 — Elk, Jan 15 '20 at 13:03
Thank you for your reply! I am actually doing this in R, but we were told not to use certain methods for this. For example, I found the function eta(): https://www.rdocumentation.org/packages/ryouready/versions/0.4/topics/eta But I was told not to use it in this case... — RubP, Jan 16 '20 at 14:08
This code is for R. You really should read the textbook I linked in the comment above. — Elk, Jan 17 '20 at 18:43
If this answer has helped you, please mark it as accepted to close it off, and upvote. Thanks — Elk, May 14 '21 at 16:19
