I need to build a dependency matrix for all 91 variables in my data set.

I tried some code, but it did not work.

Here is the relevant part:

p<- length(dati)
chisquare <- matrix(dati, nrow=(p-1), ncol=p)

This should create a square matrix covering all the variables.

system.time({for(i in 1:p){
    for(j in 1:p){
        a <- dati[, rn[i+1]]
        b <- dati[, cn[j]]
        chisquare[i, (1:(p-1))] <- chisq.test(dati[,i], dati[, i+1])$statistic
        chisquare[i, p] <- chisq.test(dati[,i], dati, i+1])$p.value
    }}
})

This should test each pair of the p variables to see whether they are dependent on each other.

Error in `[.data.frame`(dati, , rn[i + 1]) : 
  undefined columns selected

In addition: There were 50 or more warnings (use warnings() to see the first 50)

Timing stopped at: 32.23 0.11 32.69 

warnings() #let's check
>: In chisq.test(dati[, i], dati[, i + 1]) :
  Chi-squared approximation may be incorrect

chisquare # all cells in a row hold the same value, except the last column, which seems to hold the p-values

I also tried another approach, provided by someone who knows R much better than I do:

#strange values I have in some columns
sum(dati == 'x')

#replacing "x" by x
x <- dati[dati=='x']

#distribution of answers for each question
answers <- t(sapply(1:ncol(dati), function(i) table(factor(dati[, i], levels = -2:9), useNA = 'always')))

rownames(answers) <- colnames(dati)
answers
#correlation for the pairs

I<- diag(ncol(dati)) 
#empty diagonal matrix

colnames(I) <- rownames(I) <- colnames(dati)
rn <- rownames(I)
cn <- colnames(I)

#loop
system.time({
    for(i in 1:ncol(dati)){
        for(j in 1:ncol(spain)){
            a <- dati[, rn[i]]
            b <- dati[, cn[j]]
            r <- chisq.test(a,b)$statistic
            r <- chisq.test(a,b)$p.value
            I[i, j] <- r
        }
     }
})

 user  system elapsed 
  29.61    0.09   30.70 

There were 50 or more warnings (use warnings() to see the first 50)

warnings() #let's check
-> : In chisq.test(a, b) : Chi-squared approximation may be incorrect

diag(I)<- 1

#result
head(I)

The output stops at the 5th variable, whereas I need to check the dependency between every pair of variables.

I don't understand where I went wrong, but I hope I'm not too far off.

Any help would be appreciated.

joran
Andrea

1 Answer

You are apparently trying to compute the p-value of a chi-squared test, for all pairs of variables in your dataset. This can be done as follows.

# Sample data: n observations of k categorical variables
n <- 1000
k <- 10
d <- matrix(sample(LETTERS[1:5], n * k, replace = TRUE), ncol = k)
d <- as.data.frame(d)
names(d) <- letters[1:k]

# Compute the p-value for every pair of columns
k <- ncol(d)
result <- matrix(1, nrow = k, ncol = k)
rownames(result) <- colnames(result) <- names(d)
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    result[i, j] <- chisq.test(d[, i], d[, j])$p.value
  }
}
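Since the test gives the same result whichever of the two variables comes first, a variant that computes each pair once and mirrors the value should roughly halve the work, which may matter with 91 variables. The following is only a minimal sketch on simulated data like the sample above; `pairwise_chisq_p` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: chi-squared p-values for every pair of columns of a data frame.
pairwise_chisq_p <- function(d) {
  k <- ncol(d)
  result <- matrix(NA_real_, nrow = k, ncol = k,
                   dimnames = list(names(d), names(d)))
  for (i in seq_len(k)) {
    for (j in seq_len(i)) {
      # The result does not depend on the argument order, so test each pair once
      # and store the p-value in both triangles of the matrix.
      p <- chisq.test(d[[i]], d[[j]])$p.value
      result[i, j] <- p
      result[j, i] <- p
    }
  }
  result
}

# Simulated data (an assumption standing in for the real data set)
set.seed(1)
d <- as.data.frame(matrix(sample(LETTERS[1:5], 200 * 4, replace = TRUE), ncol = 4))
names(d) <- letters[1:4]

p_values <- pairwise_chisq_p(d)
```

The diagonal entries come from testing each variable against itself, so they can simply be overwritten (for example with `diag(p_values) <- NA`) if they are not wanted.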

In addition, there may be something wrong with your data, leading to the warnings you get, but we do not know anything about it.
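One common cause of the "Chi-squared approximation may be incorrect" warning is that some expected cell counts are small, which happens easily when a variable has a rare category. Whether that applies to the data in the question is an assumption. A minimal sketch on made-up data of how one might inspect the expected counts and, if needed, ask `chisq.test` for a Monte Carlo p-value through its `simulate.p.value` argument:

```r
set.seed(42)
# Made-up data: 'b' has a rare category ("w" occurs only 3 times out of 60).
a <- sample(rep(c("x", "y", "z"), times = 20))
b <- rep(c("u", "v", "w"), times = c(36, 21, 3))

# Expected counts under independence; chisq.test warns when any of them is below 5.
expected <- suppressWarnings(chisq.test(a, b))$expected
min_expected <- min(expected)

# Monte Carlo p-value, which does not rely on the chi-squared approximation.
p_sim <- chisq.test(a, b, simulate.p.value = TRUE, B = 2000)$p.value
```

Here the smallest expected count is 20 * 3 / 60 = 1, well below 5, which is why the plain test warns on this example.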

Your code has too many problems for me to enumerate them all: for a start, you try to create a square matrix with different numbers of rows and columns, and after that I am completely lost.
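To illustrate the matrix problem: `matrix(dati, nrow=(p-1), ncol=p)` asks for p-1 rows and p columns, so the result cannot be square, and it is filled with the data themselves instead of being left empty. A minimal sketch of allocating an empty p by p matrix labelled with the variable names, using a small stand-in data frame because the original `dati` is not shown:

```r
# Stand-in for the asker's data frame 'dati' (the real data are not available).
dati <- data.frame(q1 = c("a", "b", "a"),
                   q2 = c("x", "x", "y"),
                   q3 = c("m", "n", "n"))

p <- ncol(dati)
# One row and one column per variable, filled with NA until the tests are run.
chisquare <- matrix(NA_real_, nrow = p, ncol = p,
                    dimnames = list(names(dati), names(dati)))
```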

Vincent Zoonekynd