2

Given data such as:

        A         B           C
1               Group 1     Group 2
2   Property 1     56         651
3   Property 2     97       1,380

how can one calculate the p-value (i.e., the "right-tail" probability of the chi-squared distribution) directly, without setting up a separate calculation for the expected values of the table?

The p-value is calculated in Excel by the function ChiSq.dist.RT if you know the value of chi-squared for the table or by ChiSq.Test if you know the table of "expected values" for the table. The value of chi-squared is calculated using the expected values, and the expected values are calculated from the original table by a somewhat complicated formula, so either way, Excel requires us to calculate the expected values ourselves in order to get the p-value, which seems kind-of silly. So, how can I get the p-value in Excel without calculating the expected values separately?

Edit: This question was originally posted with the title "How to calculate Pearson correlation coefficient with 2-property arrays?" and asked why the function pearson was giving the wrong answer. Well, the answer to that was that I was confusing p-value with the Pearson correlation coefficient, which are different things. So I've reformulated the question to ask what I really needed to know, and am posting the answer. I'll wait a while before accepting my own answer in case someone else has a better one.

NewSites
  • 1,402
  • 2
  • 11
  • 26

1 Answers1

2

It appears to me that this requires VBA. I've written the following VBA function to calculate either chi-squared or the p-value, as well as two other measures of association for a 2x2 contingency table:

Public Function nStatAssoc_2x2(sType As String, nGrp1PropCounts As Range, nGrp2PropCounts As Range) As Single

' Return one of several measures of statistical association of a 2×2 contingency table:
'                   Property 1      Property 2
'       Group 1     nCount(1, 1)    nCount(1, 2)
'       Group 2     nCount(2, 1)    nCount(2, 2)

' sType is:     to calculate:
'   "OR"        Odds ratio
'   "phi"       Phi coefficient
'   "chi-sq"    Chi-squared
'   "p"         p-value, i.e., right-tailed probability of the chi-squared distribution

' nGrp<n>PropCounts is a range of two cells containing the number of members of group n that have each of two properties.
' These arguments are 1-D arrays in order to allow the data to appear in non-adjacent ranges in the spreadsheet.

' References:
    ' Contingency table:        https://en.wikipedia.org/wiki/Contingency_table
    ' Measure of association:   www.britannica.com/topic/measure-of-association
    ' Odds ratio:               https://en.wikipedia.org/wiki/Odds_ratio
    '                           https://en.wikipedia.org/wiki/Effect_size#Odds_ratio
    ' Phi coefficient:          https://en.wikipedia.org/wiki/Phi_coefficient
    ' Chi-sq:                   https://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Calculating_the_test-statistic
    '                           www.mathsisfun.com/data/chi-square-test.html
    '                               Shows calculation of expected values.
    ' p-value:                  https://learn.microsoft.com/en-us/office/vba/api/excel.worksheetfunction.ChiSq_Dist_RT

Dim nCount(1 To 2, 1 To 2) As Integer
Dim nSumGrp(1 To 2) As Integer, nSumProp(1 To 2) As Integer, nSumAll As Integer
Dim nExpect(1 To 2, 1 To 2) As Single
Dim nIndex1 As Byte, nIndex2 As Byte
Dim nRetVal As Single

' Combine input arguments into contingency table:
For nIndex1 = 1 To 2
    nCount(1, nIndex1) = nGrp1PropCounts(nIndex1)
    nCount(2, nIndex1) = nGrp2PropCounts(nIndex1)
  Next nIndex1

' Calculate totals of group counts, property counts, and all counts (used for phi and chi-sq):
For nIndex1 = 1 To 2
    For nIndex2 = 1 To 2
        nSumGrp(nIndex1) = nSumGrp(nIndex1) + nCount(nIndex1, nIndex2)
        nSumProp(nIndex2) = nSumProp(nIndex2) + nCount(nIndex1, nIndex2)
      Next nIndex2
  Next nIndex1
nSumAll = nSumGrp(1) + nSumGrp(2)

If nSumAll <> nSumProp(1) + nSumProp(2) Then
    nRetVal = -2           ' Error: Sums differ.
    GoTo Finished
  End If

Select Case sType

    ' Odds ratio
    Case "OR":
        nRetVal = (nCount(1, 1) / nCount(1, 2)) / (nCount(2, 1) / nCount(2, 2))
        If nRetVal <> (nCount(1, 1) / nCount(2, 1)) / (nCount(1, 2) / nCount(2, 2)) Then
            nRetVal = -3            ' Error: OR calculation results differ.
            GoTo Finished
          End If

    ' Phi coefficient
    Case "phi":
        nRetVal = ((CLng(nCount(1, 1)) * nCount(2, 2)) - (CLng(nCount(1, 2)) * nCount(2, 1))) / _
                    (CSng(nSumGrp(1)) * nSumGrp(2) * nSumProp(1) * nSumProp(2)) ^ 0.5

    ' Chi-squared
    Case "chi-sq", "p":     ' For "p", nRetVal is passed to the next select case statement.
        ' Calculate table of expected values:
        For nIndex1 = 1 To 2
            For nIndex2 = 1 To 2
                    ' In next line, the division is done first to prevent integer overflow,
                    '   which can happen if the multiplication is done first.
                nExpect(nIndex1, nIndex2) = nSumGrp(nIndex1) / nSumAll * nSumProp(nIndex2)
                If nExpect(nIndex1, nIndex2) < 5 Then
                    ' https://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Assumptions
                    nRetVal = -4        ' Error: Expected value too small.
                    GoTo Finished
                  Else
                    nRetVal = nRetVal + _
                        (nCount(nIndex1, nIndex2) - nExpect(nIndex1, nIndex2)) ^ 2 / nExpect(nIndex1, nIndex2)
                  End If
              Next nIndex2
          Next nIndex1

    Case Else:
        nRetVal = -1           ' Error: Invalid measure type.
        GoTo Finished
  End Select

Select Case sType
    Case "OR", "phi", "chi-sq":

    ' p-value       ' Uses value of nRetVal passed from the previous select case statement.
    Case "p": nRetVal = WorksheetFunction.ChiSq_Dist_RT(nRetVal, 1)
  End Select

Finished: nStatAssoc_2x2 = nRetVal

End Function        ' nStatAssoc_2x2()

The function is tested in Excel 2019 and yields correct values for all four measures for several test tables. Criticisms of or suggestions for improving the code are welcome.

If I'm wrong, and this doesn't require VBA or for any other reason there's a better way to do this, please post a different answer with that. As I said in the edit note in my question, I'll wait a while before accepting my answer to see if someone else has a better one.

NewSites
  • 1,402
  • 2
  • 11
  • 26
  • Any chance to apply Yates' continuity correction [here](https://stackoverflow.com/questions/76552046/chi-squared-test-with-yates-correction-in-excel)? Especially for p-value calculation. – bajun65537 Jun 26 '23 at 15:54