1

SPSS treats multi response sets differently from categorical vars when it comes to z-tests in custom tables. I assume that this behaviour is linked to the overlap of the responses, but I cannot figure out how.

So, how does SPSS do z-testing when it comes to multi response sets (MRsets)?

My goal is to reproduce SPSS z-test for MRsets in R, but I cannot figure out what SPSS actually does. Normally, SPSS custom table z-testing is just the same as

prop.test(c(proportion1,proportion2),c(columnSum1,columSum2),"two.sided",correct=F)

but it is different with MRsets, obviously.

To make this clear, please take a look at this categorical versus MRset comparison.


Categorical var z-test (C & D columns are not different according to z-test)

  • Categorical dataset (no overlap, 3623 cases): Download dataset
  • Categorical overlap matrix (no overlap): enter image description here
  • Categorcial z-test SPSS syntax

    CTABLES
      /VLABELS VARIABLES=splitVar catVar DISPLAY=DEFAULT
      /TABLE splitVar [C][COUNT F40.0] BY catVar [C]
      /CATEGORIES VARIABLES=splitVar catVar ORDER=A KEY=VALUE EMPTY=EXCLUDE
      /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=NONE ORIGIN=COLUMN INCLUDEMRSETS=NO CATEGORIES=ALLVISIBLE.
    
  • Categorcial z-test output: enter image description here

  • R reproduction for C<->D z-test (first row): http://www.r-fiddle.org/#/fiddle?id=p4gw9ftk

    "Categorical var z-test"
    "Doing a proportions test for first row (splitVar=1) and columns C and D"
    prop.test(c(198,242), c(198+35,242+65), alternative="two.sided", correct=F )
    "As we see, there are no significant differences in the proportions on an alpha=0.05 level"
    

MRset z-test (identical numbers in table, but different z-test result: significant differences in C & D columns)

  • MRset dataset (overlaps included, 2404 cases): Download dataset
  • MRset overlap matrix: enter image description here
  • MRset z-test output:enter image description here
  • MRset z-test SPSS syntax:

    CTABLES
        /VLABELS VARIABLES=splitVar $MySet DISPLAY=DEFAULT
        /TABLE splitVar [C] BY $MySet [C][COUNT F40.0]
        /CATEGORIES VARIABLES=splitVar ORDER=A KEY=VALUE EMPTY=EXCLUDE
        /CATEGORIES VARIABLES=$MySet  EMPTY=INCLUDE
        /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=NONE ORIGIN=COLUMN INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE.
    
  • R reproduction for C<->D z-test (first row): http://www.r-fiddle.org/#/fiddle?id=GAhnnrv0

    "MRset z-test"
    "Doing a proportions test for first row (splitVar=1) and columns C and D"
    overlap_splitvar1_CD <- 53
    overlap_splitvar2_CD <- 9
    prop.test(c(198-overlap_splitvar1_CD,242-overlap_splitvar1_CD), c(198+35-overlap_splitvar1_CD-overlap_splitvar2_CD,242+65-overlap_splitvar1_CD-overlap_splitvar2_CD), alternative="two.sided", correct=F )
    "As we see, there are still no significant differences in the proportions on an alpha=0.05 level. In contrast, SPSS does detect a difference. Why?"
    

As you can see from the MRset R code, even a subtraction of overlap cases does not help. Maybe it is linked to weighting or something? Thanks so much for ideas.

Possibly helpful link: A Note on Weights and Multiple Response Sets

nilsole
  • 1,663
  • 2
  • 12
  • 28
  • A user at CrossValidated has given a potentially useful answer, hinting at the SPSS algorithms document: http://stats.stackexchange.com/questions/163712/custom-tables-how-does-spss-treat-multi-response-sets-differently-from-categori?noredirect=1#comment311297_163712 – nilsole Jul 29 '15 at 11:14
  • I am working on this and report back once I found a solution. – nilsole Jul 29 '15 at 11:34

1 Answers1

0

Could it be the Bonferroni adjustment that is being applied by CTABLES potentially?

Jignesh Sutar
  • 2,909
  • 10
  • 13
  • Good thought, but such a p-value correction is never indicated by SPSS and is also set to "NONE" in the SPSS syntax in both cases. – nilsole Jul 28 '15 at 19:13