I'm trying to construct a 2x2 contigency table as explained in the following link below:
Ad hoc 2x2 contingency tables SQL Server 2008 (tried to understand the code but couldn't wrap my head around it)
A loop is made to construct pairs as in C1,C1 C1,C2 C2,C1 C2,C2. (cartesian product)
These pairs are given as argument into the sql code. For this example I already have given the sql code a pair --> C1,C1
When constructing it for different pairs they are correct as in C1,C2 C2,C1 (after some modifications explained below). When making pairs of C1,C1 or C2,C2 it constructs a wrong contingency table.
For example (table name is alpha_occurence):
id concept_uri document_uri
1 C1 D1
2 C2 D1
2x2 Contingency table of the pairs C1,C1 should give from the given table above:
C1 not C1
C1 1 0
not C1 0 -
But instead gives (after some modifications):
C1 not C1
C1 0 1
not C1 1 -
note I have put a - for the value not C1, Not C1. Because to calculate that an other method is used.
This sql code is used to retrieve the values :
SELECT count(*) AS total FROM
(SELECT document_uri,count(DISTINCT concept_uri) AS count_conc FROM mydb.alpha_occurence
WHERE concept_uri IN ('C1','C1')
GROUP BY document_uri
HAVING count_conc >=2 )
AS amount_of_concept_co_occurence #value of both X and Y
UNION ALL
SELECT count(*) AS total FROM
(SELECT concept_uri,document_uri FROM mydb.alpha_occurence
WHERE concept_uri IN ('C1'))
AS only_concept_A #value of Only X not Y
UNION ALL
SELECT count(*) AS total FROM
(SELECT concept_uri,document_uri FROM mydb.alpha_occurence
WHERE concept_uri IN ('C1'))
AS only_concept_B #value of Not X only Y
After the values are retrieved a little script is run over these values to correct them. The following is done:
To get Only X and not Y = only_concept_A - amount_of_concept_co_occurence
To get Not X and Only Y = Only_concept_B - amount_of_concept_co_occurence
To get the value of neither X or Y = total # of documents (which is not given here as the sample data only has data of which concept occurce in which document) - (amount_of_concept_co_occurence + Only X and not Y + Not X and Only Y)