0

I'm trying to construct a 2x2 contigency table as explained in the following link below:

Ad hoc 2x2 contingency tables SQL Server 2008 (tried to understand the code but couldn't wrap my head around it)

A loop is made to construct pairs as in C1,C1 C1,C2 C2,C1 C2,C2. (cartesian product)

These pairs are given as argument into the sql code. For this example I already have given the sql code a pair --> C1,C1

When constructing it for different pairs they are correct as in C1,C2 C2,C1 (after some modifications explained below). When making pairs of C1,C1 or C2,C2 it constructs a wrong contingency table.

For example (table name is alpha_occurence):

id   concept_uri   document_uri

1       C1      D1

2       C2      D1

2x2 Contingency table of the pairs C1,C1 should give from the given table above:

       C1     not C1
    C1  1     0
not C1  0     -

But instead gives (after some modifications):

       C1    not C1
    C1  0    1
not C1  1    -

note I have put a - for the value not C1, Not C1. Because to calculate that an other method is used.

This sql code is used to retrieve the values :

SELECT count(*) AS total FROM  
(SELECT document_uri,count(DISTINCT concept_uri) AS count_conc FROM mydb.alpha_occurence 
WHERE concept_uri IN ('C1','C1') 
GROUP BY document_uri 
HAVING count_conc >=2 ) 
AS amount_of_concept_co_occurence #value of both X and Y

UNION ALL 

SELECT count(*) AS total FROM 
(SELECT concept_uri,document_uri FROM mydb.alpha_occurence
WHERE concept_uri IN ('C1'))
AS only_concept_A #value of Only X not Y

UNION ALL 

SELECT count(*) AS total FROM
(SELECT concept_uri,document_uri FROM mydb.alpha_occurence 
WHERE concept_uri IN ('C1'))
AS only_concept_B #value of Not X only Y

After the values are retrieved a little script is run over these values to correct them. The following is done:

To get Only X and not Y            = only_concept_A - amount_of_concept_co_occurence 
To get Not X and Only Y            = Only_concept_B - amount_of_concept_co_occurence
To get the value of neither X or Y = total # of documents (which is not given here as the sample data only has data of which concept occurce in which document) - (amount_of_concept_co_occurence + Only X and not Y + Not X and Only Y) 
Community
  • 1
  • 1
user3707396
  • 1
  • 1
  • 2
  • Please edit your question and who samples of data. You have non-sensical constructs like `not in ('c1', 'c1')` and mention "A" and "B", but have no such references in the query. – Gordon Linoff Jun 05 '14 at 11:08
  • I have edited my question and hope i made it somewhat more understandable – user3707396 Jun 06 '14 at 14:19
  • Your question would be much clearer if you just showed some sample data and the results that you want. I *know* what a contingency table is. I *don't know* what your data looks like. You might also consider closing this question and asking another one that better communicates what you are trying to accomplish with your data. – Gordon Linoff Jun 06 '14 at 14:35

1 Answers1

1

I used this script

select concept_uri, document_uri, count(*) as count 
from table
group by concept_uri, document_uri

and they are ready..

Alexei - check Codidact
  • 22,016
  • 16
  • 145
  • 164
  • Welcome to Stack Overflow! While you may have solved this user's problem, code-only answers are not very helpful to users who come to this question in the future. Please edit your answer to explain why your code solves the original problem. – Joe C Feb 12 '17 at 17:42