I have a database defined as follows:
create table Classes (
    Id INT not null,
    Text NVARCHAR(255) null,
    primary key (Id)
)

create table Documents (
    Id INT not null,
    Title NVARCHAR(MAX) null,
    Abstract NVARCHAR(MAX) null,
    Year INT null,
    primary key (Id)
)

create table Documents_Tokens (
    DocumentFk INT not null,
    TokenFk INT not null
)

create table Documents_Classes (
    DocumentFk INT not null,
    ClassFk INT not null
)

create table Tokens (
    Id INT not null,
    Text NVARCHAR(255) null,
    primary key (Id)
)
There is an m:m relationship between documents and classes, and another between documents and tokens.
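I would like to compute certain statistics. One of them is A, which measures the co-occurrence of classes and tokens: for each (class, token) pair, A is the number of distinct documents that are assigned the class and also contain the token. I currently compute A like this: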
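To make the definition concrete, here is a minimal Python sketch on hypothetical toy data (`doc_classes`, `doc_tokens` and `co_occurrence` are illustrative names, not part of the schema). It treats each document's classes and tokens as sets, so every document contributes at most 1 to a given pair, which matches `count(distinct DocumentFk)`. Note that it only enumerates pairs that actually co-occur in some document, rather than the full Classes x Tokens cross product.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: document id -> set of class ids / set of token ids,
# standing in for the rows of Documents_Classes and Documents_Tokens.
doc_classes = {1: {10, 11}, 2: {10}, 3: {11}}
doc_tokens = {1: {100, 101}, 2: {100}, 3: {101, 102}}


def co_occurrence(doc_classes, doc_tokens):
    """Return A as a Counter keyed by (class_id, token_id).

    A[(c, t)] is the number of documents that have class c and contain token t.
    Only pairs that co-occur in at least one document are generated.
    """
    a = Counter()
    for doc_id, classes in doc_classes.items():
        tokens = doc_tokens.get(doc_id, set())
        # Each (class, token) pair of this document counts once for this document.
        for pair in product(classes, tokens):
            a[pair] += 1
    return a


A = co_occurrence(doc_classes, doc_tokens)
print(A[(10, 100)])  # class 10 and token 100 co-occur in documents 1 and 2
```

Pairs that never co-occur (for example class 10 with token 102) are simply absent, and a `Counter` reports them as 0.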
with combs as
(
    select
        a.Id as classid,
        a.Text as class,
        b.Id as tokenid,
        b.Text as token
    from dbo.Classes as a
    cross join dbo.Tokens as b
)
,A as
(
    select token, class, count(distinct DocumentFk) as A from
    (
        select
            token,
            class,
            DocumentFk
        from combs
        inner join dbo.Documents_Classes on classid = ClassFk
        group by token, DocumentFk, class
        intersect
        select
            token,
            class,
            DocumentFk
        from combs
        inner join dbo.Documents_Tokens on tokenid = TokenFk
        group by token, DocumentFk, class
    ) T group by token, class
)
...
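For comparison, A can also be expressed by joining the two junction tables directly on DocumentFk, which never materialises the Classes x Tokens cross product that `combs` builds. The sketch below runs both formulations against a hypothetical in-memory SQLite copy of the schema (SQLite has no `dbo` schema, so the prefixes are dropped) and checks that they return the same rows on toy data. It is not a benchmark; whether the join form is faster on SQL Server with the real data would need to be measured.

```python
import sqlite3

# Hypothetical in-memory copy of the relevant tables, with toy rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Classes (Id INT not null primary key, Text NVARCHAR(255) null);
create table Tokens (Id INT not null primary key, Text NVARCHAR(255) null);
create table Documents_Classes (DocumentFk INT not null, ClassFk INT not null);
create table Documents_Tokens (DocumentFk INT not null, TokenFk INT not null);
insert into Classes values (10, 'c10'), (11, 'c11');
insert into Tokens values (100, 't100'), (101, 't101'), (102, 't102');
insert into Documents_Classes values (1, 10), (1, 11), (2, 10), (3, 11);
insert into Documents_Tokens values (1, 100), (1, 101), (2, 100), (3, 101), (3, 102);
""")

# The cross-join + INTERSECT formulation from the question, minus "dbo.".
ORIGINAL = """
with combs as (
    select a.Id as classid, a.Text as class, b.Id as tokenid, b.Text as token
    from Classes as a cross join Tokens as b
)
select token, class, count(distinct DocumentFk) as A from (
    select token, class, DocumentFk from combs
    inner join Documents_Classes on classid = ClassFk
    intersect
    select token, class, DocumentFk from combs
    inner join Documents_Tokens on tokenid = TokenFk
) T group by token, class
"""

# Join formulation: only (document, class, token) triples that exist are touched.
JOINED = """
select t.Text as token, c.Text as class, count(distinct dc.DocumentFk) as A
from Documents_Classes as dc
inner join Documents_Tokens as dt on dt.DocumentFk = dc.DocumentFk
inner join Classes as c on c.Id = dc.ClassFk
inner join Tokens as t on t.Id = dt.TokenFk
group by c.Id, c.Text, t.Id, t.Text
"""

original = sorted(conn.execute(ORIGINAL).fetchall())
joined = sorted(conn.execute(JOINED).fetchall())
print(joined == original)
```

One design difference: the join version groups by the class and token ids as well as their text, so two classes (or tokens) that happen to share the same Text are kept apart, whereas the original groups by text only. With unique texts, as in the toy data, the two give identical results.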
Unfortunately, this query takes a very long time to run, even though I have added indexes based on the query analyser's output. Is this the most efficient way to determine A? If not, is there a better way? I could also change the underlying database structure if that would speed things up.
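On the structural side, one common change is to give the junction tables composite primary keys, (DocumentFk, ClassFk) and (DocumentFk, TokenFk). This rejects duplicate pairs and provides an index leading on DocumentFk for the join. On SQL Server a primary key is created as the clustered index by default (when no other clustered index exists), so both tables would be stored in DocumentFk order, which may suit this join; that is an assumption to confirm with the actual execution plan. Once pairs are unique, each (document, class, token) triple appears exactly once in the join, so `count(*)` gives the same result as `count(distinct DocumentFk)`. The sketch below checks both points on hypothetical toy data in SQLite.

```python
import sqlite3

# Hypothetical variant of the junction tables with composite primary keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Documents_Classes (
    DocumentFk INT not null,
    ClassFk INT not null,
    primary key (DocumentFk, ClassFk)
);
create table Documents_Tokens (
    DocumentFk INT not null,
    TokenFk INT not null,
    primary key (DocumentFk, TokenFk)
);
insert into Documents_Classes values (1, 10), (1, 11), (2, 10), (3, 11);
insert into Documents_Tokens values (1, 100), (1, 101), (2, 100), (3, 101), (3, 102);
""")

# A duplicate (document, token) pair is now rejected instead of silently stored.
try:
    conn.execute("insert into Documents_Tokens values (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With unique pairs, count(*) per (class, token) equals count(distinct DocumentFk).
rows = conn.execute("""
select dc.ClassFk, dt.TokenFk, count(*) as A
from Documents_Classes as dc
inner join Documents_Tokens as dt on dt.DocumentFk = dc.DocumentFk
group by dc.ClassFk, dt.TokenFk
order by dc.ClassFk, dt.TokenFk
""").fetchall()
print(duplicate_rejected, rows)
```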
Any feedback would be very much appreciated.