-1

I have two tables in my database. The first one contains keywords, the other one contains longer text.

For example:

Table 1 has the following entries in a column:

word1
word2
word3

Table 2 has the following values in a column:

This entry has word1
This entry has word2
This entry has word1 and word3
This entry has word1 and word1

I want to write a query that tells me how many entries in table 2 have words from table 1 in them.

The result I am looking for is like:

word1 > 3
word2 > 1
word3 > 1

What I am currently doing is a cross join and then using like to try and get the results but it is extremely slow.

select * from (
select uk.keyword,  lt.content from UniqueKeywords uk
cross join LongText) lt

where lt.content like '%'+keyword+'%'

EDIT: If you can help me give me an idea on how to do this in a PDI transformation that would be swell as well....

ghengalala
  • 109
  • 9
  • 1
    You may wish to try [full-text index](https://learn.microsoft.com/en-us/sql/relational-databases/search/full-text-search?view=sql-server-ver15) – Serg Nov 19 '20 at 07:38
  • Databases have tables with rows and fields, not lists. What you try to do *is* extremely slow because no indexes can be used *UNLESS* you use specialized full-text search indexes and eg [the CONTAINS function](https://learn.microsoft.com/en-us/sql/t-sql/queries/contains-transact-sql?view=sql-server-ver15) instead of LIKE – Panagiotis Kanavos Nov 19 '20 at 07:39
  • And what would `'word1word3'` count as? – Gordon Linoff Nov 19 '20 at 13:11
  • @GordonLinoff This should not count towards the word count – ghengalala Nov 20 '20 at 04:27

1 Answers1

0

SQL is terrible at this kind of thing, on 2016+ you can at least try STRING_SPLIT, if the sentences are complex you'd probably use Python regex in 2017+.

That said, assuming you're on 2016+ you can try the below mockup which gets the numbers, you can wrap in another cte for the counts. Under that would be the syntax for whatever your table names are.

But this assumes all kinds of things, sentences having spaces and no other punctuation to affect joins being the beginning.

--proof of concept
WITH cteSingleWords AS
(
    SELECT LookingFor   = 'word1' UNION ALL
    SELECT LookingFor   = 'word2' UNION ALL
    SELECT LookingFor   = 'word3'

),
cteLongSentence AS
(
    SELECT Id = 1   ,LongSentence = 'This entry has word1'                UNION ALL
    SELECT Id = 2   ,LongSentence = 'This entry has word2'                UNION ALL
    SELECT Id = 3   ,LongSentence = 'This entry has word1 and word3'      UNION ALL
    SELECT Id = 4   ,LongSentence = 'This entry has word1 and word1'
)
SELECT 
    Id 
    ,ss.Value
    ,sw.LookingFor
FROM 
    cteLongSentence
    CROSS APPLY STRING_SPLIT(LongSentence ,' ')  ss
    INNER JOIN cteSingleWords sw ON sw.LookingFor = ss.Value



--replace with table names
SELECT
     ss.Value
    ,swtn.WordImLookingFor
FROM
    LongSentenceTableName                                       lstn
    CROSS APPLY STRING_SPLIT(lstn.LongSentenceColumnName ,' ')  ss
    INNER JOIN SingleWordTableName                              swtn    ON swtn.WordImLookingFor = ss.Value