0

I have a table with 3 columns: id, sentence and language. So sentences can be either in English and German, IDs are assigned to sentences with same meaning but different languages, like

ID | sentence | language
1  | Hello    | en
1  | Hallo    | de
2  | Sorry    | en

there could be sentences that exist only in one language. Now I want to find out all sentences that are available in both language, I can do with:

SELECT 
    *
FROM
    `sentences`
WHERE
    LENGTH(sentence) > 0
        AND (language = 'en' OR language = 'de')
GROUP BY id
HAVING COUNT(language) = 2

and I get results of sentences only in German. Then I do

SELECT 
    *
FROM
    sentences
WHERE
    id IN (SELECT 
            id
        FROM
            `sentences`
        WHERE
            LENGTH(sentence) > 0
                AND (language = 'en' OR language = 'de')
        GROUP BY id
        HAVING COUNT(language) = 2)

That should work but query takes forever. My question: is there any fancy way to do this?

Paul Stanley
  • 4,018
  • 6
  • 35
  • 56
boreas
  • 1,041
  • 1
  • 15
  • 30

2 Answers2

2

INNER JOINS are faster than using IN clause

SELECT en.id, 
       en.sentence as en_sentence,
       de.sentence as de_sentence,
       en.language as en_language,
       de.language as de_language
FROM sentences en
INNER JOIN sentences de ON en.ID = de.ID AND en.language = 'en' AND de.language = 'de'
WHERE length(en.sentence) > 0
AND length(de.sentence) > 0
Tin Tran
  • 6,194
  • 3
  • 19
  • 34
1

Delete your sentences that are of length 0, if your data permits. Back up before running:

DELETE FROM sentences WHERE LENGTH(SENTENCE) = 0

take out the select *, and get everything you want. If you have no indexes, add a combined index on language and id.

This leaves you with

SELECT 
    ID, sentence, language.
FROM
    `sentences`
WHERE
    language = 'en' OR language = 'de'
GROUP BY id
HAVING COUNT(language) = 2
Paul Stanley
  • 4,018
  • 6
  • 35
  • 56
  • nice suggestion for deleting empty data. But grouping by ID will lose sentence and language for the other language – Tin Tran May 30 '16 at 20:16
  • You'd want that to happen for the having count. The trade off between mine and yours is I can grow mine with more languages and upping the count and changing the language to an IN, without changing the code, however I have to do a COUNT. – Paul Stanley May 30 '16 at 20:32
  • but the code has to handle languages anyways...with this GROUP BY, there's no guarantee what language you're getting it just picks one – Tin Tran May 30 '16 at 20:37
  • Exactly, and the having clause deals with that. Id assume the language and id would have a unique index. – Paul Stanley May 30 '16 at 20:39
  • no what i mean is you could get a sentence in english and another sentence in german, so there's no way to use that data...other than id, you might as well just select id alone and then have to do another query to grab the language you need – Tin Tran May 30 '16 at 20:41