-1

I have a table containing Japanese text, in which I believe that there are some duplicate rows. I want to write a SELECT query that returns all duplicate rows. So I tried running the following query based on an answer from this site (I wasn't able to relocate the source):

SELECT [KeywordID], [Keyword]
FROM Keyword
WHERE [Keyword] IN (SELECT [Keyword]
FROM [Keyword] GROUP BY [Keyword] HAVING COUNT(*) > 1);

The problem is that Access' equality operator treats the two Japanese writing systems - hiragana and katakana - as the same thing, where they should be treated as distinct. Both writing systems have the same phonetic value, although the written characters used to represent the sound are different - e.g. あ (hiragana) and ア (katakana) both represent the sound 'a'.

When I run the above query, however, both of these characters will appear, as according to Access, they're the same character and therefore a duplicate. Essentially it's a case-insensitive search where I need a case-sensitive one.

I got around this issue when doing a simple SELECT to find a Keyword using StrComp to perform a binary comparison, because this method correctly treats hiragana and katakana as distinct. I don't know how I can adapt the query above to use StrComp, though, because it's not directly evaluating one string against another as in the linked question.

Basically what I'm asking is: how can I do a query that will return all duplicates in a table, case-sensitive?

Lou
  • 2,200
  • 2
  • 33
  • 66

2 Answers2

1

You can use exists instead:

SELECT [KeywordID], [Keyword]
FROM Keyword as k
WHERE EXISTS (SELECT 1
              FROM Keyword as k2
              WHERE STRCOMP(k2.Keyword, k.KeyWord, 0) = 0 AND
                    k.KeywordID <> k2.KeywordID
             );
Erik A
  • 31,639
  • 12
  • 42
  • 67
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Just to note, should be `StrComp`, not `StrCmp`. I also assumed you had meant to compare k2.Keyword and k.Keyword, rather than KeywordID and Keyword. I tried the query both ways (comparing Keyword + Keyword and KeywordID + Keyword), but both queries return values such as `いそいそ` and `イソイソ` as duplicates (which have the same sound value but are obviously written using different characters.) – Lou Dec 15 '19 at 12:43
  • Thanks for the edit from Erik A and the typo correction, but this query still doesn't work for my purposes unfortunately. – Lou Dec 15 '19 at 13:21
  • @Lou Can you expand on _doesn't work_? One of the problems with `StrComp` is it does partial string matching, which, if that's your problem, can be fixed easily by either checking string length too or using `StrComp` twice with the arguments inverted – Erik A Dec 15 '19 at 13:35
  • @ErikA - I want to run a query that returns only values which are exact duplicates. For example if there are two records あ and あ, those should be returned as they're the same set of characters twice. Currently the query in my post and in Gordon Linoff's answer returns results that are not exact duplicates, such as あ and ア, because these are being considered as equal even though they're different characters. I don't think partial string matching is the issue, because e.g. イソイソ and いそいそ contain none of the same characters, but are both being returned in the query as if they're duplicates. – Lou Dec 15 '19 at 13:41
  • If this was unclear, basically the query is returning "a" and "A" and calling them duplicates where I want a query which only considers "a" and "a" to be duplicates, or "A" and "A". – Lou Dec 15 '19 at 13:51
  • @Lou . . . If MS Access is returning a *binary* comparison with different characters being equal, then something seems wrong. – Gordon Linoff Dec 15 '19 at 14:55
  • Brilliant. Per Erik's last edit, this now works just as intended, only returning exact duplicates. Thanks both! – Lou Dec 15 '19 at 15:21
0

Try with a self join:

SELECT k1.[KeywordID], k1.[Keyword], k2.[KeywordID], k2.[Keyword]
FROM Keyword AS k1 INNER JOIN Keyword AS k2
ON k1.[KeywordID] < k2.[KeywordID] AND STRCOMP(k1.[Keyword], k2.[Keyword], 0) = 0
forpas
  • 160,666
  • 10
  • 38
  • 76
  • Trying this at the moment, Access is currently frozen, probably as the Keyword table is massive. Will let you know if it works :) – Lou Dec 15 '19 at 13:50