0

I've been searching for an easy solution to a pretty trivial problem. I have an huge set of records (~120,000) that I need to screen for duplicates, assign a sequential number to each set of duplicates, like Assign# below: enter image description here

Eventually, I am trying to achieve this: enter image description here

I use P1, P2, and P3 fields as a set of sort parameters in query (ascending/descending) to determine the best/top Name for each set of identical NCBI hits. I tried a lot of things already and my main problem is that access freezes half way through and I don't really know if the script is functional.

FROM [sortquery] 
WHERE ((([sortquery].Name) In 
(
    SELECT TOP 1 [sortquery].Name 
    FROM [sortquery] AS Dupe 
    WHERE Dupe.NCBI=[sortquery].NCBI 
    ORDER BY Dupe.NCBI
))) 
ORDER BY [sortquery].NCBI;

I am open to any suggestion and corrections! Thanks for any help =)

Siyual
  • 16,415
  • 8
  • 44
  • 58
Shtork
  • 1
  • 1

1 Answers1

1

The traditional method is to count:

SELECT 
   *,
   (Select Count(*)
   From Sortquery As S
   Where S.NCBI = Sortquery.NCBI 
       And S.P1 * 1000 + S.P3 >= Sortquery.P1 * 1000 + Sortquery.P3) As [Assign#]
FROM 
    [sortquery] 
ORDER BY
    NCBI Asc,
    P1 Desc,
    P3 Desc,
    [Name] Asc,
    [Assign#] Asc
Gustav
  • 53,498
  • 7
  • 29
  • 55
  • Thank Gustav, but count option doesn't work for some reason. I am starting to think that maybe I am setting the count(*) to a wrong field. I used your suggested script and the result had most of the fields marked "1". The only places that had "2" where actual full record duplicates (all 5 fields); whereas, My goal is to basically to count each set of "NCBI" field duplicates. – Shtork Feb 11 '16 at 14:06
  • You are right. See edit with NCBI only and relating to P1 and P3. – Gustav Feb 11 '16 at 14:32
  • I tried that too ))) it only assigns the total count to each dup record. I think there has to be an autonumber function for each set of duplicate records, counting duplicate data just comes back with totals instead of sequential order per duplicate set of records. – Shtork Feb 11 '16 at 15:44
  • The count will be 1, 2, 3 and will thus be the assign#. – Gustav Feb 11 '16 at 16:18