I have a table crashes
with around one million rows, each row containing data on either:
every car crash that did not occur near a school, or
every car crash that occurred near a school, with additional rows per crash if it occurred near more than one school (e.g. 4 rows for a crash near 4 schools). The highest number of rows/nearby schools for one crash is 10.
I'd like to add a column to the table returning a "1" for only one appearance of every crash_id that appears in more than one row, and a "0" for any subsequent occurrences of that same crash_id in the column crash_id
. Which row has a 1 or 0 per crash_id doesn't matter.
I've tried all of the suggestions offered in response to this similar question, but I couldn't get any of them to work for me.
FWIW, I got this to work in Excel using this formula:
=(COUNTIF($C$2:$C2,$C2)=1)+0
But that was for a small table, not one with one million rows.
What I've tried so far:
SELECT *
FROM
(
SELECT * , ROW_NUMBER() OVER(PARTITION BY crash_id) AS row
FROM crashes
) AS A1
WHERE row <6
SELECT *
FROM
(
SELECT * , ROW_NUMBER() OVER(PARTITION BY crash_id) AS row
FROM crashes
) AS A1
WHERE row = 1
I understand this isn't optimal database design, but it allows me to get most of what I need, except for what I'm describing above.