1

I am working on some car accident data and am stuck on how to get the data in the form I want.

select
    sex_of_driver,
    accident_severity,
    count(accident_severity) over (partition by sex_of_driver, accident_severity) 
from
    SQL.dbo.accident as accident
    inner join SQL.dbo.vehicle as vehicle on
        accident.accident_index = vehicle.accident_index

This is my code, which counts the accidents had per each sex for each severity. I know I can do this with group by but I wanted to use a partition by in order to work out % too.

However I get a very large table (I assume for each row that is each sex/severity. When I do the following:

select
    sex_of_driver,
    accident_severity,
    count(accident_severity) over (partition by sex_of_driver, accident_severity) 
from
    SQL.dbo.accident as accident
    inner join SQL.dbo.vehicle as vehicle on
        accident.accident_index = vehicle.accident_index
group by
    sex_of_driver,
    accident_severity

I get this:

sex_of_driver accident_severity (No column name)
1 1 1
1 2 1
-1 2 1
-1 1 1
1 3 1

I won't give you the whole table, but basically, the group by has caused the count to just be 1.

I can't figure out why group by isn't working. Is this an MS SQL-Server thing?

I want to get the same result as below (obv without the CASE etc)

select
    accident.accident_severity,
    count(accident.accident_severity) as num_accidents,
    vehicle.sex_of_driver,
    CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
    CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
    SQL.dbo.accident as accident
    inner join SQL.dbo.vehicle as vehicle on
        accident.accident_index = vehicle.accident_index
where
    sex_of_driver != 3
    and
    sex_of_driver != -1

group by
    accident.accident_severity,
    vehicle.sex_of_driver

order by
    accident.accident_severity

1 Answers1

0

You seem to have a misunderstanding here.

GROUP BY will reduce your rows to a single row per grouping (ie per pair of sex_of_driver, accident_severity values. Any normal aggregates you use with this, such as COUNT(*), will return the aggregate value within that group.

Whereas OVER gives you a windowed aggregated, and means you are calculating it after reducing your rows. Therefore when you write count(accident_severity) over (partition by sex_of_driver, accident_severity) the aggregate only receives a single row in each partition, because the rows have already been reduced.

You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too." but you are misunderstanding how to do that. You don't need PARTITION BY to work out percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.

Note also that count(accident_severity) does not give you the number of distinct accident_severity values, it gives you the number of non-null values, which is probably not what you intend. You also have a very strange join predicate, you probably want something like a.vehicle_id = v.vehicle_id

So you want something like this:

select
    sex_of_driver,
    accident_severity,
    count(*) as Count,
    count(*) * 1.0 /
      sum(count(*)) over (partition by sex_of_driver) as PercentOfSex
    count(*) * 1.0 /
      sum(count(*)) over () as PercentOfTotal
from
    dbo.accident as accident a
    inner join dbo.vehicle as v on
        a.vehicle_id = v.vehicle_id
group by
    sex_of_driver,
    accident_severity;
Charlieface
  • 52,284
  • 6
  • 19
  • 43