How to calculate and create an index to represent the values of other columns?

Question

Could anyone help me implement the calculation outlined below?

I'm using R in RStudio.

df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,0,11,12,13,14,15,16,17,18,19,20),
             total_fatal_injuries = c(1,0,5,4,0,27,10,15,6,2,10,4,0,0,1,0,3,0,1,0),
             total_serious_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,2),
             total_minor_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,3),
             total_uninjuried = c(1,0,1,0,0,10,2,5,0,4,0,0,31,0,2,3,0,1,0,0),
             injured_index = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))

In the data set above, each row represents one occurrence of a vehicle accident.

Column 'x' is just an ID.

The same occurrence may involve individuals with various levels of injury: fatal injuries, serious injuries, minor injuries and uninjured. Within a row, the sum of these four columns equals the number of individuals involved in the occurrence.

The goal is to populate the 'injured_index' column with a numerical index that represents the severity of the occurrence, based on the values recorded in the other columns, so that the data set can be ordered by it.

What would be the best formula for calculating the 'injured_index' column?

I would like a suggestion on how to calculate an index value that represents how bad an occurrence is, based on the total number of victims at each level, per occurrence.

The order of importance is simple to understand:

1) Fatal is bad
2) Serious is a bit less bad
3) Minor is not good
4) Uninjured is ideal.

How can I put everything together mathematically to get an index that shows which occurrence is more or less serious than another?

I know how to create the column and assign a value. I just want a hint on how to calculate the value that will be stored.

I know this has more to do with math, but people on Mathematics Stack Exchange declined to answer because they consider it a programming question rather than a mathematics one. :/

Thank you all for trying!

The expected output is not clear. Please use `dput` to show a small example instead of an image, along with the expected output. – akrun, Oct 23 '18 at 15:36
@AnderOak Welcome to SO. Please don't get frustrated if your question gets downvoted, but do improve it: e.g. use a better headline than "situation below", ask about concrete problems rather than the "best way" ("best" is relative and depends on your preconditions), and always give the input data, the expected output and the R code that you have but that is not working as expected. THX :-) – R Yoda, Oct 23 '18 at 16:34
@RYoda, thank you very much for the directions. I will pay attention to do better next time. :) – AnderOak, Oct 23 '18 at 19:08

score 1 · Accepted Answer · answered Oct 23 '18 at 17:33

Here's an approach.

# Count how many people are involved in each row (columns 2 through 5)
df$count <- rowSums(df[, 2:5])

# Assign a weight to each severity of injury and divide by the number of
#   people in that row, giving an average severity per person.
#   Adjust the weights based on your judgment.
# Note: a row with no people recorded (count of 0) yields NaN from 0/0.
df$injured_index <- (1000 * df$total_fatal_injuries +
  200 * df$total_serious_injuries +
  20 * df$total_minor_injuries) / df$count
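
To see how such an index can be used for ordering, here is a minimal sketch that rebuilds a weighted per-person score on the question's data and sorts by it. The weights are the same arbitrary ones as above. Treating an occurrence with nobody recorded as index 0 is an assumption made here for illustration, since dividing by a zero count would otherwise give NaN for that row.

```r
# Data from the question (column names kept exactly as posted).
df <- data.frame(
  x = c(1,2,3,4,5,6,7,8,9,0,11,12,13,14,15,16,17,18,19,20),
  total_fatal_injuries   = c(1,0,5,4,0,27,10,15,6,2,10,4,0,0,1,0,3,0,1,0),
  total_serious_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,2),
  total_minor_injuries   = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,3),
  total_uninjuried       = c(1,0,1,0,0,10,2,5,0,4,0,0,31,0,2,3,0,1,0,0)
)

# People involved per occurrence, and the weighted injury total.
count <- rowSums(df[, 2:5])
weighted <- 1000 * df$total_fatal_injuries +
  200 * df$total_serious_injuries +
  20 * df$total_minor_injuries

# Assumption: an occurrence with nobody recorded gets index 0 instead of NaN.
df$injured_index <- ifelse(count > 0, weighted / count, 0)

# Order the data set from most to least severe.
df_sorted <- df[order(df$injured_index, decreasing = TRUE), ]
head(df_sorted)
```

Because the score is divided by the number of people, it measures average severity per person: an occurrence with one fatality and nobody else scores the same as one with six fatalities and nobody else. If the total harm matters more than the proportion, dropping the division is one option to consider.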

Jon Spring

Jon, your approach was very similar to what I tried. I just used smaller weights. And maybe that's why I got results that did not look right. Thank you for your time and attention! – AnderOak Oct 23 '18 at 19:16
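
On the point about smaller weights: with a formula of this form, multiplying every weight by the same constant only rescales the index and leaves the ordering unchanged. The ranking shifts only when the ratios between the weights change. The sketch below illustrates this with three made-up occurrences; the helper `severity_index` and the example counts are hypothetical and not taken from the thread.

```r
# Hypothetical helper: weighted severity per person involved.
# w holds the weights for fatal, serious and minor injuries, in that order.
severity_index <- function(fatal, serious, minor, uninjured, w = c(1000, 200, 20)) {
  people <- fatal + serious + minor + uninjured
  (w[1] * fatal + w[2] * serious + w[3] * minor) / people
}

# Three made-up occurrences with 10 people each:
#   1: one fatality, 2: four serious injuries, 3: ten minor injuries.
fatal     <- c(1, 0, 0)
serious   <- c(0, 4, 0)
minor     <- c(0, 0, 10)
uninjured <- c(9, 6, 0)

big     <- severity_index(fatal, serious, minor, uninjured, w = c(1000, 200, 20))
scaled  <- severity_index(fatal, serious, minor, uninjured, w = c(10, 2, 0.2))
flatter <- severity_index(fatal, serious, minor, uninjured, w = c(3, 2, 1))

# Same ratios between weights: same ordering, only the scale differs.
order(big, decreasing = TRUE)
order(scaled, decreasing = TRUE)

# Different ratios: the ordering can flip.
order(flatter, decreasing = TRUE)
```

With the flatter weights, ten minor injuries outrank a single fatality among ten people, which may be the kind of result that does not look right. Choosing the ratios is a judgment call about how many minor injuries should count as much as one fatality.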
