Growth rate in student abilities

Question

I am struggling to write code that calculates and then plots the growth rate. My data frame df looks like this:

ID  Jan_Score  Dec_Score  Cluster
A   0          5          1
B   19         14         2
F   13         21         3
D   12         10         2
M   27         33         4
P   54         54         4

The score columns in the data frame above hold students' scores in an abilities test. A score of zero (0) does not mean the ability is absent; it only means that the student could not provide sufficient evidence to demonstrate it.
The scores are, therefore, never negative.
Also, identical scores in January (at the start of the year) and December (at the end of the year) do not mean that a student has not grown, because the January assessment covers less content than the December one. Therefore, the growth cannot be infinite (as for ID A, whose January score is zero) or negative (as for ID D, whose January score is higher than the December score).

My question is, how can we calculate (and if possible plot) the growth per student ID and then per cluster?

Any help would be greatly appreciated.

Partial solution

I am using the following formula to calculate growth per person (i.e., per ID):

df$growth = (df$Dec_Score - df$Jan_Score) / df$Jan_Score

But this formula needs to be changed to handle corner cases such as:
1. When Jan_Score is zero, replace the Jan_Score value of 0 with 0.1 (an arbitrary decision on my part).
2. Handle the cases where Dec_Score is less than Jan_Score. Perhaps add an offset to every Dec_Score so that it is always greater than Jan_Score. If the maximum value is 54 and the minimum value is 0, what would be a good offset to add to Dec_Score?

Any help would be greatly appreciated!

The following posts are related but do not address my problem:

How to calculate growth with a positive and negative number?,

How to calculate percentage when old value is ZERO,

what is my increment percentage from 0 to 20?,

Growth calculation NaN with 0 value

For reference, the dput(df) is

dput(df)
structure(list(ID = c("A", "B", "F", "D", "M", "P"), Jan_Score = c(0L, 
19L, 13L, 12L, 27L, 54L), Dec_Score = c(5L, 14L, 21L, 10L, 33L, 
54L), Cluster = structure(c(1L, 2L, 3L, 2L, 4L, 4L), .Label = c("1", 
"2", "3", "4"), class = "factor")), row.names = c(NA, -6L), class = "data.frame")

Jon Spring · Answer 1 · 2022-04-21T23:36:19.433

Perhaps:

df$growth = pmax(0, df$Dec_Score / pmax(0.1, df$Jan_Score) - 1)

Starting from the inside, this replaces any Jan_Score < 0.1 with 0.1 and then calculates the growth rate. If that rate is less than 0, it is replaced with 0. I'm not sure what arbitrary adjustments you want to make to arrive at a "good offset" -- you're in a better position to bring that sort of domain understanding.
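As a quick check, the formula can be applied step by step to the sample data from the question. This is a minimal sketch in base R; the intermediate names jan_floored and raw_growth are illustrative and not part of the original code.

```r
# Sample scores from the question (IDs A, B, F, D, M, P)
Jan_Score <- c(0, 19, 13, 12, 27, 54)
Dec_Score <- c(5, 14, 21, 10, 33, 54)

# Step 1: replace any January score below 0.1 with 0.1 to avoid dividing by zero
jan_floored <- pmax(0.1, Jan_Score)

# Step 2: ordinary growth rate against the floored January score
raw_growth <- Dec_Score / jan_floored - 1

# Step 3: replace negative growth rates with 0
growth <- pmax(0, raw_growth)

round(growth, 3)
# 49.000 0.000 0.615 0.000 0.222 0.000
```

Note that the 0.1 substitution turns student A's move from 0 to 5 into a growth of 49 (4900%), so the choice of that floor strongly affects any student who starts at zero.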

As for looking at clusters, it depends what you're trying to see. One approach, if you want to capture reliable observations of growth, could be to filter out rows with erroneous data, and then average the remaining Jan & Dec scores per cluster. E.g.

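library(dplyr)
df %>%
  # keep rows where both scores are positive and the score did not decline
  filter(pmin(Jan_Score, Dec_Score) > 0, Dec_Score >= Jan_Score) %>%
  group_by(Cluster) %>%
  # average the January and December scores within each cluster
  summarize(across(Jan_Score:Dec_Score, mean)) %>%
  # growth rate computed from the cluster means
  mutate(growth = Dec_Score / Jan_Score - 1)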
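
Since the question also asks about plotting, here is one possible sketch, assuming simple bar charts are acceptable. It uses base R only (aggregate() and barplot() instead of dplyr) so that it runs without extra packages; the names kept and by_cluster are illustrative.

```r
# Sample data from the question
df <- data.frame(
  ID = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0, 19, 13, 12, 27, 54),
  Dec_Score = c(5, 14, 21, 10, 33, 54),
  Cluster = factor(c(1, 2, 3, 2, 4, 4))
)

# Per-student growth: zero January scores floored at 0.1, negative growth set to 0
df$growth <- pmax(0, df$Dec_Score / pmax(0.1, df$Jan_Score) - 1)

# Same filter as above: both scores positive and no decline
kept <- df[pmin(df$Jan_Score, df$Dec_Score) > 0 & df$Dec_Score >= df$Jan_Score, ]

# Average January and December scores per cluster, then growth from the means
by_cluster <- aggregate(cbind(Jan_Score, Dec_Score) ~ Cluster, data = kept, FUN = mean)
by_cluster$growth <- by_cluster$Dec_Score / by_cluster$Jan_Score - 1

# One bar per student, then one bar per remaining cluster
barplot(df$growth, names.arg = df$ID, xlab = "ID", ylab = "Growth rate")
barplot(by_cluster$growth, names.arg = as.character(by_cluster$Cluster),
        xlab = "Cluster", ylab = "Growth rate")
```

With this sample, the filter leaves only clusters 3 and 4, because every student in clusters 1 and 2 has either a zero January score or a decline. In the per-student plot, student A's growth of 49 dwarfs the other bars, so a different floor or a log scale may be worth considering.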
