I am struggling to write code that calculates and then plots a growth rate. My data frame df
looks like this:
ID  Jan_Score  Dec_Score  Cluster
A           0          5        1
B          19         14        2
F          13         21        3
D          12         10        2
M          27         33        4
P          54         54        4
- The score columns (Jan_Score and Dec_Score) in the data frame above hold students' scores in an abilities test. A score of zero (0) therefore does not mean that the student lacks the ability; it only means that the student could not provide sufficient evidence to demonstrate it.
- The values in the score columns are, therefore, never negative.
- Also, when a student's January (start of the year) and December (end of the year) scores are identical, it does not mean that the student has not grown, because the January assessment covers less content than the December assessment. Therefore, the growth cannot be infinite (as for ID A, whose January score is zero) or negative (as for ID D, whose January score is higher than the December score).
My question is, how can we calculate (and if possible plot) the growth per student ID and then per cluster?
Any help would be greatly appreciated.
Partial solution
I am using the following formula to calculate growth per person (i.e., per ID):
df$growth = (df$Dec_Score - df$Jan_Score) / df$Jan_Score
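To make the problem concrete, here is a minimal sketch that rebuilds the example data from the dput() output in this post and applies the formula as written. The two lookups at the end are only illustrative checks for the rows that conflict with the constraints listed earlier.

```r
# Rebuild the example data from the dput() output in this post
df <- data.frame(
  ID        = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0L, 19L, 13L, 12L, 27L, 54L),
  Dec_Score = c(5L, 14L, 21L, 10L, 33L, 54L),
  Cluster   = factor(c(1, 2, 3, 2, 4, 4)),
  stringsAsFactors = FALSE
)

# Relative growth, exactly as in the formula above
df$growth <- (df$Dec_Score - df$Jan_Score) / df$Jan_Score

print(df[, c("ID", "growth")])

# Rows that conflict with the constraints listed earlier
inf_ids <- df$ID[is.infinite(df$growth)]  # division by a zero January score
neg_ids <- df$ID[df$growth < 0]           # December score below January score
print(inf_ids)
print(neg_ids)
```

With this data, ID A comes out infinite, and IDs B and D come out negative.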
- But this formula needs to be changed to accommodate corner cases such as:
  - When Jan_Score is zero, replace the Jan_Score value of 0 with 0.1 (an arbitrary choice on my part).
  - When Dec_Score is less than Jan_Score. Perhaps add an offset to every Dec_Score so that it is always greater than Jan_Score. If the maximum value is 54 and the minimum value is 0, what would be a good offset to add to Dec_Score?
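The two adjustments described above could be sketched as follows. This is only an illustration under stated assumptions, not a recommended method: `jan_floor`, `dec_offset` and `growth_adj` are hypothetical names, 0.1 is the arbitrary floor mentioned above, and the offset is assumed to be the smallest whole number that lifts every December score in this sample above its January score. Guaranteeing this for any pair of scores in the 0 to 54 range would need an offset greater than 54.

```r
# Rebuild the example data from the dput() output in this post
df <- data.frame(
  ID        = c("A", "B", "F", "D", "M", "P"),
  Jan_Score = c(0L, 19L, 13L, 12L, 27L, 54L),
  Dec_Score = c(5L, 14L, 21L, 10L, 33L, 54L),
  Cluster   = factor(c(1, 2, 3, 2, 4, 4)),
  stringsAsFactors = FALSE
)

# Assumption: floor used in place of a zero January score (arbitrary, from the post)
jan_floor <- 0.1

# Assumption: smallest whole-number offset that makes every December score
# in this sample strictly greater than the matching January score
dec_offset <- max(0, max(df$Jan_Score - df$Dec_Score) + 1)

jan_adj <- ifelse(df$Jan_Score == 0, jan_floor, df$Jan_Score)
dec_adj <- df$Dec_Score + dec_offset

# Same growth formula, applied to the adjusted scores
df$growth_adj <- (dec_adj - jan_adj) / jan_adj

# Mean adjusted growth per cluster
per_cluster <- aggregate(growth_adj ~ Cluster, data = df, FUN = mean)

print(df[, c("ID", "Cluster", "growth_adj")])
print(per_cluster)
```

Every value of `growth_adj` is now finite and positive, but the result is very sensitive to both assumed constants: ID A's growth is driven almost entirely by the 0.1 floor. In base R, `barplot(per_cluster$growth_adj, names.arg = as.character(per_cluster$Cluster))` would draw the per-cluster values.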
Any help would be greatly appreciated!
The following posts are related but do not address my problem:
How to calculate growth with a positive and negative number?,
How to calculate percentage when old value is ZERO,
what is my increment percentage from 0 to 20?,
Growth calculation NaN with 0 value
For reference, the output of dput(df) is:
dput(df)
structure(list(ID = c("A", "B", "F", "D", "M", "P"), Jan_Score = c(0L,
19L, 13L, 12L, 27L, 54L), Dec_Score = c(5L, 14L, 21L, 10L, 33L,
54L), Cluster = structure(c(1L, 2L, 3L, 2L, 4L, 4L), .Label = c("1",
"2", "3", "4"), class = "factor")), row.names = c(NA, -6L), class = "data.frame")