I have a dataset containing the number of infants born per gestational week.
I am trying to determine the median gestational age (GA) at delivery for one particular year, based on these frequencies.
For example:
GA | num_infants_born |
---|---|
20 weeks | 16 |
21 weeks | 22 |
22 weeks | 34 |
23 weeks | 45 |
24 weeks | 60 |
25 weeks | 67 |
26 weeks | 94 |
and so on, up to 41 weeks. The distribution is (not surprisingly) left-skewed.
I also calculated cumulative frequencies using:
data$cumulative_freq = cumsum(data$num_infants_born)
Should I use the cumulative_freq column to find the gestational week at which the median infant was born? Using
median(medianGA2001a$cumulative_freq)
gives me an unexpected number.
I am expecting the median GA to be around 35 weeks, based on the distribution.
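
For illustration, here is a minimal sketch of two common ways to take a median from a frequency table, not a definitive answer. It assumes GA is stored as a plain number of weeks (not the text "20 weeks"), and it uses only the seven rows shown above, so the result describes that excerpt rather than the full 20 to 41 week data. The names `ga`, `median_expanded`, and `median_cumulative` are hypothetical.

```r
# Only the seven rows shown in the question (20 to 26 weeks);
# the real dataset continues up to 41 weeks.
ga <- data.frame(
  GA = 20:26,                                   # assumed numeric weeks
  num_infants_born = c(16, 22, 34, 45, 60, 67, 94)
)

# Approach 1: expand the table to one GA value per infant,
# then take the ordinary median of those values.
median_expanded <- median(rep(ga$GA, times = ga$num_infants_born))

# Approach 2: find the first week at which the cumulative count
# reaches half of the total number of infants.
cumulative_freq <- cumsum(ga$num_infants_born)
half_total <- sum(ga$num_infants_born) / 2
median_cumulative <- ga$GA[which(cumulative_freq >= half_total)[1]]

print(median_expanded)    # 24 for these seven rows
print(median_cumulative)  # 24 for these seven rows
```

In this sketch the cumulative counts are only used to locate a week, which is different from calling `median()` on the cumulative_freq column itself. The two approaches can disagree when half of the total falls exactly on the boundary between two weeks, in which case `median()` on the expanded values averages the two neighbouring weeks.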