Batch distribution fitting using Tidyverse and fitdistrplus

Question

I have a dataset with 10,000+ rows that looks like this:

P_ID	SNUM	RNUM	X
ID_233	10	2	40.31
ID_233	10	3	23.21
ID_234	12	5	11.00
ID_234	12	6	0.31
ID_234	13	1	0.00
ID_235	10	2	66.23

From this dataset, I want to fit a Gamma distribution to the X values of each distinct P_ID (ignoring any testing of how well the sampled data fits the distribution).

Using the fitdistrplus package, I can do this for a single P_ID by extracting its X values into a vector, running fw <- fitdist(data, "gamma"), and then pulling the shape and rate estimates out of fw, but this is all very manual.

I would like to find a method using tidyverse to go from the data frame above to:

P_ID	Distrib	G_Shape	G_Rate
ID_233	Gamma	1.21557116	0.09206639
ID_234	Gamma	3.23234542	0.34566432
ID_235	Gamma	2.34555553	0.92344521

How would I achieve this with tidyverse and pipes, rather than a succession of for loops?

How do you extract the `shape` and `rate` estimates out of `fw`? — Ronak Shah, Jan 15 '21 at 06:27

score 0 · Accepted Answer · answered Jan 15 '21 at 06:42

You could apply fitdist to each P_ID using group_by, then extract the shape and rate estimates from each fitted model.

library(dplyr)
library(purrr)
library(fitdistrplus)

# Fit one gamma model per P_ID, then pull the estimates out of each fit
result <- data %>%
  group_by(P_ID) %>%
  summarise(model = list(fitdist(X, "gamma"))) %>%
  mutate(Distrib = "Gamma",
         G_Shape = map_dbl(model, pluck, 'estimate', 'shape'),
         G_Rate  = map_dbl(model, pluck, 'estimate', 'rate'))

result
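
One caveat with real data: fitdist can stop with an error for some groups, for instance when a group has very few rows or contains X values of 0 (the sample shows X = 0.00 for ID_234, and the gamma likelihood is generally not finite there). A single failing group would then abort the whole pipeline. Below is a minimal sketch of one way around that, assuming it is acceptable to report NA for groups that cannot be fitted. The data here is simulated rather than taken from the question, and safe_fit is a hypothetical helper name built with purrr::possibly.

```r
library(dplyr)
library(purrr)
library(fitdistrplus)

set.seed(1)

# Simulated stand-in for the question's data (not the asker's real values).
# The last group deliberately contains a zero.
data <- tibble(
  P_ID = rep(c("ID_233", "ID_234", "ID_235"), each = 200),
  X    = c(rgamma(200, shape = 2, rate = 0.5),
           rgamma(200, shape = 3, rate = 1),
           0, rgamma(199, shape = 1.5, rate = 0.2))
)

# safe_fit (hypothetical name) returns NULL instead of stopping
# when fitdist() throws an error for a group.
safe_fit <- possibly(fitdist, otherwise = NULL)

result <- data %>%
  group_by(P_ID) %>%
  summarise(model = list(safe_fit(X, "gamma"))) %>%
  # pluck() falls back to NA when the model is NULL (i.e. the fit failed)
  mutate(G_Shape = map_dbl(model, ~ pluck(.x, "estimate", "shape", .default = NA_real_)),
         G_Rate  = map_dbl(model, ~ pluck(.x, "estimate", "rate",  .default = NA_real_)))

result
```

Groups that fit cleanly get their estimates as before. Any group where fitdist errors ends up with NA in G_Shape and G_Rate, so it can be inspected separately, for example with filter(result, is.na(G_Shape)).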
