
I am trying to find the k value (discount rate) that best explains each participant's choices between immediate and delayed rewards, where a lower k value means they choose a lot of immediate options and a higher k value means they are more "patient".

SS = smaller-sooner reward; LL = larger-later reward; Delay = delay in days; Choice = 0 for SS, 1 for LL; SV = subjective value.
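
The question does not say which discounting function produced `SS_SV` and `LL_SV`. As a sketch only, assuming the common hyperbolic model SV = A / (1 + kD) (A = amount, D = delay in days) with the SS reward treated as immediate, the subjective values for one candidate k could be computed like this (`sv_hyperbolic` and `sv_diff` are hypothetical helpers, not part of any package):

```r
# Hypothetical helper, assuming hyperbolic discounting: SV = amount / (1 + k * delay)
sv_hyperbolic <- function(amount, delay, k) {
  amount / (1 + k * delay)
}

# Hypothetical helper: SV of the larger-later option minus SV of the
# smaller-sooner option, with the SS reward assumed to be immediate (delay 0)
sv_diff <- function(ss, ll, delay, k) {
  sv_hyperbolic(ll, delay, k) - sv_hyperbolic(ss, 0, k)
}

# First row of head(SocialData): SS = 1000, LL = 30000, Delay = 60
sv_diff(ss = 1000, ll = 30000, delay = 60, k = 0.05)   # 30000 / 4 - 1000 = 6500
```

Under this assumed model a larger k discounts the delayed reward more steeply, and negative k values can make 1 + kD zero or negative, so the negative part of the grid (down to -50) would give undefined or sign-flipped values.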

First I assign 5001 candidate k values (from -50 to 200 in steps of 0.05) to each trial, which results in a data frame with 8,001,600 rows (50 participants × 32 trials per participant × 5001 candidate values).
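
The row count follows directly from the grid size; a quick check in R (the variable names other than `uniquek` are just for illustration):

```r
# Candidate k grid: -50 to 200 in steps of 0.05
uniquek <- seq(-50, 200, by = 0.05)
length(uniquek)                                # 5001 = (200 - (-50)) / 0.05 + 1

n_participants <- 50
n_trials <- 32

# Rows after replicating every trial once per candidate k
n_participants * n_trials * length(uniquek)    # 8001600

# Groups (participant x k), i.e. regressions and rows in the ddply result
n_participants * length(uniquek)               # 250050
```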

This is how I assigned the k values to the data:

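    uniquek <- seq(-50, 200, by = 0.05)   # 5001 candidate k values
    # Stack one full copy of SocialData per candidate k; column i is the copy index
    DataSoc <- do.call(rbind, lapply(seq_along(uniquek), function(i) data.frame(i, SocialData)))
    # Copy i belongs to uniquek[i], so each k repeats nrow(SocialData) times in a row
    # (rep(..., times = ) would cycle through the k values and mislabel the copies)
    DataSoc$k <- rep(uniquek, each = nrow(SocialData))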
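
Note that the code above copies the precomputed `SV_diff` column unchanged into every k block, so each participant's regression sees the same predictor whatever k is. A minimal sketch of building the expanded data with base R's `merge(x, y, by = NULL)` (which returns the Cartesian product) and recomputing the subjective values from each row's own k. It assumes hyperbolic discounting with an immediate SS reward, uses only the six rows shown in the question (participant ID shortened), and uses a deliberately small k grid:

```r
# Toy stand-in for SocialData: the six rows from head(SocialData), ID shortened
SocialData <- data.frame(
  PPN_f  = "p1",
  SS     = 1000,
  LL     = c(30000, 5000, 10000, 5000, 5000, 2500),
  Delay  = c(60, 60, 60, 30, 5, 14),
  Choice = c(1, 0, 1, 0, 1, 0)
)
uniquek <- seq(0, 0.1, by = 0.05)   # small illustrative grid: 0, 0.05, 0.1

# Cartesian product: every trial paired with every candidate k
DataSoc <- merge(SocialData, data.frame(k = uniquek), by = NULL)

# Recompute subjective values from each row's own k (assumed hyperbolic model)
DataSoc$SS_SV   <- DataSoc$SS                                   # SS assumed immediate
DataSoc$LL_SV   <- DataSoc$LL / (1 + DataSoc$k * DataSoc$Delay)
DataSoc$SV_diff <- DataSoc$LL_SV - DataSoc$SS_SV

nrow(DataSoc)   # 6 trials x 3 k values = 18
```

With the full grid this gives the same 8,001,600 rows as before, but now `SV_diff` actually changes with k.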

Then I create an empty data frame (called 'data_simulation' here) with 3 columns (PPN_f, k, r_squared) each 8001600 rows long.

Then I use `ddply` (from the plyr package) to run a logistic regression with `glm` for each participant and each k value, something like this:

    library(plyr)
    data_simulation <- ddply(DataSoc, .(PPN_f, k), function(x) {
      # Note: summary.glm() has no r_squared component, so this returns NULL
      r_squared <- summary(glm(Choice ~ SV_diff, x, family = binomial()))$r_squared
      return(data.frame(r_squared))
    }, .progress = "win")
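
`summary.glm()` does not report an R², so a goodness-of-fit measure has to be computed by hand. A minimal sketch in base R (swapping `ddply` for `split()` plus `lapply()`), using McFadden's pseudo-R² (1 - residual deviance / null deviance, the same formula suggested in the comments) and returning `NA` for groups where every choice is identical, since their null deviance is zero. `pseudo_r2` and `fit_all` are hypothetical helpers and the toy data is made up:

```r
# Hypothetical helper: McFadden pseudo-R^2 for one participant x k group.
# Returns NA when all choices are identical (null deviance is 0, ratio undefined).
pseudo_r2 <- function(d) {
  if (length(unique(d$Choice)) < 2) return(NA_real_)
  fit <- glm(Choice ~ SV_diff, data = d, family = binomial())
  1 - fit$deviance / fit$null.deviance
}

# Hypothetical helper: base R stand-in for ddply(DataSoc, .(PPN_f, k), ...)
fit_all <- function(DataSoc) {
  groups <- split(DataSoc, list(DataSoc$PPN_f, DataSoc$k), drop = TRUE)
  rows <- lapply(groups, function(d) {
    data.frame(PPN_f = d$PPN_f[1], k = d$k[1], r_squared = pseudo_r2(d))
  })
  do.call(rbind, rows)
}

# Made-up toy data: participant "a" has mixed choices, "b" always chose LL
toy <- data.frame(
  PPN_f   = rep(c("a", "b"), each = 20),
  k       = 0.05,
  SV_diff = rep(seq(-2, 2, length.out = 20), times = 2),
  Choice  = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1,
              rep(1, 20))
)
res <- fit_all(toy)
res   # r_squared between 0 and 1 for "a", NA for "b"
```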

Ideally, this would give me an r_squared value for each combination of participant and k, after which I would find, for each participant, the k with the largest r_squared value and assign that k value to the participant.
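
Picking the best k per participant from a results table shaped like `data_simulation` could look like this in base R; the numbers below are made up purely to show the mechanics:

```r
# Made-up results shaped like data_simulation: one row per participant x k
data_simulation <- data.frame(
  PPN_f     = rep(c("a", "b"), each = 3),
  k         = rep(c(0.01, 0.05, 0.10), times = 2),
  r_squared = c(0.20, 0.35, 0.30, 0.10, 0.05, NA)
)

# For each participant, keep the row with the largest r_squared (NA is skipped)
best_k <- do.call(rbind, lapply(split(data_simulation, data_simulation$PPN_f),
                                function(d) d[which.max(d$r_squared), ]))
best_k   # "a" -> k = 0.05, "b" -> k = 0.01
```

`which.max()` skips `NA` values and returns the first maximum, so when several k values tie (as they all do if `SV_diff` is the same for every k) it reports whichever k comes first in that participant's rows, typically the smallest k since `ddply` returns rows sorted by its grouping variables. A participant whose `r_squared` is `NA` for every k would be dropped from `best_k`.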

But the regression step doesn't work. Could you help me solve this issue?

Here are the first 6 rows of my raw data for reference (`PPN_f` is the participant ID). Thank you for your help!

    > head(SocialData)
        PPN_f                       SS    LL Delay Choice SS_SV  LL_SV SV_diff
        <chr>                    <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>   <dbl>
      1 5e7339dac6b16528d49937bc  1000 30000    60      1  1000   1000      0
      2 5e7339dac6b16528d49937bc  1000  5000    60      0  1000   1000      0
      3 5e7339dac6b16528d49937bc  1000 10000    60      1  1000   1000      0
      4 5e7339dac6b16528d49937bc  1000  5000    30      0  1000   1000      0
      5 5e7339dac6b16528d49937bc  1000  5000     5      1  1000   1000      0
      6 5e7339dac6b16528d49937bc  1000  2500    14      0  1000   1000      0
  • Where is `k` in your data? – Duck Aug 16 '20 at 12:19
  • That is added later to the raw data, but before regression. Each row you see above in the raw data is replicated 5001 times, with k values such as -50, -49.95, -49.9, -49.85...199.5, 200 assigned to them. – Harshil G Vyas Aug 16 '20 at 14:59
  • Could you please try the following code on your real data? Code: `data_simulation <- ddply(DataSoc, .(PPN_f, k), function(x) { y <- glm(Choice ~ SV_diff, x, family = binomial()); r_squared <- with(summary(y), 1 - deviance/null.deviance); return(data.frame(r_squared)) }, .progress = "win")` – Duck Aug 16 '20 at 16:03
  • Thanks! It went through, but gives me 50+ warnings of "glm.fit: fitted probabilities numerically 0 or 1 occurred". In the `data_simulation` data frame, I get r_squared values up to row #245049, but all the rows after that are -Inf. Any suggestions on how this glm function can be optimized further? – Harshil G Vyas Aug 16 '20 at 19:28
  • It depends on the model you choose. I would try another kind of model; the model you have now is not working well! – Duck Aug 16 '20 at 19:31
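
Two notes on this exchange. Row 245049 is exactly 49 × 5001, and `ddply` returns its rows sorted by participant and k, so the `-Inf` values most likely cover every k of a single participant; if that participant always chose the same option, the null deviance is exactly 0 while the fitted deviance is a tiny positive number, and 1 - deviance/null.deviance comes out as `-Inf`. The "fitted probabilities numerically 0 or 1" warning usually signals (quasi-)complete separation. As for "another kind of model", one common alternative is to estimate k directly by maximum likelihood instead of scanning a grid for the best R². A minimal sketch, assuming hyperbolic discounting, an immediate SS reward, and a logistic choice rule with a hypothetical inverse-temperature parameter `beta`; `neg_loglik` and `fit_k` are hypothetical helpers and the starting values and bounds are arbitrary choices:

```r
# Hypothetical helper: negative log-likelihood of one participant's choices
# (Choice = 1 means LL chosen). par = c(log(k), log(beta)) keeps both positive.
neg_loglik <- function(par, d) {
  k     <- exp(par[1])
  beta  <- exp(par[2])
  sv_ll <- d$LL / (1 + k * d$Delay)   # assumed hyperbolic discounting
  sv_ss <- d$SS                       # SS reward assumed immediate
  p_ll  <- plogis(beta * (sv_ll - sv_ss))
  p_ll  <- pmin(pmax(p_ll, 1e-10), 1 - 1e-10)   # keep log() finite
  -sum(d$Choice * log(p_ll) + (1 - d$Choice) * log(1 - p_ll))
}

# Hypothetical helper: bounded maximum-likelihood fit of k and beta
fit_k <- function(d) {
  opt <- optim(c(log(0.01), log(0.001)), neg_loglik, d = d,
               method = "L-BFGS-B",
               lower = c(log(1e-4), log(1e-6)), upper = c(log(10), log(1)))
  c(k = exp(opt$par[1]), beta = exp(opt$par[2]), nll = opt$value)
}

# The six rows shown in head(SocialData)
d <- data.frame(
  SS     = 1000,
  LL     = c(30000, 5000, 10000, 5000, 5000, 2500),
  Delay  = c(60, 60, 60, 30, 5, 14),
  Choice = c(1, 0, 1, 0, 1, 0)
)
fit_k(d)
```

On these six rows alone, the choices under this model are perfectly separated for k between about 0.133 and 0.15 (LL is chosen exactly when its discounted value exceeds 1000), so the likelihood keeps improving as `beta` grows and a fit may end at the upper bound. This is the same separation problem behind the glm warning, and it is worth checking whether it persists with all 32 trials per participant.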
