I am working on a simulation that is able to reproduce people's performance on a cognitive task. The task is to provide an estimate for the time some object is displayed on a screen.

The data I have are the mean error of their responses, the standard deviation of that error, the skewness of the data, and the percent error of their estimates.

The way I am simulating their performance is by randomly providing the simulator with a "time" value which corresponds to the true amount of time the object would remain on screen in an experiment.

I simulate their performance by multiplying that true time value by a sample from a distribution defined by their mean error and the standard deviation of that error. This effectively reproduces their "estimate".

Here is the code I currently have. It does almost exactly what I need, but there's a catch.

import random
import numpy
import csv


# The two pools of durations (in milliseconds) that an object will actually remain on the screen
A = [2502, 4376, 6255]
B = [3753, 6572, 9374]


def time_and_number(pnum, dots, trials):

    # Gutted helper that pulls the relevant data from a CSV, but these values could be anything.
    data = list(csv.reader(open('workingdurationavgdata.csv')))
    ratio_avg = float(data[pnum-1][dots-1])  # mean error
    ratio_std = float(data[pnum-1][dots+3])  # standard deviation of error
    # The participant's 'true' percent error, gathered experimentally. It is used as a
    # comparison to see whether the simulation accurately reproduces performance.
    ideal_ratio = float(data[pnum-1][dots+7])

    estlist = []    # list of generated 'estimates'
    errorlist = []  # list of percent errors
    for i in range(trials):
        # Randomly choose which time pool (above) to draw from.
        # randint excludes the upper bound, so (1, 3) yields 1 or 2.
        poolchoice = numpy.random.randint(1, 3)
        if poolchoice == 1:
            pool = A
        elif poolchoice == 2:
            pool = B

        time = random.choice(pool)  # a random true time from the selected pool
        # 'Error' the true value by multiplying it by a draw from a normal
        # distribution with the participant's mean and standard deviation.
        estimate = time * numpy.random.normal(ratio_avg, ratio_std)
        percent_error = (abs(estimate - time) / time) * 100  # percent error of this estimate
        estlist.append(estimate)
        errorlist.append(percent_error)

    estimateavg = sum(estlist) / float(len(estlist))  # average estimate
    erroravg = sum(errorlist) / float(len(errorlist))  # average percent error
    # Compare our average error to the one found experimentally; the goal is as close to 1 as possible.
    return erroravg / ideal_ratio

What this is doing is using a normal distribution to generate simulated estimates of a participant's performance based on their error.

The issue is that this normal distribution provided by numpy is too inflexible. The data we have will not quite fit, and as such we will expect a systematic overestimation of error.

What I need is a comparable function that lets me supply additional parameters, such as skewness, to get a better fit to the data.

Fundamentally I need a function or a way to make a function that can take in:

A mean, a standard deviation, and a skew value, and sample a value from that distribution to be multiplied by a time value. This simulates a person making an estimate. OR: a better theoretical distribution for doing this accurately but which will still rely on the mean and standard deviation as parameters.

Since you don't have access to the data, I can provide some sample numbers if you want to run this on your own to see what it's doing:

ratio_avg = 0.838986407552044
ratio_std = 0.226132603313837
ideal_ratio = 24.814422079321

I'd be happy to provide more clarification if it's needed. Thank you to anyone who considers helping.

JasonMArcher
Bonbon
  • You realize that there are billions of distributions with the same mean, stddev and skew? You have computed only three moments of your distribution, and that is not enough information to recover the distribution. – Severin Pappadeux Nov 05 '15 at 22:00
  • And what do you do with samples from the normal when they are large negative values? – Severin Pappadeux Nov 05 '15 at 22:21
  • Yes, I know these moments alone aren't enough to specify a meaningful distribution. I was just wondering whether people were familiar with better but still simple options than the normal I was using. It doesn't make sense physically, but I do accept negative estimates of time from the simulator. From testing I've found negative estimates to be very uncommon, and the issue is better explained by a systematic weakness of the distribution than by those small deviations. Once I have a more representative distribution, I can decide to what extent I'd like to deal with negative reports. – Bonbon Nov 05 '15 at 22:26
  • OK, let me write a short answer, maybe it will be useful... – Severin Pappadeux Nov 05 '15 at 22:36
  • I would appreciate that a lot, thank you. – Bonbon Nov 05 '15 at 22:40

1 Answer

OK, let's set out some requirements. We would like our distribution to be:

  1. Parametric (so we can derive its parameters from our values)
  2. Defined on the range [0...infinity)
  3. Somewhat Gaussian: a single peak, with density 0 at 0 and 0 at infinity
  4. But with skew

Then just look at some distributions and see whether they are a good fit. I would start with the log-normal:

https://en.wikipedia.org/wiki/Log-normal_distribution

It is easy to check whether it works. It has two parameters, so you can compute mu and sigma from your mean and stddev. Then check whether the skewness it implies matches your measured skew. If it does, you have a good distribution to use. If not, look at another similar one.
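
A minimal sketch of that check, assuming plain moment matching with the standard log-normal formulas (mean = exp(mu + sigma^2/2), variance = (exp(sigma^2) - 1) * exp(2*mu + sigma^2)). The function names here are made up for illustration and do not come from any library; only the standard library's `random.lognormvariate` is used for sampling. The two input numbers are the sample values from the question.

```python
import math
import random


def lognormal_params(mean, std):
    """Return (mu, sigma) of the log-normal whose mean and std match the inputs."""
    if mean <= 0 or std <= 0:
        raise ValueError("mean and std must both be positive")
    sigma_sq = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


def lognormal_skewness(sigma):
    """Skewness implied by a log-normal with this sigma (it does not depend on mu)."""
    e = math.exp(sigma ** 2)
    return (e + 2.0) * math.sqrt(e - 1.0)


def simulate_estimate(time_ms, mu, sigma, rng=random):
    """Multiply the true duration by one log-normal ratio; the result is always positive."""
    return time_ms * rng.lognormvariate(mu, sigma)


# Sample values from the question
ratio_avg = 0.838986407552044
ratio_std = 0.226132603313837

mu, sigma = lognormal_params(ratio_avg, ratio_std)
print("mu =", mu, "sigma =", sigma)
# Compare this number with the skewness measured in the experiment
print("implied skewness =", lognormal_skewness(sigma))
print("one simulated estimate for 2502 ms:", simulate_estimate(2502, mu, sigma))
```

Because mean and stddev already fix both parameters, the skewness is not a free knob here: it only tells you whether the log-normal is an acceptable fit. If the implied value is far from the measured one, that is the signal to try a different family.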

maxymoo
Severin Pappadeux
  • Oh, I probably should have mentioned that I've gone through all options in the numpy random sampling and many in the scipy continuous distributions using the rvs method. But it could be that I've missed some things. – Bonbon Nov 05 '15 at 23:07
  • @Bonbon did you try the approach I proposed? Is it working? – Severin Pappadeux Nov 06 '15 at 19:14