
I have a question regarding calculating means for each subj.

I have a dataframe as follows:

   subj entropy n_gambles trial response   rt
1     0    high         2     0   sample 4205
2     0    high         2     0   sample  676
3     0    high         2     0     skip    0
4     0    high         2     1   sample  883
5     0    high         2     1   sample  697
6     0    high         2     1     skip    0
7     0    high         2     2   sample 1493
8     0    high         2     2   sample  507
9     0    high         2     2     skip    0
10    0    high         2     3   sample 1016

I want to work out the mean amount of sampling for each subj.

I have got this far, but I don't know what code to write next.

Note: the proportion of sampling is different for each subj.

  subj trial n_gambles entropy response n_sample
2497    0     0         2    high   sample        2
2498    1     0         2    high   sample        0
2499    2     0         2    high   sample        0
2500    3     0         2    high   sample        0
2501    4     0         2    high   sample       27
2502    5     0         2    high   sample        0
2503    6     0         2    high   sample        0
2504    7     0         2    high   sample        0
2505    8     0         2    high   sample       19
2506    9     0         2    high   sample        0
2507   10     0         2    high   sample        0

Below is the code I have so far.

rm(list = ls())

# Import 'subj.csv' data file into a dataframe
data_subj <- read.csv('subj.csv')
head(data_subj)

# Import 'response.csv' data file into a dataframe
data_response <- read.csv('response.csv')
head(data_response)

# Merge the 'subj' and 'response' data by subject
data <- merge(data_subj, data_response, by = 'subj')
head(data)

# Count rows for every combination of subj, trial, n_gambles, entropy and response;
# table() also includes combinations that never occur, with a count of 0
data <- as.data.frame(table(data$subj, data$trial, data$n_gambles, data$entropy, data$response))
colnames(data) <- c('subj', 'trial', 'n_gambles', 'entropy', 'response', 'n_sample')

# Subset for "sample"
data <- data[data$response == "sample", ]
head(data)

Could someone please help me out?

I'd expect the output to look something like this:

subj trial n_gambles entropy response n_sample  mean_sample/trials
  0     0         2    high   sample        2             
  1     0         2    high   sample        0
  2     0         2    high   sample        0
  3     0         2    high   sample        0
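One way to get output of this shape is sketched below in base R. It assumes a raw data frame with one row per response (like the first table above); the names `raw`, `per_trial` and `mean_sample` are hypothetical, and `mean_sample` stands in for `mean_sample/trials`, which is not a syntactic R name. Unlike `table()`, `aggregate()` only keeps subj x trial combinations that actually occur, so no artificial zero counts appear.

```r
# Hypothetical raw data: the ten example rows from the question
raw <- data.frame(
  subj      = rep(0, 10),
  trial     = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3),
  n_gambles = rep(2, 10),
  entropy   = rep("high", 10),
  response  = c("sample", "sample", "skip", "sample", "sample", "skip",
                "sample", "sample", "skip", "sample")
)

# 1 for a "sample" response, 0 otherwise
raw$n_sample <- as.numeric(raw$response == "sample")

# Number of "sample" responses in each subj x trial combination that occurs
per_trial <- aggregate(n_sample ~ subj + trial + n_gambles + entropy,
                       data = raw, FUN = sum)

# Mean number of samples per trial for each subj, repeated on every row of that subj
per_trial$mean_sample <- ave(per_trial$n_sample, per_trial$subj, FUN = mean)
per_trial
```

Because the excerpt only shows the first row of trial 3, the numbers here (2, 2, 2 and 1 samples, mean 1.75) describe the excerpt, not the full data set.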
  • Can you please clarify what you mean by "the means of sampling". Mean of which variable(s)? Please help us to help by providing a small example of your expected output. – Henrik Sep 04 '13 at 21:19
  • Hi, under column 'response' there is either sample, skip or buy. Each subj does 6 trials. They sample before they buy or skip. I would like to calculate the mean amount of sampling for each subj. Does that make sense? Let me know if it's still not clear. Thanks. :) – user2707619 Sep 04 '13 at 21:37
  • @user2707619: Please go back to the answers in your earlier question and replace skip with sample in the answers. – Metrics Sep 04 '13 at 21:44
  • @user2707619, in the text you write "calculating means for each subj", and in expected output one of the header is `mean_sample/trials`. Does this mean that you actually want to calculate proportion of outcome `"sample"` in the `response` variable, _within `subj` within each `trial`_? For example, for `subj` 0 and `trial` 0: 2 out of 3 `response` equals `"sample"`, and the expected outcome is then 2/3 for this particular subj*trial combination? Please try to be as clear as possible when you formulate your question. You are then much more likely to rapidly receive a correct answer. – Henrik Sep 05 '13 at 07:39
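If the goal is the proportion Henrik describes (for example 2 of 3 responses are "sample" for subj 0, trial 0), a minimal base R sketch could look like this. It assumes the same raw layout as the first table; `raw`, `is_sample` and `prop_sample` are hypothetical names. It relies on the fact that the mean of a logical vector is the share of `TRUE` values.

```r
# Hypothetical raw data: the ten example rows from the question
raw <- data.frame(
  subj     = rep(0, 10),
  trial    = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3),
  response = c("sample", "sample", "skip", "sample", "sample", "skip",
               "sample", "sample", "skip", "sample")
)

# TRUE where the response is "sample"
raw$is_sample <- raw$response == "sample"

# Share of "sample" responses within each subj x trial combination
prop <- aggregate(is_sample ~ subj + trial, data = raw, FUN = mean)
names(prop)[names(prop) == "is_sample"] <- "prop_sample"
prop
```

For the excerpt this gives 2/3 for trials 0 to 2 and 1 for trial 3, where only the first row is shown.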

1 Answer


This is similar to the answer to your earlier question:

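library(plyr)
# df is the raw data frame (one row per response); count the "sample"
# responses for each subj and divide by the 6 trials each subj completes
ddply(df, .(subj), summarize, mymean = length(which(response == "sample")) / 6)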
 subj   mymean
1    0 1.166667
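The divisor 6 above comes from the statement that each subj does 6 trials. A base R sketch that counts the trials from the data instead is shown below; it is an alternative to the `ddply()` call, not the answer's method, and `raw`, `n_sample`, `n_trials` and `result` are hypothetical names.

```r
# Hypothetical raw data: the ten example rows from the question
raw <- data.frame(
  subj     = rep(0, 10),
  trial    = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3),
  response = c("sample", "sample", "skip", "sample", "sample", "skip",
               "sample", "sample", "skip", "sample")
)

# Total number of "sample" responses per subj
n_sample <- tapply(raw$response == "sample", raw$subj, sum)

# Number of distinct trials per subj, counted from the data rather than assumed
n_trials <- tapply(raw$trial, raw$subj, function(t) length(unique(t)))

result <- data.frame(subj   = as.numeric(names(n_sample)),
                     mymean = as.vector(n_sample / n_trials))
result
```

On the ten-row excerpt this gives 7 / 4 = 1.75, because only 4 trials appear there; the 1.166667 above is 7 / 6. When every subj really has 6 trials in the full data, both versions agree.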
answered by Metrics