R data: Averaging x values into a new vector only if y values are the same

Question

I'm relatively new to R and am having trouble processing my data into a more workable form. Suppose I have continuous x and y vectors in which some y values occur with multiple x values. How would I write a script that automatically averages those multiple x values and creates a new data set in which the averaged x values and the y values have the same length? An example is included below.

X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2,  5.6)

That's not very clear. Please show and describe the desired result — Rich Scriven, Jan 29 '15 at 01:03
If you are using `data.table`, the option is `setDT(Df)[, mean(X) , Y]` — akrun, Jan 29 '15 at 04:33

score 1 · Accepted Answer · answered Jan 29 '15 at 01:07

I think this does what you want. The aggregate function groups x by y in this case and takes the mean of each group.

x <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2,  5.6)
df <- data.frame(x = x, y = y)

# mean of x within each distinct y value (x on the left, grouping variable y on the right)
df2 <- aggregate(x ~ y, data = df, FUN = mean)
df2

Jason

Thank you for the help. That is what I wanted. There's always several ways to do things in R. – Trevor Eakes Jan 30 '15 at 22:03
You're welcome Trevor. That's what we are here for. If you have a sec, you might give out the check mark somewhere to close the ticket so to speak. – Jason Jan 31 '15 at 00:04

score 1 · Answer 2 · answered Jan 29 '15 at 01:14

I assume you want the average of X for each Y value.

Try this:

X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2,  5.6)
xy <- data.frame(X, Y)
# mean of X within each distinct Y value; the result is a named array keyed by Y
tapply(X = xy$X, INDEX = list(xy$Y), FUN = mean)
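
The question asks for averaged x values and y values of the same length, whereas `tapply` returns a named array whose names hold the Y values as character strings. A minimal sketch of one way to turn that into a two-column data frame, assuming the Y values survive the round trip through their printed form (the names `means` and `result` are only illustrative):

```r
X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2,  5.6)

# Named array: one mean of X per distinct Y, ordered by increasing Y
means <- tapply(X, Y, mean)

# The names are the Y values as strings; convert them back to numbers
# (assumption: the printed form of each Y value is exact enough to recover it)
result <- data.frame(Y = as.numeric(names(means)), X = as.vector(means))
result
```

Here `result$X` and `result$Y` both have one entry per distinct Y value, so they can be pulled out as equal-length vectors.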

Skiptoniam

That is exactly what I wanted. Thanks for the simple solution. – Trevor Eakes Jan 30 '15 at 22:02

score 0 · Answer 3 · answered Jan 29 '15 at 03:24

If I understand you correctly, you want a new dataset in which every Y value is paired with the average of its corresponding X values. Because the mean of a length-1 vector is just that value, singletons need no special handling, so this can be done easily with dplyr.

X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2,  5.6)
Df <- data.frame(X, Y)
> Df
     X    Y
1 34.2 90.1
2 35.3 90.1
3 32.1 72.5
4 33.0 63.1
5 34.7 45.1
6 34.2 22.2
7 34.1 22.2
8 34.0 22.2
9 34.1  5.6

Now:

library(dplyr)
Df2 <- Df %>% group_by(Y) %>% summarize(X = mean(X))
> Df2
Source: local data frame [6 x 2]

     Y     X
1  5.6 34.10
2 22.2 34.10
3 45.1 34.70
4 63.1 33.00
5 72.5 32.10
6 90.1 34.75
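
The grouped result above comes back sorted by increasing Y, while the input lists Y in decreasing order. If the original order matters (an assumption; the question does not say), a base R sketch that keeps groups in order of first appearance could look like this, with `groups`, `means` and `result` as illustrative names:

```r
X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2,  5.6)

# Fix the factor levels in order of first appearance so groups are not re-sorted
groups <- factor(Y, levels = unique(Y))

# One mean of X per level, returned in level order
means <- tapply(X, groups, mean)

# unique(Y) is in the same first-appearance order as the levels
result <- data.frame(Y = unique(Y), X = as.vector(means))
result
```

This avoids any extra package; the trade-off is that the grouping order has to be set explicitly through the factor levels.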
