I have been working with the CSV file at http://www3.amherst.edu/~nhorton/r2/datasets/Batting.csv for my own practice.
However, I am not sure how to do the following:
- Summarize the observations from the same team (teamID) in the same year by adding up the component values. That is, you should end up with only one record per team per year, and this record should have the year, team name, total runs, total hits, total X2B, …, total HBP.
Here is the code I have so far. It gives me only one team per year, but I need every team for each year with its totals (e.g., for 1980, all the teams with totalRuns, totalHits, ...; for 1981, all the teams with totalRuns, totalHits, ...; and so on).
newdata1 <- read.csv("http://www3.amherst.edu/~nhorton/r2/datasets/Batting.csv")
# Group the row indices by year
id <- split(1:nrow(newdata1), newdata1$yearID)
a2 <- data.frame(yearID=sapply(id, function(i) newdata1$yearID[i[1]]),
teamID=sapply(id,function(i) newdata1$teamID[i[1]]),
totalRuns=sapply(id, function(i) sum(newdata1$R[i],na.rm=TRUE)),
totalHits=sapply(id, function(i) sum(newdata1$H[i],na.rm=TRUE)),
totalX2B=sapply(id, function(i) sum(newdata1$X2B[i],na.rm=TRUE)),
totalX3B=sapply(id, function(i) sum(newdata1$X3B[i],na.rm=TRUE)),
totalHR=sapply(id, function(i) sum(newdata1$HR[i],na.rm=TRUE)),
totalBB=sapply(id, function(i) sum(newdata1$BB[i],na.rm=TRUE)),
totalSB=sapply(id, function(i) sum(newdata1$SB[i],na.rm=TRUE)),
totalGIDP=sapply(id, function(i) sum(newdata1$GIDP[i],na.rm=TRUE)),
totalIBB=sapply(id, function(i) sum(newdata1$IBB[i],na.rm=TRUE)),
totalHBP=sapply(id, function(i) sum(newdata1$HBP[i],na.rm=TRUE)))
a2
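
For reference, one possible way to get one row per team per year is to group on both yearID and teamID instead of yearID alone. Below is a minimal sketch using base R's aggregate() on a small made-up data frame. The names toy, stat_cols and totals are illustrative, and the values are invented rather than taken from Batting.csv.

```r
# Toy stand-in for the Batting data (made-up values, only three stat columns)
toy <- data.frame(
  yearID = c(1980, 1980, 1980, 1981, 1981),
  teamID = c("BOS", "BOS", "NYA", "BOS", "NYA"),
  R      = c(10, 5, 7, NA, 3),
  H      = c(20, 8, 15, 4, 9),
  HBP    = c(1, NA, 2, 0, 1)
)

# Columns to add up within each group
stat_cols <- c("R", "H", "HBP")

# Sum every stat column within each (yearID, teamID) pair;
# na.rm = TRUE is passed through to sum() so missing values are ignored
totals <- aggregate(toy[stat_cols],
                    by = list(yearID = toy$yearID, teamID = toy$teamID),
                    FUN = sum, na.rm = TRUE)

# Rename the summed columns to totalR, totalH, ... and sort by year, then team
names(totals)[-(1:2)] <- paste0("total", stat_cols)
totals <- totals[order(totals$yearID, totals$teamID), ]
totals
```

Assuming the real file has the columns used in my code above, the same call should work with newdata1 in place of toy and stat_cols set to c("R", "H", "X2B", "X3B", "HR", "BB", "SB", "GIDP", "IBB", "HBP").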