I am struggling a bit with id.vars in melt() and how to make it work with ggplot().
Let's say I have this data on California population by race, age, and gender since 1970:
ca1970_1989<-read.table(
  url('http://www.dof.ca.gov/research/demographic/data/race-ethnic/1970-89/documents/California.txt'),
  header=FALSE,strip.white=TRUE,stringsAsFactors=TRUE)
names(ca1970_1989)<-c('County name','Year','Sex','Age','Total Population','White Population','Hispanic Population','Asian & Pacific Islander Population','Black Population','American Indian Population')
I don't need age for the time being, so I sum over it.
ca1970_1989.agg<-aggregate(ca1970_1989[,6:10],by=list(ca1970_1989$Sex,ca1970_1989$Year),FUN=sum)
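As a side note, the Group.1 and Group.2 column names that show up later come from the unnamed list passed to by. Here is a minimal sketch on made-up numbers (the toy data frame and its values are hypothetical, not the real file) suggesting that naming the list elements carries the names through:

```r
# Toy stand-in for the real data; all values are made up for illustration.
toy <- data.frame(
  Sex   = c("FEMALE", "FEMALE", "MALE", "MALE", "FEMALE", "MALE"),
  Year  = c(1970, 1970, 1970, 1970, 1971, 1971),
  Age   = c(0, 1, 0, 1, 0, 0),
  White = c(10, 20, 30, 40, 50, 60),
  Black = c(1, 2, 3, 4, 5, 6)
)

# Unnamed list: the grouping columns come back as Group.1 and Group.2.
agg_unnamed <- aggregate(toy[, c("White", "Black")],
                         by = list(toy$Sex, toy$Year), FUN = sum)

# Named list: the grouping columns keep the names Sex and Year.
agg_named <- aggregate(toy[, c("White", "Black")],
                       by = list(Sex = toy$Sex, Year = toy$Year), FUN = sum)

names(agg_unnamed)
names(agg_named)
```

With named grouping columns, the later renaming step after melt() would presumably not be needed.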
I want to plot it with ggplot(), so I melt as appropriate:
ca1970_1989.m<-melt(ca1970_1989.agg, id.vars=c('Group.1','Group.2'))
names(ca1970_1989.m)[1:2]<-c('Sex','Year')
> head(ca1970_1989.m)
Sex Year variable value
1 FEMALE 1970 White Population 7845344
2 MALE 1970 White Population 7635379
3 FEMALE 1971 White Population 7848106
4 MALE 1971 White Population 7626582
5 FEMALE 1972 White Population 7827480
6 MALE 1972 White Population 7597465
I want to pass to ggplot, but let it properly know that there is, in fact, an extra identifier (Sex) so it can distinguish male and female values.
If I do this call, I don't capture the Sex grouping.
ggplot(ca1970_1989.m, aes(x=Year, y=value, group=variable, colour=variable)) +
  geom_line()
Should I use cast to have variable be a combination of gender AND race? Should I use melt() differently with respect to the id.vars parameter in the first place?
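One thing I have considered, though I am not sure it is the idiomatic way, is to leave melt() as it is and build the group from both columns with interaction(). A sketch on made-up long-format data (the values and the name long are hypothetical; only the column layout mirrors ca1970_1989.m):

```r
library(ggplot2)

# Made-up long-format data in the same shape as ca1970_1989.m.
long <- data.frame(
  Sex      = rep(c("FEMALE", "MALE"), times = 4),
  Year     = rep(c(1970, 1970, 1971, 1971), times = 2),
  variable = rep(c("White Population", "Black Population"), each = 4),
  value    = c(10, 12, 11, 13, 3, 4, 5, 6)
)

# One line per Sex/variable pair: colour by race, linetype by sex.
p <- ggplot(long, aes(x = Year, y = value,
                      group = interaction(Sex, variable),
                      colour = variable, linetype = Sex)) +
  geom_line()
print(p)
```

This seems to keep Sex as an ordinary id column instead of folding it into variable, but I don't know whether that is the recommended approach.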
Any help appreciated.