I am struggling a bit with id.vars in melt() and how to make it work with ggplot().

Let's say I have this data on California population by race, age, and sex since 1970:

ca1970_1989 <- read.table(
  url('http://www.dof.ca.gov/research/demographic/data/race-ethnic/1970-89/documents/California.txt'),
  header=FALSE, strip.white=TRUE, stringsAsFactors=TRUE)
names(ca1970_1989)<-c('County name','Year','Sex','Age','Total Population','White Population','Hispanic Population','Asian & Pacific Islander Population','Black Population','American Indian Population')

I don't need age for the time being so I sum that away.

ca1970_1989.agg<-aggregate(ca1970_1989[,6:10],by=list(ca1970_1989$Sex,ca1970_1989$Year),FUN=sum)

I want to plot it with ggplot() so I melt as appropriate:

ca1970_1989.m<-melt(ca1970_1989.agg, id.vars=c('Group.1','Group.2'))
names(ca1970_1989.m)[1:2]<-c('Sex','Year')

> head(ca1970_1989.m)
     Sex Year         variable   value
1 FEMALE 1970 White Population 7845344
2   MALE 1970 White Population 7635379
3 FEMALE 1971 White Population 7848106
4   MALE 1971 White Population 7626582
5 FEMALE 1972 White Population 7827480
6   MALE 1972 White Population 7597465

I want to pass to ggplot, but let it properly know that there is, in fact, an extra identifier (Sex) so it can distinguish male and female values.

If I do this call, I don't capture the Sex grouping.

ggplot(ca1970_1989.m, aes(x=Year, y=value, group=variable, colour=variable)) +
  geom_line()

Should I use cast to have variable be a combination of gender AND race? Should I use melt() differently with respect to the id.vars parameter in the first place?

Any help appreciated.

ako
  • I don't follow at all. You have a gender variable present in your data. Why wouldn't ggplot be able to use the variable `Sex` (if you told it to)? – joran Sep 29 '12 at 22:56
  • I am using this call, so at issue is that I use `variable` as the grouping level. Can I use both `variable` and `Sex` ? `ggplot(ca1970_1989.m, aes(x=Year, y=value, group=variable, colour=variable)) + geom_line()` – ako Sep 29 '12 at 23:24
  • Use `interaction` or just `paste` the variables together. – joran Sep 29 '12 at 23:29
  • @joran: is that the 'hack' way because I melted wrong/didn't cast, or is that the most appropriate way to deal with multiple grouping levels? – ako Sep 29 '12 at 23:32
  • I suspect (but could be wrong) that whatever anyone comes up with using `cast` or `reshape` will feel more complicated or "hackish" than what I suggested. Using `paste` is just one line. How is that a hack? – joran Sep 29 '12 at 23:36
  • I just wasn't sure if I passed the right arguments to the melt() call upstream. Your solution definitely worked like a charm. You want to put paste and interaction as an answer and take the points? – ako Sep 29 '12 at 23:56
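
The `interaction` suggestion from the comments could look roughly like the sketch below. It runs on a small made-up data frame (`toy` and all its values are invented for illustration, not taken from the California file) and assumes ggplot2 is installed. Mapping `linetype` to `Sex` is an extra choice made here so the two sexes can be told apart. It is not something the thread prescribes.

```r
library(ggplot2)

# Hypothetical long-format data shaped like ca1970_1989.m (values invented)
toy <- data.frame(
  Sex      = factor(rep(c("FEMALE", "MALE"), times = 4)),
  Year     = rep(c(1970, 1970, 1971, 1971), times = 2),
  variable = factor(rep(c("White Population", "Black Population"), each = 4)),
  value    = c(10, 9, 11, 10, 3, 2, 4, 3)
)

# One line per race/sex combination: group on the interaction of both
# factors, colour by race, and distinguish the sexes by linetype
p <- ggplot(toy, aes(x = Year, y = value,
                     group = interaction(variable, Sex),
                     colour = variable, linetype = Sex)) +
  geom_line()
print(p)
```

With `paste(variable, Sex)` in place of `interaction(variable, Sex)` the grouping should come out the same; `interaction` just keeps the result a factor.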

1 Answer

You can merge the two factors "Sex" and "variable" together with a colon, like this:

ggplot(ca1970_1989.m, aes(x=Year, y=value, group=variable:Sex, colour=variable)) + geom_line()

This worked for me on several occasions. But I am rather new to R, so it might just as well be that this is considered to be bad style.

Michael
  • That is quite succinct. Seems like it does the same thing as interaction() that @joran suggested for factors behind the scenes. Many ways to skin a chicken here :) – ako Sep 30 '12 at 20:00
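
As a quick check of that comparison, here is a small base-R sketch. The factors `race` and `sex` are made-up stand-ins for `variable` and `Sex`, not the real columns. It compares the labels produced by `:` with those from `interaction()` when the separator is set to `":"`.

```r
# Two small factors (invented values) standing in for `variable` and `Sex`
race <- factor(c("White", "White", "Black", "Black"))
sex  <- factor(c("FEMALE", "MALE", "FEMALE", "MALE"))

by_colon       <- race:sex
by_interaction <- interaction(race, sex, sep = ":")

# Row by row the labels agree; only the ordering of the levels differs
same_labels <- identical(as.character(by_colon), as.character(by_interaction))
print(same_labels)
print(levels(by_colon))
print(levels(by_interaction))
```

Note that `:` builds a combined factor only when both operands are factors, so a character column would need `factor()` first, whereas `interaction()` and `paste()` accept it directly.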