I wasn't sure what to title this.

I have a dataset of people (ids), years, and activities:

df <- data.frame("id" = c("1", "1", "1", "2", "2","3"), "years" = rep(1971, 6),
                      "activity" = c("a","b","c","d","e","e"))
  id years activity
1  1  1971        a
2  1  1971        b
3  1  1971        c
4  2  1971        d
5  2  1971        e
6  3  1971        e

I want to combine the years and activity columns. For each year in the original years column, I want to generate every year within +/- 3 of it, while retaining the association with the id.

If I did this in 2 steps: For id "1" the original year is 1971, so +/-3 years for ID 1 would result in:

 id   all_years 
 1    1968
 1    1969
 1    1970
 1    1971
 1    1972
 1    1973
 1    1974

In step 2, I want to combine this all_years column with the activity column from the original df, keeping the ids. Id "1" has 3 activities (a, b, c) and 7 years (1968:1974), so id "1" would appear 10 times in the new combined column.

So ultimately, I would end up with something like this:

  id   year_and_activities 
  1    a
  1    b
  1    c
  1    1968
  1    1969
  1    1970
  1    1971
  1    1972
  1    1973
  1    1974
  2    d
  2    e
  2    1968
...
  2    1974
...
  3    e
...

As always, Thank you!

crock1255
  • Please clarify what you want to work with. Your example has `rep("1971"),6)` so where do those six instances go? Your output simply takes each element of "df$id" and sticks `seq(df$years-3,df$years+3)` next to it. Do you want six instances of that sequence, followed by similar blocks of output for every other input year value? Also, how about leaving "years" as numbers instead of strings? It'll keep things a lot easier. Oh, and what happens to the "activity" values? Do they track "id" or "year" ? – Carl Witthoft Apr 22 '12 at 21:48
  • I down-voted for what appears to be an incomprehensible question. – IRTFM Apr 22 '12 at 23:09
  • Sorry about the poor question wording and the lousy code. Hopefully this makes the question more comprehensible. – crock1255 Apr 23 '12 at 00:27
  • Uh-oh. The updated edits make it even less clear. You have to put in enough info that we can see what goes where. Now it looks like "activity" in your input becomes "id" in the output! – Carl Witthoft Apr 23 '12 at 00:41
  • Hopefully this clears things up. I realized that I included too many "activities" in the combined df for id 1. I also changed the activities to letters so they aren't confused for numbers – crock1255 Apr 23 '12 at 02:40

1 Answer

I couldn't really follow your question, but given the initial data frame, you can get your final data frame using melt:

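library(reshape2)

## Expand each row to the 7 years in its +/- 3 window;
## each = 7 keeps id and activity aligned with their own years
dd = data.frame(id = rep(df$id, each = 7),
   activity = rep(df$activity, each = 7),
   years = rep(df$years, each = 7) + rep(-3:3, times = nrow(df)))

## Stack activity and years into one value column
## (the values are coerced to character)
df_melt = melt(dd, id=1)

## Remove the unnecessary "variable" column
df_melt = df_melt[,c(1,3)]
## Rename
colnames(df_melt) = c("id","year_and_activities")

## Drop the duplicates created by the expansion, then order
df_melt = unique(df_melt)
df_melt[with(df_melt, order(id, year_and_activities)),]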
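
If you would rather not depend on reshape2, the same result can probably be built in base R. The following is only a sketch under my reading of the question: each id should list its distinct activities first, followed by every year within +/- 3 of any year recorded for that id. The names `pieces` and `result` are made up for illustration.

```r
# Example data from the question (years kept numeric)
df <- data.frame(id = c("1", "1", "1", "2", "2", "3"),
                 years = rep(1971, 6),
                 activity = c("a", "b", "c", "d", "e", "e"),
                 stringsAsFactors = FALSE)

# For each id: distinct activities first, then every year within
# +/- 3 of any year recorded for that id (assumed reading of the question)
pieces <- lapply(split(df, df$id), function(d) {
  yrs <- sort(unique(unlist(lapply(d$years, function(y) (y - 3):(y + 3)))))
  data.frame(id = d$id[1],
             year_and_activities = c(unique(d$activity), as.character(yrs)),
             stringsAsFactors = FALSE)
})

# Stack the per-id pieces and drop the row names that rbind generates
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Because the pieces are assembled per id, the activities stay ahead of the years within each id, so no separate ordering step is needed.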

As an aside, I would suggest that having a column as a mixture of "characters" and "years" is probably a bad idea - but you may have a good reason.
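
One way to avoid the mixed column, if your downstream code allows it, is to keep two tidy tables that share the id. This is just a sketch of that idea, not something the question asks for; `id_year`, `years_long`, and `activities` are hypothetical names.

```r
# Example data from the question (years kept numeric)
df <- data.frame(id = c("1", "1", "1", "2", "2", "3"),
                 years = rep(1971, 6),
                 activity = c("a", "b", "c", "d", "e", "e"),
                 stringsAsFactors = FALSE)

# One row per distinct (id, year) pair in the input
id_year <- unique(df[, c("id", "years")])

# Years table: one row per id and year in the +/- 3 window; year stays numeric
years_long <- do.call(rbind, lapply(seq_len(nrow(id_year)), function(i) {
  data.frame(id = id_year$id[i],
             year = (id_year$years[i] - 3):(id_year$years[i] + 3),
             stringsAsFactors = FALSE)
}))

# Activities table: one row per distinct (id, activity) pair
activities <- unique(df[, c("id", "activity")])
```

With this layout the years can still be compared and sorted as numbers, and the two tables can be joined on `id` whenever a combined view is needed.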

csgillespie