-1

Need some help with a simple r command to do something that is done through pivots in excel. I have a table with 2 variables ID and date. I need a command that can create a dataframe which has the unique list of these IDs and the minimum date against the ID.

for example, say my data frame is transpose of (ID list is col 1 and Date2 are in col 2)

ID:    ID1 ID1 ID2 ID2 ID3
Date: Mar 01 Mar 02 Mar03 Mar 04 Mar 05

I need a data frame that gives

ID1  Mar01
ID2  Mar03
ID3  Mar05
David Robinson
  • 77,383
  • 16
  • 167
  • 187
Murali
  • 135
  • 1
  • 1
  • 8

2 Answers2

0

The key here is to use as.Date for the date datatype and use one of the formats given there see ?format.Date for format types. Lets create an example of input table: df_input<-data.frame(ID=sample(1:5), Date=as.Date(sample(c("2012-10-03", "2012-11-03", "2012-12-03"), size= 5,replace=T))) Now use sort to sort your input data frame. Like: sort(df_input$Date)

Have a great day! Also, if you need help importing the file try this :Unable to import data from a sample Excel file

Community
  • 1
  • 1
user2217564
  • 240
  • 2
  • 9
  • The OPs example has repeating IDs; and they want to select a subset of the data, not just sort it, I think. Why make a new example when there's one there already? – Frank Nov 10 '13 at 00:53
  • 1
    :) I missed that point! But can be fixed through `duplicated()`. I'll wait for OP's comments before posting further. – user2217564 Nov 10 '13 at 01:25
0

Here's my interpretation of this.

You're starting with a data.frame that looks like this. (I've added an extra out-of-order value to the data.frame so my answer will be different from yours).

mydf <- data.frame(ID = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3"),
                   Date = c("Mar 01", "Mar 02", "Mar 03", "Mar 04", "Mar 05", "Mar 04"))
mydf
#    ID   Date
# 1 ID1 Mar 01
# 2 ID1 Mar 02
# 3 ID2 Mar 03
# 4 ID2 Mar 04
# 5 ID3 Mar 05
# 6 ID3 Mar 04

First, create actual "date" objects out of your "Date" column. I've assumed your date format is "mon day", so I've used "%b %d" in strptime. Since there is no year, the current year is assumed.

Date2 <- strptime(mydf$Date, format="%b %d")  ## ASSUMES THE CURRENT YEAR
Date2
# [1] "2013-03-01" "2013-03-02" "2013-03-03" "2013-03-04" "2013-03-05" "2013-03-04"

Next, find a function that lets us order these dates by your "ID" variable. In base R, ave does that pretty conveniently.

ave(as.numeric(Date2), mydf$ID, FUN = order)
# [1] 1 2 1 2 2 1

Use those values to subset rows with the first (lowest) value for each ID (that is, where the result is equal to "1").

mydf[ave(as.numeric(Date2), mydf$ID, FUN = order) == 1, ]
#    ID   Date
# 1 ID1 Mar 01
# 3 ID2 Mar 03
# 6 ID3 Mar 04
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485