I am trying to remove multiple entries per day by keeping only the first registered entry of each day, per subject ID (Contact.ID).
I am working with a very large data set, so here is only a snapshot of its structure. The data frame df has the columns Contact.ID, Date.Time, Age, Gender and Attendance:
Contact.ID Date.Time Age Gender Attendance
1 A 2012-07-06 18:54:48 37 Male 30
2 A 2012-07-06 20:50:18 37 Male 30
3 A 2012-08-14 20:18:44 37 Male 30
4 B 2012-03-15 16:58:15 27 Female 40
5 B 2012-04-18 10:57:02 27 Female 40
6 B 2012-04-18 17:31:22 27 Female 40
7 B 2012-04-18 18:37:00 27 Female 40
8 C 2013-10-22 17:46:07 40 Male 5
9 C 2013-10-27 11:21:00 40 Male 5
10 D 2012-07-28 14:48:33 20 Female 12
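To make the snapshot reproducible, it could be rebuilt with the code below. The values are copied from the table above; storing Date.Time as POSIXct in UTC is an assumption about the real data, not something the snapshot shows.

```r
# Reproducible version of the snapshot (values copied from the printed table).
# Assumption: Date.Time is a POSIXct timestamp in UTC.
df <- data.frame(
  Contact.ID = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "D"),
  Date.Time  = as.POSIXct(c("2012-07-06 18:54:48", "2012-07-06 20:50:18",
                            "2012-08-14 20:18:44", "2012-03-15 16:58:15",
                            "2012-04-18 10:57:02", "2012-04-18 17:31:22",
                            "2012-04-18 18:37:00", "2013-10-22 17:46:07",
                            "2013-10-27 11:21:00", "2012-07-28 14:48:33"),
                          tz = "UTC"),
  Age        = c(37, 37, 37, 27, 27, 27, 27, 40, 40, 20),
  Gender     = c("Male", "Male", "Male", "Female", "Female", "Female", "Female",
                 "Male", "Male", "Female"),
  Attendance = c(30, 30, 30, 40, 40, 40, 40, 5, 5, 12),
  stringsAsFactors = FALSE
)
str(df)
```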
I have tried a few different things such as:
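# base R: first row for each unique timestamp
t.first <- df[match(unique(df$Date.Time), df$Date.Time), ]
# data.table: one row per Contact.ID
library(data.table)
setDT(df)[, .SD[which.max(df$Date.Time)], keyby = df$Contact.ID]
# plyr (ddply comes from plyr, not dplyr): last row for each timestamp
library(plyr)
t.first <- ddply(df, "Date.Time", function(z) tail(z, 1))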
But none of them gives me the first entry of each day for each subject ID.
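One base R way to express "first entry per subject per day" is to sort by subject and time, derive the calendar day from each timestamp, and drop every repeat of a (Contact.ID, day) pair. This is a minimal sketch that assumes Date.Time is (or can be parsed as) a POSIXct timestamp in UTC; the sample data is re-created with only the two columns that matter here.

```r
# Minimal sketch: keep the first entry per Contact.ID per calendar day (base R only).
# Sample data re-created from the question; the "UTC" time zone is an assumption.
df <- data.frame(
  Contact.ID = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "D"),
  Date.Time  = as.POSIXct(c("2012-07-06 18:54:48", "2012-07-06 20:50:18",
                            "2012-08-14 20:18:44", "2012-03-15 16:58:15",
                            "2012-04-18 10:57:02", "2012-04-18 17:31:22",
                            "2012-04-18 18:37:00", "2013-10-22 17:46:07",
                            "2013-10-27 11:21:00", "2012-07-28 14:48:33"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

# 1. Sort so the earliest entry of each day comes first within each subject.
df <- df[order(df$Contact.ID, df$Date.Time), ]
# 2. Calendar day (date part only) of each timestamp, computed in the same time zone.
day <- as.Date(df$Date.Time, tz = "UTC")
# 3. duplicated() flags every repeat of a (Contact.ID, day) pair after its first row.
first_per_day <- df[!duplicated(data.frame(df$Contact.ID, day)), ]
first_per_day
```

Sorting first matters because duplicated() keeps the first occurrence in row order, so the earliest timestamp of each day is the one that survives.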
What I need to end up with is a data set like this:
Contact.ID Date.Time Age Gender Attendance
1 A 2012-07-06 18:54:48 37 Male 29
2 A 2012-08-14 20:18:44 37 Male 29
3 B 2012-03-15 16:58:15 27 Female 38
4 B 2012-04-18 10:57:02 27 Female 38
5 C 2013-10-22 17:46:07 40 Male 5
6 C 2013-10-27 11:21:00 40 Male 5
7 D 2012-07-28 14:48:33 20 Female 12
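In this expected output, Attendance is also lower by the number of same-day rows removed per subject (A: 30 to 29, B: 40 to 38; C and D are unchanged). Assuming that is intentional, a data.table sketch (the package must be installed) that drops the duplicates and adjusts Attendance the same way could look like this. The Day helper column and the UTC time zone are assumptions for illustration.

```r
# Sketch using data.table. The Attendance adjustment is inferred from the
# expected output above, not a stated requirement.
library(data.table)

dt <- data.table(
  Contact.ID = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "D"),
  Date.Time  = as.POSIXct(c("2012-07-06 18:54:48", "2012-07-06 20:50:18",
                            "2012-08-14 20:18:44", "2012-03-15 16:58:15",
                            "2012-04-18 10:57:02", "2012-04-18 17:31:22",
                            "2012-04-18 18:37:00", "2013-10-22 17:46:07",
                            "2013-10-27 11:21:00", "2012-07-28 14:48:33"),
                          tz = "UTC"),
  Age        = c(37, 37, 37, 27, 27, 27, 27, 40, 40, 20),
  Gender     = c("Male", "Male", "Male", "Female", "Female", "Female", "Female",
                 "Male", "Male", "Female"),
  Attendance = c(30, 30, 30, 40, 40, 40, 40, 5, 5, 12)
)

# Hypothetical helper column: calendar day of each entry (UTC assumed).
dt[, Day := as.Date(Date.Time, tz = "UTC")]
# Assumption: subtract the number of same-day duplicates per subject
# (rows per subject minus distinct days per subject).
dt[, Attendance := Attendance - (.N - uniqueN(Day)), by = Contact.ID]
# Sort chronologically within each subject, then keep the first row per (Contact.ID, Day).
setorder(dt, Contact.ID, Date.Time)
result <- unique(dt, by = c("Contact.ID", "Day"))
result[, Day := NULL]
result
```

If Attendance should stay as it is, dropping the line that reassigns it leaves only the de-duplication.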
Any help would be much appreciated; I have been stuck on this for far too long.