I changed a little bit the OP's original dataset for pedagogical/instructional reasons below:
df <- data.frame(
X1=c("GBP/USD"),
X2=c("2017-06-01 00:00:00", "2017-06-01 00:00:00", "2017-06-01 00:00:01", "2017-06-01 00:00:01", "2017-06-01 00:00:01", "2017-06-01 00:00:02", "2017-06-30 20:59:52", "2017-06-30 20:59:54", "2017-06-30 20:59:54", "2017-06-30 20:59:56", "2017-06-30 20:59:56", "2017-06-30 20:59:56"),
X3=c(1.28756, 1.28754, 1.28754, 1.28753, 1.28752, 1.28757, 1.30093, 1.30121, 1.30100, 1.30146, 1.30145,1.30145),
X4=c(1.28763, 1.28760, 1.28759, 1.28758, 1.28755, 1.28760,1.30300, 1.30300, 1.30390, 1.30452, 1.30447, 1.30447),
stringsAsFactors=FALSE)
df
X1 X2 X3 X4
1 GBP/USD 2017-06-01 00:00:00 1.28756 1.28763
2 GBP/USD 2017-06-01 00:00:00 1.28754 1.28760
3 GBP/USD 2017-06-01 00:00:01 1.28754 1.28759
4 GBP/USD 2017-06-01 00:00:01 1.28753 1.28758
5 GBP/USD 2017-06-01 00:00:01 1.28752 1.28755
6 GBP/USD 2017-06-01 00:00:02 1.28757 1.28760
7 GBP/USD 2017-06-30 20:59:52 1.30093 1.30300
8 GBP/USD 2017-06-30 20:59:54 1.30121 1.30300
9 GBP/USD 2017-06-30 20:59:54 1.30100 1.30390
10 GBP/USD 2017-06-30 20:59:56 1.30146 1.30452
11 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447
12 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447
Now, in low frequency data, there will be groupings of the same things. So, we must find the indices corresponding to unique startings, and the endings of the groups:
indices <- seq_along(df[,2])[!(duplicated(df[,2]))] # 1 3 6 7 8 10; the beginnings of groups (observations)
indices - 1 # 0 2 5 6 7 9; for finding the endings of groups
numberoflowfreq <- length(indices) # 6: number of groupings (obs.) for Low Freq data
Understand the pattern by writing openly:
mean(df[1:((indices -1)[2]),3]) # from 1 to 2
mean(df[indices[2]:((indices -1)[3]),3]) # from 3 to 5
mean(df[indices[3]:((indices -1)[4]),3]) # from 6 to 6
mean(df[indices[4]:((indices -1)[5]),3]) # from 7 to 7
mean(df[indices[5]:((indices -1)[6]),3]) # from 8 to 9
mean(df[indices[6]:nrow(df),3]) # from 10 to 12
Simplify the pattern:
mean3rdColumn_1st <- mean(df[1:((indices -1)[2]),3]) # from 1 to 2
mean3rdColumn_Between <- sapply(2:(numberoflowfreq-1), function(i) mean(df[indices[i]:((indices -1)[i+1]),3]) )
mean3rdColumn_Last <- mean(df[indices[6]:nrow(df),3]) # from 10 to 12
# 3rd column in low frequency data:
c(mean3rdColumn_1st, mean3rdColumn_Between, mean3rdColumn_Last)
Similarly for the 4th column:
mean4thColumn_1st <- mean(df[1:((indices -1)[2]),4]) # from 1 to 2
mean4thColumn_Between <- sapply(2:(numberoflowfreq-1), function(i) mean(df[indices[i]:((indices -1)[i+1]),4]) )
mean4thColumn_Last <- mean(df[indices[6]:nrow(df),4]) # from 10 to 12
# 4th column in low frequency data:
c(mean4thColumn_1st, mean4thColumn_Between, mean4thColumn_Last)
Collect all effort:
LowFrqData <- data.frame(X1=c("GBP/USD"), X2=df[indices,2], X3=c(mean3rdColumn_1st, mean3rdColumn_Between, mean3rdColumn_Last), x4=c(mean4thColumn_1st, mean4thColumn_Between, mean4thColumn_Last), stringsAsFactors=FALSE)
LowFrqData
X1 X2 X3 x4
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487
Now, the column X2
has unique minute values, X3
and X4
were formed by the means of relevant cells.
Also note that: There may not be values for all minutes in a range. One may pump NA
s for such cases. On the other hand, one may neglect the effect of the irregularity in such cases since the spacing of the observations would/may be the same for many observations, and therefore not so highly irregular. Also consider the fact that transforming the data into equally spaced observations using linear interpolation can introduce a number of significant and hard to quantify biases (See: Scholes and Williams).
M. Scholes and J. Williams, “Estimating betas from nonsynchronous data”, Journal of Financial Economics 5: 309–327, 1977.
Now, the regular 5-minute series part:
as.numeric(as.POSIXct("1970-01-01 03:00:00")) # 0; starting point for ZERO seconds. "1970-01-01 03:01:00" equals 60.
as.numeric(as.POSIXct("2017-06-01 00:00:00")) # 1496264400
# Passed seconds after the first observation in the dataset
PassedSecs <- as.numeric(as.POSIXct(LowFrqData$X2)) - 1496264400
LowFrq5minuteRaw <- cbind(LowFrqData, PassedSecs, stringsAsFactors=FALSE)
LowFrq5minuteRaw
X1 X2 X3 x4 PassedSecs
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615 0
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573 1
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600 2
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000 2581192
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450 2581194
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487 2581196
5 minutes means 5*60=300 seconds. So, "having same Quotient in Division to 300" groups the observations in 5-minute intervals.
LowFrq5minuteRaw2 <- cbind(LowFrqData, PassedSecs, QbyDto300 = PassedSecs%/%300, stringsAsFactors=FALSE)
LowFrq5minuteRaw2
X1 X2 X3 x4 PassedSecs QbyDto300
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615 0 0
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573 1 0
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600 2 0
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000 2581192 8603
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450 2581194 8603
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487 2581196 8603
indices2 <- seq_along(LowFrq5minuteRaw2[,6])[!(duplicated(LowFrq5minuteRaw2[,6]))] # 1 4; the beginnings of groups
LowFrq5minute <- data.frame(X1=c("GBP/USD"), X2=LowFrq5minuteRaw2[indices2,2], X3=aggregate(LowFrqData[,3] ~ QbyDto300, LowFrq5minuteRaw2, mean)[,2], X4=aggregate(LowFrqData[,4] ~ QbyDto300, LowFrq5minuteRaw2, mean)[,2])
LowFrq5minute
X1 X2 X3 X4
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287596
2 GBP/USD 2017-06-30 20:59:52 1.301163 1.303646
X2
is holding the timestamps of 1st occurences of the representatives of 5-minute obs lying on the intervals.