1

I have these data

  Open   High   Low    Close
1 6709.0 6709.5 6703.5 6705.0
2 6705.0 6710.5 6705.0 6710.0
3 6710.5 6713.5 6710.5 6713.5
4 6713.5 6713.5 6711.5 6712.0
5 6712.5 6715.0 6712.5 6714.0
6 6714.0 6716.0 6713.5 6716.0

which are OHLC prices of 1 minute bars. I want to find OHLC of 5 minute bars. Open and Close are easy, but I don't know how to do High and Low.

I need to find the maximum/minimum values in a subset of the data. So first I need to find max(data[1:5,2]) then max(data[6:10,2]) and so on. And I need to put the values in a vector called High. And for Low prices min(data[1:5,3]).

I need something like a for loop with two variables.

digEmAll
  • 56,430
  • 9
  • 115
  • 140
rrr123
  • 13
  • 2
  • could you please add your trying code. –  Apr 08 '18 at 08:13
  • Here `for(n in x){for (m in y){ High[i] <- max(data[n:m,2]) i=i+1 }}` This would actually work because then I could get the data I wanted by subsetting. But my computer is too slow (the data.frame has 1.5 million rows). – rrr123 Apr 08 '18 at 08:23
  • Actually, your data presents 6 bars so I though to create Group of size 6. Do you need creating Group of 5 bars? – Seymour Apr 08 '18 at 08:25
  • Yes every row is 1 minute so I need group of 5. The data I put in my post is just the output of `head(data)`. – rrr123 Apr 08 '18 at 08:28
  • Well then you can use my code but the `nrow()` of your dataset must be exactly a multiple of 5. Do you confirm? – Seymour Apr 08 '18 at 08:29
  • 1
    Ok thank you very much. – rrr123 Apr 08 '18 at 08:31

2 Answers2

0

I do not know how big is your dataset. However, for loop would be slow in this case. Personally, I would rather go for an aggregate function, specifically, the one already implemented in Rbase.

The first thing we have to do is to create an index that will allow us to perform the function on each subgroup. Each subgroup contains OHLC of 5 minute bars.

Therefore, let's start by reading the data

read the table

df <- read.table(text = "Open   High    Low  Close
+ 1 6709.0 6709.5 6703.5 6705.0
+ 
+ 2 6705.0 6710.5 6705.0 6710.0
+ 
+ 3 6710.5 6713.5 6710.5 6713.5
+ 
+ 4 6713.5 6713.5 6711.5 6712.0
+ 
+ 5 6712.5 6715.0 6712.5 6714.0")

Increase the size of df to better simulate the real use case scenario.

df <- rbind(df,df,df,df)

Create an index, first using vector recycling and the cumsum() function

df$group <- c(1,0,0,0,0)
df$group <- cumsum(df$group)

Now, finally we can obtain the Min and Max value for each 5 OHLC 5 minute bars and insert them into your final vectors, High and Low

High <- aggregate(High ~ group, data = df, FUN = max)$High
Low <- aggregate(Low ~ group, data = df, FUN = min)$Low

A faster alternative to aggregate is given by data.table package

require(data.table)
dt <- data.table(df)
High <- dt[ , max(High), by = group]$V1
Low <- dt[ , min(Low), by = group]$V1
Seymour
  • 3,104
  • 2
  • 22
  • 46
0

A dplyr based solution which can even work when rows are not multiple of 5 be achieved using ceiling to create group.

library(dplyr)

# Prepare large data.frame
df_mod <- bind_rows(df,df,df,df,df)

df_mod %>% 
  group_by(grp = ceiling(row_number()/5)) %>%
  summarise(High = max(High), Low = min(Low)) %>%
  select(-grp) %>%
  as.data.frame()

#   grp High    Low
# 1   1 6715 6703.5
# 2   2 6716 6703.5
# 3   3 6716 6703.5
# 4   4 6716 6703.5
# 5   5 6716 6703.5
# 6   6 6716 6705.0

Data

#As provided an example by OP
df <- read.table( text = 
"Open   High   Low    Close
1 6709.0 6709.5 6703.5 6705.0
2 6705.0 6710.5 6705.0 6710.0
3 6710.5 6713.5 6710.5 6713.5
4 6713.5 6713.5 6711.5 6712.0
5 6712.5 6715.0 6712.5 6714.0
6 6714.0 6716.0 6713.5 6716.0",
header = TRUE)
MKR
  • 19,739
  • 4
  • 23
  • 33