2

I have 1 min OHLC bars on some market data that I imported into Mongo DB.

Each document looks like this:

{
    "_id" : ObjectId("5ac3163f31a0632c7642ca1c"),
    "Date" : "08/06/2007",
    "Time" : "15:01",
    "Open" : 1310,
    "High" : 1310.25,
    "Low" : 1309.5,
    "Close" : 1310,
    "Up" : 209,
    "Down" : 165,
    "Volume" : 0
}

I want to build a function that allows me to quickly generate X-bar intervals from this data. i.e. generate output 5-min bars, 1 hr bars, daily bars, etc... I also want to be able to filter out a data range.

I've been playing around with Mongo's aggregation functions, but I'm getting overwhelmed how I should approach this AND how I should order the pipeline operations.

Do I first group by 'Date', then sort by 'Time', then group again by $first, $last, $max and $min?

Or do I first create a new field somehow combining 'Date' and 'Time' and then proceed to the grouping?

Although don't I need to first somehow convert the "Date" and "Time" fields from string to Date field so that Mongo knows how to sort and match properly? ...but then which order would I do that in?

I'm still a newbie to MongoDB, so any advice would be appreciated.

jersey bean
  • 3,321
  • 4
  • 28
  • 43
  • I'm looking for solutions to a similar problem, and found this page in the MongoDB docs to be informative https://docs.mongodb.com/ecosystem/use-cases/pre-aggregated-reports-mmapv1/ – Jon Church May 25 '18 at 13:49

3 Answers3

1

Ok, I've come up with a solution:

db.minbars.aggregate([
   {
      $project: 
      {
         dts: 
         {
            $dateFromString: 
            {
               dateString: 
               {
                  $concat: ['$Date', '$Time']
               }
            }
         },
         Open:1, 
         High:1, 
         Low:1, 
         Close:1
      }   
   },
   {
      $match: 
      {
         dts: 
         { 
            $gte: ISODate("2016-01-01T00:00:00.000Z"), 
            $lte: ISODate("2016-12-31T00:00:00.000Z")
         }
      }
   },
   {
      $sort: { dts : 1 }
   },
   {
      $group:
      {
         _id: 
         {
            year: {$year: "$dts"},
            month: {$month: "$dts"},
            day: {$dayOfMonth: "$dts"},               
            hour: {$hour: "$dts"},
            min: 
            {
                $add: 
                [
                   {$subtract:
                   [
                      {$minute: "$dts"},
                      {$mod: [{$minute: "$dts"}, 5]}
                   ]},
                   5   
                ]
            }   
         },
         Open: {$first: "$Open"},
         High: {$max: "$High"},
         Low: {$min: "$Low"},
         Close: {$last: "$Close"}
      }
   } 
], {allowDiskUse: true})

Here's an explanation for each pipeline stage:

  1. Project

Use 'dateFromString' from combine 'Date' and 'Time' into an ISODate object ('dts' - which stands for date timestamp). Preserve the other OHLC fields.

  1. Match

Filter out based on a date range

  1. Sort

Sort by the new ISODate object ('dts').

  1. Group

Group together all those documents with the same Year, Month, Day, Hour, and 5-minute minute interval. The minute interval uses the formula: minute = minuteIn - (minuteIn % i) + i, where i=minute interval. I'm adding 'i' so that minutes 00, 01, 02, 03, and 04 are aggregated to the next 05 minute interval (and not the preceding 00 minute interval). NOTE: if you want 1-hr, 4-hr, Daily bars, etc...then you need to adjust the _id section accordingly.

NOTE: I'm using {allowDiskUse: true} here because at one point I ran into the Memory constraints at the Sort stage.

Maybe someone can come up with a simpler way to do this?


UPDATE:

As I noted in 4) above, I mentioned I was adding "i" (minute interval) to the resulting minute. However, when I did this I ended up with a '60' minute interval showing up in the output. You should only have 0, 5, 10, 15, ...55 minute bars and there should NOT be a 60-minute bar. So this was NOT correct.

Also, if you compare against a trading platform (i.e. Thinkorswim) you can see that the standard practice is to use preceding 5-minute interval for the timestamp of the bar. For example, the 5-minute bar 9:25 represents the aggregation of these minute bars: 9:25, 9:26, 9:27, 9:28, 9:29.

jersey bean
  • 3,321
  • 4
  • 28
  • 43
  • does mongodb store the aggregated data on the disk? so that after reboot of db instances it do not have to calculate again, it can be helpful in case of historical trade data as it remains constant. – meghraj27 Feb 20 '21 at 05:04
0

I think, you can simply use PHP. To reduce the complexity of your code, do not create a new field that contains Date and Time.

    $outputs = array();
    $raw_datas = array();
    foreach($raw_datas as $data){
      $date = \DateTime::createFromFormat('D/M/Y H:i', $data["Date"]." ".$data["Time"]);
      $outputs['daily'][$date->format("D/M/Y")][] = $date; //or $date.id if you aim to use AJAX later
      $outputs['hourly'][$date->format("D/M/Y H")][] = $date; //or $date.id if you aim to use AJAX later
//      And so on...
//      ....
    }

    return $outputs;

Unfortunately, if you aim to generate this plot many times, you can add a field time (containing the timestamp) instead!

Bill Somen
  • 101
  • 1
  • 5
0

You need to group by both date and time (time interval to be precise). Take a look at https://docs.mongodb.com/ecosystem/use-cases/storing-log-data/#counting-requests-by-day-and-page for examples of the pipeline and StackOverflow itself - the similar questions were answered many times, e.g. Group result by 15 minutes time interval in MongoDb

If you need to quickly generate X-bar intervals and the dataset is large enough for aggregation to be noticeably slow you may need to pre-aggregate data. The pattern is well described in https://www.mongodb.com/blog/post/schema-design-for-time-series-data-in-mongodb and https://docs.mongodb.com/ecosystem/use-cases/pre-aggregated-reports-mmapv1/ (ignore pre-allocation part if you are using WiredTiger engine)

Alex Blex
  • 34,704
  • 7
  • 48
  • 75