0

I have searched for a while now and not really finding a clear enough answer to understand how to do it. I have chart data saved in MongoDB on a minute time-base (Unix timestamp). Now I understand that if I want to visualize the chart in hourly or daily based time I need to aggregate the data, but I am not fully understanding what needs to be done.

I want to create groups of 1 hour and 1 day, those need to contain the last record of its timeframe. and then does the aggregation need to be done once and it saves to DB? or every time I query it. and how do I setup the query in mongoose?

Here a example of my collection chart data

collection:[
{
 time: 1649083392,
 volume: '20384.28',
 open: 444.42,
 close: 444.56,
 high: 444.76,
 low: 444.36
},
{
 time: 1649083448,
 volume: '20384.28',
 open: 444.42,
 close: 444.56,
 high: 444.76,
 low: 444.36
},
{
 time: 1649083501,
 volume: '20384.28',
 open: 444.42,
 close: 444.56,
 high: 444.76,
 low: 444.36
}
]

lets say I need a chart per day timeframe. there are 1440 minutes in a day.

I first need to make a group per day to add up volume per day and then project the last item for each day. it should project the volume per day and the last item of the day for each day.

Hope someone can explain me a bit how this works. Thanks.

---- Update ---

Sorry. so I made a mistake.

[{
  "pairAddress": "0x58F876857a02D6762E0101bb5C46A8c1ED44Dc16",
  "chart": [
    {
      "time": 1648978488,
      "high": "442.93181339228767",
      "low": "440.89881857342505",
      "open": "440.89901371910537",
      "close": "442.9168809785855",
      "marketcap": "2505922284.61",
      "volume": "14264.118014884118",
      "_id": {
        "$oid": "62496a3b8741c95e7661a0c2"
      }
    },
    {
      "time": 1648978536,
      "high": "442.9603776582797",
      "low": "442.9122490168528",
      "open": "442.9292814855807",
      "close": "442.9478700257827",
      "marketcap": "2506097613.54",
      "volume": "19482.73456302384",
      "_id": {
        "$oid": "62496a778741c95e7661a971"
      }
    },
    {
      "time": 1648978608,
      "high": "442.9893218041529",
      "low": "442.941310936878",
      "open": "442.9481594715175",
      "close": "442.9893218041529",
      "marketcap": "2506332138.21",
      "volume": "16138.024513587941",
      "_id": {
        "$oid": "62496ab38741c95e7661b53a"
      }
    },
    {
      "time": 1648978668,
      "high": "443.5010551781398",
      "low": "442.9032561370158",
      "open": "442.9893789835573",
      "close": "443.5010551781398",
      "marketcap": "2509227408.46",
      "volume": "24664.532500429723",
      "_id": {
        "$oid": "62496aef8741c95e7661c000"
      }
    },
    {
      "time": 1648978728,
      "high": "443.5205214040826",
      "low": "443.4918353053875",
      "open": "443.50216033083433",
      "close": "443.5202071089899",
      "marketcap": "2509335765.70",
      "volume": "5548.645723580672",
      "_id": {
        "$oid": "62496b2b8741c95e7661c951"
      }
    },
    {
      "time": 1648978788,
      "high": "443.6375372213781",
      "low": "443.470378539243",
      "open": "443.50698654937736",
      "close": "443.5999403093497",
      "marketcap": "2509786877.88",
      "volume": "52212.176474500986",
      "_id": {
        "$oid": "62496b678741c95e7661d396"
      }
    },
    {
      "time": 1648978848,
      "high": "443.61143763713756",
      "low": "443.58718500668306",
      "open": "443.59987943714646",
      "close": "443.5872533304441",
      "marketcap": "2509715097.86",
      "volume": "14691.325842608467",
      "_id": {
        "$oid": "62496ba38741c95e7661e2d3"
      }
    }
  ]
}]

This is an actual example of my document, I need to aggregate over the chart array. and using the existing code I cant get it to work.

Mayga Fatmawati
  • 101
  • 2
  • 7
  • It would be better if you can provide expected result in a valid JSON by editing your own question. Ask a question like [this](https://stackoverflow.com/questions/71218971/perform-a-conditional-lookup-and-add-custom-data-if-from-collection-has-no-data) would be more clarify for us to solve your problem. Be sure that your testing data can come up with your expected result in certain logic. – YuTing Apr 08 '22 at 01:48
  • I added the collection as it is saved in my database and explained what I would like to have as output. Hope its clear now. – Mayga Fatmawati Apr 08 '22 at 02:13
  • Do you always have 1440 documents per day or do you may have gabs which need to be filled with `null` or similar? – Wernfried Domscheit Apr 08 '22 at 07:10
  • @WernfriedDomscheit There are gaps in some, there won't always we exactly 1440 records per day – Mayga Fatmawati Apr 08 '22 at 08:11
  • And do you need to fill the gaps with null or any default value? – Wernfried Domscheit Apr 08 '22 at 09:03
  • no, I only save a candle if there are transactions within that timeframe, if we get several minutes no transaction, there will be no chart data for that time. as I understand that should be no problem – Mayga Fatmawati Apr 08 '22 at 09:59

3 Answers3

2

I Know That this has already been answered but here is my take by flooring the timestamp to the interval and then grouping the data based on the timestampBoundary

db.data.aggregate([
    {
        $addFields: {
            // Your Group Interval In Seconds eg.
            // - 86400 For Per Day
            // - 3600 Per Hour
            // - 900 Per 15 minute
            timestampBoundary: {
                $subtract: ["$time", {$mod: ["$time", 3600]}]
            },
        }
    },
    {
        $sort: {
            time: -1
        }
    },
    {
        $group: {
            _id: "$timestampBoundary",
            lastItem: {$first: "$$ROOT"},
            totalVolume: {
                $sum: {
                    $toDecimal: "$volume"
                }
            },
        }
    }
])
  • I not fully understand what this is doing. there are cases where timestapms not created every minute – Mayga Fatmawati Apr 08 '22 at 08:17
  • The First Pipeline Object `timestampBoundary: {$subtract: ["$time", {$mod: ["$time", 3600]}]},` adds a value to the object that can be used to group the documents per day, per hour or per 15 minutes, then the second pipeline object sorts all the documents descending and the last dose the group where it sums all the values in that interval(day, hour or 15 minutes) and returns the last object in the interval, to adjust the interval adjust the **3600** between 86400, 3600, 900 – Calvin Coomer Apr 08 '22 at 08:50
  • I just found that I actually need to aggregate over an array inside a document. not sure how that is done. – Mayga Fatmawati Apr 08 '22 at 09:57
  • There is an explode feature in mongo, if you provide some sample data I’m sure it would be possible – Calvin Coomer Apr 08 '22 at 11:25
  • Here the playground https://mongoplayground.net/p/qbAKuLI3jNP – Mayga Fatmawati Apr 08 '22 at 12:43
  • I really need help to get this work on array inside document – Mayga Fatmawati Apr 09 '22 at 00:46
1
db.collection.aggregate([
  {
    $sort: { time: -1 }
  },
  {
    $set: {
      d: { $toDate: { $multiply: [ "$time", 1000 ] } },
      volume: { $toDecimal: "$volume" }
    }
  },
  {
    $facet: {
      day: [
        {
          $group: {
            _id: {
              $dateTrunc: {
                date: "$d",
                unit: "day"
              }
            },
            volume: { $sum: "$volume" },
            lastItem: { $first: "$$ROOT" }
          }
        }
      ],
      hour: [
        {
          $group: {
            _id: {
              year: { $year: "$d" },
              dayOfYear: { $dayOfYear: "$d" },
              hour: { $hour: "$d" }
            },
            volume: { $sum: "$volume" },
            lastItem: { $first: "$$ROOT" }
          }
        }
      ],
      15min: [
        {
          $group: {
            _id: {
              year: { $year: "$d" },
              dayOfYear: { $dayOfYear: "$d" },
              hour: { $hour: "$d" },
              interval: {
                $subtract: [
                  { $minute: "$d" },
                  { $mod: [ { $minute: "$d" }, 15 ] }
                ]
              }
            },
            volume: { $sum: "$volume" },
            lastItem: { $first: "$$ROOT" }
          }
        }
      ]
    }
  }
])

mongoplayground

YuTing
  • 6,555
  • 2
  • 6
  • 16
  • Hey, thank you. so, I tested with a bigger dataset and it returns only 1 item. I have multiple days in a chart and need the last item of the day with the total volume of that day – Mayga Fatmawati Apr 08 '22 at 02:53
  • Ok, I added a better version in the playground https://mongoplayground.net/p/OmqzWqGBKUT – Mayga Fatmawati Apr 08 '22 at 02:55
  • and do I need to aggregate the full collection or can we specify a document? Also, I not fully understand what's going on, later I would need a way to do the same only per hour and 15 minutes. – Mayga Fatmawati Apr 08 '22 at 03:01
  • Sorry, I just checked the timestamp first item and last item, as I see its 2 days – Mayga Fatmawati Apr 08 '22 at 03:06
  • @MaygaFatmawati I update my answer check it. I come up this answer by [this](https://stackoverflow.com/questions/26814427/group-result-by-15-minutes-time-interval-in-mongodb) – YuTing Apr 08 '22 at 03:17
  • Awesome, that indeed works. Thank you. One last question. how would I use this the best? save that to db or just run this every time I need it? I would only need it on a specific document and not on the whole collection – Mayga Fatmawati Apr 08 '22 at 03:28
  • @MaygaFatmawati Thanks, I think it would depend on how long does this query takes. If it takes only seconds then it's fine just run it every time you need it. But if it takes minutes then save that to DB would be better. On the other hand, if this query need to be call very frequently and your DB is too busy to deal with it then save that to DB would be better. – YuTing Apr 08 '22 at 03:35
  • its for crypto token charting app, there are about 200k tokens with some of them having 2m records. the data keeps coming in every minute I think I can split the aggregation into each timeframe and run that every time needed for that timeframe. would that be the proper way? – Mayga Fatmawati Apr 08 '22 at 05:13
  • @MaygaFatmawati Sounds ok. You can give it a try. – YuTing Apr 08 '22 at 05:36
  • Why do you mix `$group: { _id: { $dateTrunc: { date: "$d", unit: "day" } }` and `$group: { _id: { year: { $year: "$d" }, dayOfYear: { $dayOfYear: "$d" }, hour: { $hour: "$d" } }`? Use `$dateTrunc` for all, it is much simpler, esp. for `$dateTrunc: { date: "$d", unit: "minute", binSize: 15 }` – Wernfried Domscheit Apr 08 '22 at 07:15
  • Sorry, I made a mistake and updated my question – Mayga Fatmawati Apr 08 '22 at 10:04
0

SO, I still had a problem with this because the data in needed to aggregate is inside array of a document.

I found the $unwind function to solve this.

Chart.aggregate([
  {$match:{"pairAddress": "0x58F876857a02D6762E0101bb5C46A8c1ED44Dc16"}},
  {$unwind: "$chart"},
  {$addFields: {
       timestampBoundary: {$subtract: ["$chart.time",{$mod: ["$chart.time", 900]}]},
  }},
  {$sort: {"chart.time": -1}},
  {$group: {
      _id: "$timestampBoundary",
      lastItem: {$first: "$$ROOT"},
      volume: {$sum: {$toDecimal: "$chart.volume"}}
  }},
  {$sort:{ _id: 1}}
])

First I use $match to make sure I only get the document I want to aggregate, Unwind the chart array which I want to aggregate. then add a new field to convert UNIX timestamp to sort by timestamp. after that I can sort by timestamp and make a group of the specified time interval. add up the volume per group and make a last sort. in the $mod: 900 stands for seconds, this is where I can specify time interval of groups.

Thanks to @YuTing and @Calvin Coomer I got the perfect solution now.

Mayga Fatmawati
  • 101
  • 2
  • 7