I have a list of events that occur in a system. My goal is to take the list of events and create a sliding window over the series to determine the rate of event occurrences. The events are loaded into the events list by an application that is outside the scope of this issue.

Because the system can receive events from multiple sources at the same time, some of the event occurrence timestamps (the value I am using as a key for the series) are the same. What is the proper way to achieve this?

This is the error I get:

An unhandled exception of type 'System.ArgumentException' occurred in Deedle.dll

Additional information: Duplicate key '6/12/2015 3:14:43 AM'. Duplicate keys are not allowed in the index.

My code:

let mutable events = new ResizeArray<StreamEvent>()
let getSeries =
    let eventsKvp = events |> Seq.map (fun event -> KeyValuePair<DateTime,StreamEvent>(event.OccuredAt, event))
    let series = Series(eventsKvp)
    series |> Series.windowDist (TimeSpan(0, 0, 0, 30))

Update #1

What isn't depicted here is some C# code which instantiates some of the F# Stream objects and adds events via the Stream.ProcessEvent method. That code is unimportant to the issue I am experiencing here.

I am no longer getting the duplicate key error, but I now get a different one: Floating window aggregation and chunking is only supported on ordered indices.

Update #2

I needed to use sortByKey instead of sort.

Here is my F# code:

namespace Storck.Data
open System
open System.Collections.Generic
open Deedle

type EventType =
    | ClientConnected
    | ClientDisconnect

type Edge(id:string,streamId:string) = 
    member this.Id = id
    member this.StreamId = streamId
    member this.Edges =  new ResizeArray<Edge>() 

type StreamEvent(id:string,originStreamId:string,eventType:EventType,ocurredAt:DateTime) = 
    member this.Id = id
    member this.Origin = originStreamId
    member this.EventType = eventType
    member this.OccuredAt = ocurredAt
    override this.Equals(o) =
        match o with
        | :? StreamEvent as sc -> this.Id = sc.Id
        | _ -> false
    override this.GetHashCode() =
        id.GetHashCode()
    interface System.IComparable with
        member this.CompareTo(o) =
            match o with
            | :? StreamEvent as sc -> compare this.Id sc.Id
            | _ -> -1

type Client(id:string) = 
    member this.Id = id

type Key = 
  | Key of DateTime * string
  static member (-) (Key(a, _), Key(b, _)) = a - b
  override x.ToString() = let (Key(d, s)) = x in d.ToString() + ", " + s

type Stream(id:string, origin:string) = 
    let mutable clients = new   ResizeArray<Client>()
    let mutable events = new ResizeArray<StreamEvent>()

    member this.Events = events.AsReadOnly()
    member this.Clients = clients.AsReadOnly()
    member this.Id = id
    member this.Origin = origin
    member this.Edges =  new ResizeArray<Edge>() 
    member this.ProcessEvent(client:Client,event:StreamEvent)  =  
        match event.EventType with
            |EventType.ClientConnected -> 
                events.Add(event)
                clients.Add(client)
                true
            |EventType.ClientDisconnect -> 
                events.Add(event)
                let clientToRemove = clients |> Seq.find(fun(f)-> f.Id = client.Id)
                clients.Remove(clientToRemove)
    member this.GetSeries() =       
        let ts = series [ for e in events -> Key(e.OccuredAt, e.Id) => e ]
        ts |> Series.sortByKey |> Series.windowDist (TimeSpan(0, 0, 0,30))
Elan Hasson

1 Answer

One of the design decisions we made in Deedle is that a series can be treated as a continuous series (rather than a sequence of events), and so Deedle does not allow duplicate keys (which makes sense for events but not for time series).

I wish there was nicer support for scenarios like yours. It is something we are thinking about for the next version, but I'm not sure how best to do it.

As Fyodor suggests in the comments, you can use a unique index that consists of the date together with something else (either the source or just an ordinal index).
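As a rough illustration of the ordinal-index idea, here is a minimal sketch in plain F# (no Deedle calls). The `SampleEvent` record and the sample values are made up for the example. Note that a plain tuple key has no `-` operator, which is presumably why a dedicated key type is a better fit if you want `windowDist`.

```fsharp
open System

// Hypothetical minimal event record, for illustration only.
type SampleEvent = { OccuredAt : DateTime; Source : string }

let t = DateTime(2015, 6, 12, 3, 14, 43)

let sample =
    [ { OccuredAt = t; Source = "a" }
      { OccuredAt = t; Source = "b" } ]   // same timestamp as the first event

// Pair each timestamp with the event's position so that every key is distinct.
let keyed = sample |> List.mapi (fun i e -> (e.OccuredAt, i), e)

let keys = keyed |> List.map fst
let allUnique = (List.distinct keys).Length = keys.Length   // true
```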

If you define the - operator on your key, then you can even use the windowDist function:

type StreamEvent = { OccuredAt : DateTime; Source : string; Value : int }

/// A key combines date with the source and defines the 
/// (-) operator which subtracts the dates returning TimeSpan
type Key = 
  | Key of DateTime * string
  static member (-) (Key(a, _), Key(b, _)) = a - b
  override x.ToString() = let (Key(d, s)) = x in d.ToString() + ", " + s

Now we can create a bunch of sample events:

let events = 
  [ { OccuredAt = DateTime(2015,1,1,12,0,0); Source = "one"; Value = 1 }
    { OccuredAt = DateTime(2015,1,1,12,0,0); Source = "two"; Value = 2 }
    { OccuredAt = DateTime(2015,1,1,13,0,0); Source = "one"; Value = 3 } ]

Here, I'll use the built-in series function with the Deedle => operator to create a series that maps the keys to values:

let ts = series [ for e in events -> Key(e.OccuredAt, e.Source) => e ]

And we can even use the windowDist function because the key type supports -!

ts |> Series.windowDist (TimeSpan(0, 0, 0,30))
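To see how the `Key` type behaves on its own, here is a small sketch in plain F# (no Deedle calls). It only relies on F#'s default structural comparison for union types and on the `-` operator defined above. The sample keys are made up.

```fsharp
open System

// Same shape as the Key type above: date first, then source.
type Key =
  | Key of DateTime * string
  static member (-) (Key(a, _), Key(b, _)) = a - b

let t = DateTime(2015, 1, 1, 12, 0, 0)

// Deliberately out of order.
let unsorted = [ Key(t.AddHours(1.0), "one"); Key(t, "two"); Key(t, "one") ]

// Structural comparison orders by the DateTime first, then by the string.
let sorted = List.sort unsorted

// The custom (-) operator ignores the source and returns a TimeSpan.
let gap = sorted.[2] - sorted.[0]
```

Here `sorted` is `[Key(t, "one"); Key(t, "two"); Key(t + 1h, "one")]` and `gap` is one hour.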
Tomas Petricek
  • Thanks for the fast response. I'll give it a shot. I'm new to F# and the only way to learn is by asking for help sometimes :) – Elan Hasson Jun 12 '15 at 04:24
  • When I implement it your way, I end up with an exception: Additional information: Floating window aggregation and chunking is only supported on ordered indices. – Elan Hasson Jun 12 '15 at 04:48
  • How can I create an ordered index? – Elan Hasson Jun 12 '15 at 07:53
  • Try sorting the series first, using `Series.sort` (http://bluemountaincapital.github.io/Deedle/reference/deedle-seriesmodule.html#section8). Note that I intentionally put the `DateTime` as the first thing in my `Key` type - F# can automatically sort values of this type, first using the first element and then (if they are the same), using the second element. – Tomas Petricek Jun 12 '15 at 13:12
  • Oh, also, I think this error message could be improved to be more useful. Deedle is on GitHub (https://github.com/BlueMountainCapital/Deedle) and so feel free to find the message in the source code and send a pull request that improves it (e.g. add "Consider sorting the series before calling the operation.") That would help people who run into this in the future! – Tomas Petricek Jun 12 '15 at 13:14
  • Hrmm. Still getting the same issue. Posting my code above. – Elan Hasson Jun 13 '15 at 10:42
  • I'm not sure how to use your `Stream` type. Can you also post a simple code sample that I can just run which fails? – Tomas Petricek Jun 13 '15 at 17:14
  • I've posted it to https://github.com/ElanHasson/Storck Your help is very much appreciated. – Elan Hasson Jun 13 '15 at 22:36
  • Oops! You need to sort the series by the *key*, which is done using `Series.sortByKey` (the `sort` function sorts it by values, which is not what you need here). – Tomas Petricek Jun 14 '15 at 20:51
  • I can confirm that it is working :) Thank you. Now to figure out how to read the chunks :) – Elan Hasson Jun 14 '15 at 20:56
  • Tomas, I am not sure if I'm on the right path for what I'm trying to do: I want to look at the events that occurred over a 30 second window and count them to determine an "event velocity". A Stream has clients that connect to it. I want to determine popular streams by seeing how quickly clients are connecting. I want to determine a popularity rating based on connect events in a 30 second window, a 1 minute window, and a 5 minute window. My thought is to have three windows that run over a single series of events and produce a popularity factor based on them. Is this the right way to do this? – Elan Hasson Jun 16 '15 at 01:45
  • Tomas, has this changed in the years since? – Elan Hasson May 29 '20 at 14:13
  • @ElanHasson I believe this still remains the way it was. – Tomas Petricek Jun 01 '20 at 11:52
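
For the "event velocity" idea raised in the comments, one possible starting point is to count the events that fall inside each trailing window. The sketch below is plain F# rather than Deedle's `windowDist`, so it is a swapped-in technique and not what the answer describes. The `countInWindow` helper, the timestamps, and the window sizes are all assumptions made for illustration.

```fsharp
open System

/// Counts the timestamps t with t <= now and (now - t) < window.
/// Hypothetical helper, not part of Deedle.
let countInWindow (window: TimeSpan) (now: DateTime) (times: seq<DateTime>) =
    times
    |> Seq.filter (fun t -> t <= now && now - t < window)
    |> Seq.length

// Made-up connect timestamps for a single stream.
let now = DateTime(2015, 6, 12, 3, 15, 0)

let connectTimes =
    [ now.AddSeconds(-10.0)
      now.AddSeconds(-10.0)   // duplicate timestamps are fine here
      now.AddSeconds(-45.0)
      now.AddMinutes(-4.0)
      now.AddMinutes(-10.0) ]

let windows =
    [ TimeSpan.FromSeconds(30.0); TimeSpan.FromMinutes(1.0); TimeSpan.FromMinutes(5.0) ]

// One count per window size: [2; 3; 4]
let counts = windows |> List.map (fun w -> countInWindow w now connectTimes)
```

How the three counts are combined into a single popularity factor is left open; that weighting is a design choice the thread does not settle.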