
How do you periodically poll a relatively static source, like a database, to create a reference stream in Microsoft StreamInsight?

Here is what I've tried. I'm representing a database of user metadata as a simple List<UserMetaData>

var referenceData = new List<UserMetaData>()
    {
        new UserMetaData() { UserId = 1, Name = "Fred Jones", Location = "Seattle" },
        new UserMetaData() { UserId = 2, Name = "Bob Murphy", Location = "Portland" }
    };

Here is the UserMetaData class

public class UserMetaData
{
    public int UserId { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }

    public override string ToString()
    {
        return string.Format(
            "Name: {0}, ID: {1}, Location: {2}",
            this.Name,
            this.UserId,
            this.Location);
    }
}

The remaining example code replaces the ellipsis in the standard StreamInsight embedded deployment setup:

using (var server = Server.Create("default"))
{
    var app = server.CreateApplication("app");
    // ...
}

First, I create a heartbeat like this:

var heartbeat = app.DefineObservable(
                        () => Observable.Interval(TimeSpan.FromSeconds(2)));

In a real application I might make this heartbeat interval five minutes instead of two seconds. Next, I want the heartbeat to trigger a database lookup for new user metadata:

var newUserMeta = app.DefineObservable(
                        () => heartbeat.SelectMany(_ => referenceData))
                    .ToPointStreamable(
                        c => PointEvent.CreateInsert(DateTime.Now, c),
                        AdvanceTimeSettings.IncreasingStartTime);

The IQbservable.SelectMany extension should flatten the IEnumerable<UserMetaData> that I expect out of referenceData on each tick. The _ parameter throws away the long that is emitted by heartbeat. Then ToPointStreamable converts the IObservable<UserMetaData> to an IQStreamable of point events with a start time of now. (DateTime.Now probably isn't very StreamInsight-y.)
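Stripped of StreamInsight and Rx, the flattening step can be approximated with LINQ to Objects. This is an approximation only: `Flatten` is a hypothetical helper, the ticks are a fixed array standing in for `Observable.Interval`, and nothing is sent to a server. It shows that each heartbeat value is discarded and replaced by the entire reference list, so every tick re-emits every row rather than just new or changed ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectManyDemo
{
    // Hypothetical stand-in for heartbeat.SelectMany(_ => referenceData):
    // each tick (a long) is ignored and replaced by all rows of the list.
    public static List<string> Flatten(IEnumerable<long> ticks, List<string> referenceData)
    {
        return ticks.SelectMany(_ => referenceData).ToList();
    }

    static void Main()
    {
        var referenceData = new List<string> { "Fred Jones", "Bob Murphy" };
        var ticks = new long[] { 0, 1, 2 }; // three "heartbeat" values

        // Prints Fred Jones, Bob Murphy three times (6 lines in total).
        foreach (var name in Flatten(ticks, referenceData))
            Console.WriteLine(name);
    }
}
```

Because every refresh repeats unchanged rows, the downstream stream needs some way to replace the previous copy of each row, which is what the Alter/Clip step below is for.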

Then I convert that to a signal, run that over a simple query, define a console sink and deploy it.

// Convert to signal
var metaDataSignal = newUserMeta
                    .AlterEventDuration(e => TimeSpan.MaxValue)
                    .ClipEventDuration(newUserMeta, (e1, e2) => e1.Name == e2.Name);

// Query
var result = from t in metaDataSignal
                 select t;

// Define & deploy sink.
var sink = app.DefineObserver(
                    () => Observer.Create<UserMetaData>(c => Console.WriteLine(c)));
sink.Deploy("sink");

My last step is to Bind the sink. I will also wait for a couple of seconds to watch the output of my metadata polling heartbeat print to screen, then add a new UserMetaData record to my database and wait to see if the changes are reflected.

using (var process = result.Bind(sink).Run("process"))
{
    Thread.Sleep(4000);

    referenceData.Add(new UserMetaData() 
                            {
                                UserId = 3, 
                                Name = "Voqk", 
                                Location = "Houston" 
                            });

    Console.ReadLine();
}

The new UserMetaData record is never reflected in the output:

Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland

(... forever)

What I assume is happening is that my UserMetaData list is being serialized and re-created on the SI server so any changes made to my local copy aren't reflected. I'm not sure how to get past this.

Mark Simms wrote a blog post back in 2010 about using reference streams in StreamInsight, explaining how to use static data sources, and said his next post would describe using SQL Server.

Unfortunately that post never happened.

EDIT: I've changed the classes in this post to match those in Mark Simms' post and tried to de-clutter and elaborate on my process.

voqk

2 Answers


Your assumption is correct. .NET classes don't go into the StreamInsight engine; your class is used for schema only (the shape of the payload). So how do you deal with changing reference data?

First, your source needs to refresh periodically. How often depends on how frequently you expect the data to change.

Then, for the reference stream, you need either to use a timer to enqueue CTIs (to keep it moving) regardless of the data, or to bring CTIs in from the data stream. The first method is the easiest, but the second is more flexible: it ties the reference stream to whatever timestamps you are using in the data stream, so it works in a replay scenario, not just a real-time one.

Finally, you need to allow your reference events to expire and be replaced when new reference data is added. This is done using the "To Signal" pattern (Alter/Clip). Again, you have options. If your reference source is "smart" enough to enqueue only changes, you can alter the lifetime of the reference event to TimeSpan.MaxValue, and the reference data is then valid until cancelled. If, however, you just want to reload all of the reference events, you can alter the event duration to be just a little longer than your refresh rate and then clip. This method also allows reference events to be removed from the stream (in the case of deletion, etc.).

The final challenge with reference data is how to handle the timestamps. In most of the sample scenarios, the data timeline is based on the system clock, but this isn't always the case. And even in those scenarios, you can "miss" some joins at startup due to a race condition: reference events are still being enqueued while data events are already pumping through. In this case, it works pretty well to use an absurdly early start date (Jan 1, 1970) and an absurdly late end date (Jan 1, 2100) for the reference data and enqueue it as an interval.

However, in this case you absolutely need to import CTIs from the data stream, modify the start dates of the reference events so they don't violate the imported CTIs, and handle other synchronization tasks yourself. The adapter/query model handled this quite nicely, but the Reactive model doesn't. On the other hand, with the Reactive model you can use subjects to fine-tune how all of this works, so it winds up being significantly more flexible.
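The "reload everything, then alter and clip" approach can be modeled outside StreamInsight with plain integers as timestamps. In this sketch, `RefEvent` and `ToSignal` are hypothetical names, not StreamInsight types, and the clip rule (end an event at the start of the next matching event, if that start falls inside its lifetime) is a simplified model of what ClipEventDuration does. Each reload emits a point per row, every point is stretched to a lifetime slightly longer than the refresh period, and then cut off where the next event with the same key begins. A row that disappears from the source simply stops being refreshed and expires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an interval event: [Start, End) on an integer timeline.
public record RefEvent(string Key, int Start, int End);

public static class ToSignalDemo
{
    public static List<RefEvent> ToSignal(IEnumerable<(string Key, int Time)> points, int lifetime)
    {
        // "Alter": turn each point at time t into the interval [t, t + lifetime).
        var altered = points.Select(p => new RefEvent(p.Key, p.Time, p.Time + lifetime)).ToList();

        // "Clip": end each interval at the start of the next event with the same key,
        // but only if that start falls before the interval's current end.
        return altered.Select(e =>
        {
            int? nextStart = altered
                .Where(o => o.Key == e.Key && o.Start > e.Start)
                .Select(o => (int?)o.Start)
                .Min();
            return nextStart.HasValue && nextStart.Value < e.End
                ? e with { End = nextStart.Value }
                : e;
        }).ToList();
    }

    static void Main()
    {
        // Reload every 2 time units; "Bob Murphy" is deleted from the source after t = 2.
        var points = new List<(string, int)>
        {
            ("Fred Jones", 0), ("Bob Murphy", 0),
            ("Fred Jones", 2), ("Bob Murphy", 2),
            ("Fred Jones", 4),
        };

        // A lifetime of 3 is "a little longer" than the refresh period of 2.
        foreach (var e in ToSignal(points, 3))
            Console.WriteLine($"{e.Key}: [{e.Start}, {e.End})");
    }
}
```

Here Fred Jones forms a contiguous signal ([0, 2), [2, 4), [4, 7)), while Bob Murphy's last interval is [2, 5) and then expires. With an unbounded lifetime instead (the TimeSpan.MaxValue variant), that last Bob Murphy event would never end, which is why that variant only suits sources that enqueue changes, including explicit deletions.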

DevBiker
  • "First, your source needs to refresh periodically." Can you explain how to refresh a deployed source? I received a _Microsoft.ComplexEventProcessing.InvalidDefinitionException_ with additional information "The object 'cep:/Server/Application/referenceTest/Entity/refStream' already exists" after trying to deploy a new source with the same name. – voqk Jan 11 '15 at 18:04

As a test I moved var referenceData = new List<UserMetaData>()... out of Main and declared it as a static member instead of a local variable.

class Program
{
    // *NOW STATIC*
    private static List<UserMetaData> referenceData = new List<UserMetaData>()
    {
        new UserMetaData() {UserId = 1, Name = "Fred Jones", Location = "Seattle"},
        new UserMetaData() {UserId = 2, Name = "Bob Murphy", Location = "Portland"}
    };

    public static void Main(string[] args)
    {
        // ... same setup, query and sink code as in the question ...
    }
}

Now changes to the "database" are reflected in the output:

Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Voqk, ID: 3, Location: Houston
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Voqk, ID: 3, Location: Houston
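One plausible explanation, assuming DefineObservable takes an expression tree (Expression<Func<IObservable<T>>>) that StreamInsight serializes when the definition is created: a captured local becomes a constant node holding the closure object (and therefore the list instance as it was at that moment), whereas a static field is only a reference to the field, read again wherever the expression is evaluated (the same process, in an embedded server). The sketch uses only System.Linq.Expressions to show the two shapes; it does not call StreamInsight, and how StreamInsight treats each node is an assumption. `CaptureDemo`, `TargetOf`, `LocalTarget` and `StaticTarget` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class CaptureDemo
{
    // A static field: an expression referencing it stores only a field reference.
    static List<string> staticData = new List<string> { "Fred Jones" };

    // Returns the target of the field access in the lambda body:
    // a ConstantExpression (the compiler-generated closure object) for a captured local,
    // or null for a static field.
    public static Expression TargetOf<T>(Expression<Func<T>> lambda)
    {
        return ((MemberExpression)lambda.Body).Expression;
    }

    public static Expression LocalTarget()
    {
        var localData = new List<string> { "Fred Jones" };
        return TargetOf(() => localData);
    }

    public static Expression StaticTarget()
    {
        return TargetOf(() => staticData);
    }

    static void Main()
    {
        Console.WriteLine(LocalTarget().NodeType); // Constant
        Console.WriteLine(StaticTarget() == null); // True
    }
}
```

If this is the mechanism, a static field only works because the embedded server runs in the same process as Main; it would not help with a remote server, where the first answer's approach of periodically re-reading the real source applies.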
voqk