How do you periodically poll a relatively static source, like a database, to create a reference stream in Microsoft StreamInsight?
Here is what I've tried. I'm representing a database of user metadata as a simple List<UserMetaData>:
var referenceData = new List<UserMetaData>()
{
    new UserMetaData() { UserId = 1, Name = "Fred Jones", Location = "Seattle" },
    new UserMetaData() { UserId = 2, Name = "Bob Murphy", Location = "Portland" }
};
Here is the UserMetaData class:
public class UserMetaData
{
    public int UserId { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }

    public override string ToString()
    {
        return string.Format(
            "Name: {0}, ID: {1}, Location: {2}",
            this.Name,
            this.UserId,
            this.Location);
    }
}
The rest of the example code replaces the ellipsis comment in the standard StreamInsight embedded deployment setup:
using (var server = Server.Create("default"))
{
    var app = server.CreateApplication("app");
    // ...
}
First, I create a heartbeat like this:
var heartbeat = app.DefineObservable(
    () => Observable.Interval(TimeSpan.FromSeconds(2)));
In a real application I might make this heartbeat interval five minutes instead of two seconds. Next, I want the heartbeat to trigger a database lookup for new user metadata:
var newUserMeta = app.DefineObservable(
        () => heartbeat.SelectMany(_ => referenceData))
    .ToPointStreamable(
        c => PointEvent.CreateInsert(DateTime.Now, c),
        AdvanceTimeSettings.IncreasingStartTime);
The IQbservable.SelectMany extension should flatten the IEnumerable<UserMetaData> that I expect out of referenceData. The _ parameter throws away the long that is emitted from heartbeat. Then ToPointStreamable converts the IObservable<UserMetaData> to an IQStreamable of point events with a start time of now. (DateTime.Now probably isn't very StreamInsight-y.)
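As a sanity check on the flattening behaviour I'm expecting, here is a minimal sketch using plain LINQ to Objects instead of StreamInsight's IQbservable. The SelectManyDemo class and its Flatten method are names I made up for illustration, and the sketch says nothing about how the StreamInsight server handles referenceData:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of StreamInsight or Rx.
public static class SelectManyDemo
{
    // For every tick, discard the tick value and emit each item of source,
    // producing one flat sequence instead of a sequence of lists.
    public static List<string> Flatten(IEnumerable<long> ticks, List<string> source)
    {
        return ticks.SelectMany(_ => source).ToList();
    }

    public static void Main()
    {
        var names = new List<string> { "Fred Jones", "Bob Murphy" };
        var ticks = new long[] { 0, 1, 2 }; // stands in for three heartbeat ticks

        foreach (var name in Flatten(ticks, names))
        {
            Console.WriteLine(name);
        }
    }
}
```

With three ticks and two names, this writes six lines: the two names repeated once per tick.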
Then I convert that to a signal, run that over a simple query, define a console sink and deploy it.
// Convert to signal
var metaDataSignal = newUserMeta
    .AlterEventDuration(e => TimeSpan.MaxValue)
    .ClipEventDuration(newUserMeta, (e1, e2) => e1.Name == e2.Name);
// Query
var result = from t in metaDataSignal
             select t;
// Define & deploy sink.
var sink = app.DefineObserver(
    () => Observer.Create<UserMetaData>(c => Console.WriteLine(c)));
sink.Deploy("sink");
My last step is to Bind the sink. I also wait a few seconds to watch the output of my metadata polling heartbeat print to the screen, then add a new UserMetaData record to my database and wait to see if the change is reflected.
using (var process = result.Bind(sink).Run("process"))
{
    Thread.Sleep(4000);
    referenceData.Add(new UserMetaData()
    {
        UserId = 3,
        Name = "Voqk",
        Location = "Houston"
    });
    Console.ReadLine();
}
The new UserMetaData record is never reflected in the output:
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
Name: Fred Jones, ID: 1, Location: Seattle
Name: Bob Murphy, ID: 2, Location: Portland
(... forever)
What I assume is happening is that my UserMetaData list is being serialized and re-created on the StreamInsight server, so any changes made to my local copy aren't reflected. I'm not sure how to get past this.
Mark Simms wrote a blog post about using reference streams in StreamInsight back in 2010, explaining how to use static data sources, and said his next post would describe using SQL Server.
Unfortunately that post never happened.
EDIT: I've changed the classes in this post to match those in Mark Simms' post and tried to de-clutter and elaborate on my process.