Is a pipeline with a changing data type architecturally sound?

Question

I'm working on the architecture for what is essentially a document parsing and analysis framework. Given the lines of the document, the framework will ultimately produce a large object (call it Document) representing the document.

Early filters in the pipeline will need to operate on a line-by-line basis. However, filters further down will need to transform (and ultimately produce) the Document object.

To implement this, I was thinking of using a filter definition like this:

public interface IFilter<in TIn, out TOut> {
    TOut Execute(TIn data);
}

All filters will be registered with a PipelineManager class (as opposed to using the 'linked-list' style approach). Before executing, PipelineManager will verify the integrity of the pipeline to ensure that no filter is given the wrong input type.
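
For illustration, the integrity check described above might look roughly like the sketch below. Everything beyond IFilter<TIn, TOut> is an assumption: the Register, Verify and Execute members, the internal Stage class, and the two sample filters (SplitLines, CountLines) are hypothetical names, not part of the original design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the IFilter<TIn, TOut> interface from the question.
public interface IFilter<in TIn, out TOut>
{
    TOut Execute(TIn data);
}

// Hypothetical sketch: records each filter's input/output types at registration
// so the whole chain can be checked before any data flows through it.
public sealed class PipelineManager
{
    private sealed class Stage
    {
        public Type In { get; set; }
        public Type Out { get; set; }
        public Func<object, object> Run { get; set; }
    }

    private readonly List<Stage> _stages = new List<Stage>();

    public PipelineManager Register<TIn, TOut>(IFilter<TIn, TOut> filter)
    {
        _stages.Add(new Stage
        {
            In = typeof(TIn),
            Out = typeof(TOut),
            Run = input => filter.Execute((TIn)input)
        });
        return this;
    }

    // Returns one message per mismatch; an empty list means the chain is consistent.
    public IList<string> Verify()
    {
        var errors = new List<string>();
        for (int i = 1; i < _stages.Count; i++)
        {
            Type produced = _stages[i - 1].Out;
            Type expected = _stages[i].In;
            if (!expected.IsAssignableFrom(produced))
            {
                errors.Add(string.Format("Stage {0} expects {1} but stage {2} produces {3}",
                    i, expected.Name, i - 1, produced.Name));
            }
        }
        return errors;
    }

    // Refuses to run a pipeline whose stages do not line up.
    public object Execute(object input)
    {
        IList<string> errors = Verify();
        if (errors.Count > 0)
            throw new InvalidOperationException(string.Join("; ", errors));
        return _stages.Aggregate(input, (current, stage) => stage.Run(current));
    }
}

// Hypothetical sample filters: an early line-oriented stage and a later aggregate stage.
public sealed class SplitLines : IFilter<string, string[]>
{
    public string[] Execute(string data) { return data.Split('\n'); }
}

public sealed class CountLines : IFilter<IEnumerable<string>, int>
{
    public int Execute(IEnumerable<string> data) { return data.Count(); }
}
```

Registering SplitLines followed by CountLines passes the check, because string[] is assignable to IEnumerable<string>; registering them in the opposite order produces one error from Verify. The trade-off in this sketch is that type mismatches only surface at run time, when the pipeline is verified, rather than at compile time.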

My question: Is it architecturally sound (i.e. a good idea) to have a pipeline with a changing data type?

P.S. The reason I'm implementing my application as a pipeline is that I feel it will be easy for plugin authors to replace/extend existing filters. Just swap out the filter you want to change with a different implementation, and you're set.

JerKimball · Accepted Answer · 2012-12-08T20:09:01.527

EDIT: Note, I have removed my other answer to replace it with this wall'o'text *grin*

NINJAEDIT: Fun fact: PowerShell (mentioned in @Loudenvier's answer) was once going to be named 'Monad' - also, found Wes Dyer's blog post on the topic: The Marvels of Monads

One veryveryvery simplistic way of looking at this whole "Monad" thing is to think of it as a box with a very basic interface:

Return
Bind
Zero (optional)

The uses are similarly simple in concept - let's say you have a "thing":

You can wrap your "thing" in the box (this would be the "return") and have a "BoxOfThing"
You can give instructions on how to take the thing out of this box and put it into another box (Bind)
You can get an empty box (the "Zero": think of it as a sort of "no-op", like multiplying by one or adding zero)
(there are other rules, but these three are the most interesting)

The Bind bit is the really interesting part, and also the part that makes most people's heads explode; basically, you're giving a specification of sorts for how to chain boxes together. Let's take a fairly simple Monad, the "Option" or "Maybe" - a bit like Nullable<T>, but way cooler.

So everybody hates checking for null everywhere, but we're forced to due to the way reference types work; what we'd love is to be able to code something like this:

var zipcodesNearby = order.Customer.Address.City.ZipCodes;

And either get back a valid answer if (customer is valid + address is valid + ...), or "Nothing" if any bit of that logic fails...but no, we need to:

List<string> zipcodesNearBy = new List<string>();
if(goodOrder.Customer != null)
{
    if(goodOrder.Customer.Address != null)
    {
        if(goodOrder.Customer.Address.City != null)
        {
            if(goodOrder.Customer.Address.City.ZipCodes != null)
            {
                zipcodesNearBy = goodOrder.Customer.Address.City.ZipCodes;
            }
            else { /* do something else? throw? */ }
        }
        else { /* do something else? throw? */ }
    }
    else { /* do something else? throw? */ }
}
else { /* do something else? throw? */ }

(note: you can also rely on null coalescing, when applicable - although it's pretty nasty looking)

List<string> nullCoalescingZips = 
    ((((goodOrder ?? new Order())
        .Customer ?? new Person())
            .Address ?? new Address())
                .City ?? new City())
                    .ZipCodes ?? new List<string>();

The Maybe monad "rules" might look a bit like:

(note: C# is NOT ideal for this type of Type-mangling, so it gets a bit wonky)

public static Maybe<T> Return(T value)
{
    return ReferenceEquals(value, null) ? Maybe<T>.Nothing : new Maybe<T>() { Value = value };
}
public static Maybe<U> Bind<U>(Maybe<T> me, Func<T, Maybe<U>> map)
{
    return me != Maybe<T>.Nothing ?
        // extract, map, and rebox
        map(me.Value) :
        // We have nothing, so we pass along nothing...
        Maybe<U>.Nothing;
}

But this leads to some NASTY code:

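// Each lambda passed to Bind must itself return a Maybe<U>,
// so the innermost result is re-boxed with Return.
var result1 =
    Maybe<string>.Bind(Maybe<string>.Return("hello"), hello =>
        Maybe<string>.Bind(Maybe<string>.Return((string)null), doh =>
            Maybe<string>.Bind(Maybe<string>.Return("world"), world =>
                Maybe<string>.Return(hello + doh + world))));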

Luckily, there's a neat shortcut: SelectMany is very roughly equivalent to "Bind":

If we implement SelectMany for our Maybe<T>...

public class Maybe<T>
{
    public static readonly Maybe<T> Nothing = new Maybe<T>();
    private Maybe() {}
    public T Value { get; private set;}
    public Maybe(T value) { Value = value; }
}
public static class MaybeExt
{
    public static bool IsNothing<T>(this Maybe<T> me)
    {
        return me == Maybe<T>.Nothing;
    }
    public static Maybe<T> May<T>(this T value)
    {
        return ReferenceEquals(value, null) ? Maybe<T>.Nothing : new Maybe<T>(value);
    }
    // Note: this is basically just "Bind"
    public static Maybe<U> SelectMany<T,U>(this Maybe<T> me, Func<T, Maybe<U>> map)
    {
        return me != Maybe<T>.Nothing ?
            // extract, map, and rebox
            map(me.Value) :
            // We have nothing, so we pass along nothing...
            Maybe<U>.Nothing;
    }
    // This overload is the one that "turns on" query comprehension syntax...
    public static Maybe<V> SelectMany<T,U,V>(this Maybe<T> me, Func<T, Maybe<U>> map, Func<T,U,V> selector)
    {
        return me.SelectMany(x => map(x).SelectMany(y => selector(x,y).May()));
    }
}

Now we can piggyback on LINQ comprehension syntax!

var result1 = 
    from hello in "Hello".May()
    from oops in ((string)null).May()
    from world in "world".May()
    select hello + oops + world;
// prints "Was Nothing!"
Console.WriteLine(result1.IsNothing() ? "Was Nothing!" : result1.Value);

var result2 = 
    from hello in "Hello".May()
    from space in " ".May()
    from world in "world".May()
    select hello + space + world;
// prints "Hello world"
Console.WriteLine(result2.IsNothing() ? "Was Nothing!" : result2.Value);

var goodOrder = new Order { Customer = new Person { Address = new Address { City = new City { ZipCodes = new List<string>{"90210"}}}}};
var badOrder = new Order { Customer = new Person { Address = null }};

var zipcodesNearby = 
    from ord in goodOrder.May()
    from cust in ord.Customer.May()     
    from add in cust.Address.May()
    from city in add.City.May()
    from zip in city.ZipCodes.May()
    select zip;
// prints "90210"
Console.WriteLine(zipcodesNearby.IsNothing() ? "Nothing!" : zipcodesNearby.Value.FirstOrDefault());

var badZipcodesNearby = 
    from ord in badOrder.May()
    from cust in ord.Customer.May()     
    from add in cust.Address.May()
    from city in add.City.May()
    from zip in city.ZipCodes.May()
    select zip;
// prints "Nothing!"
Console.WriteLine(badZipcodesNearby.IsNothing() ? "Nothing!" : badZipcodesNearby.Value.FirstOrDefault());

Hah, just realized I forgot to mention the whole point of this...so basically, once you've figured out what the equivalent for "bind" is at each stage of your pipeline, you can use the same type of pseudomonadic code to handle the wrapping, unwrapping, and processing of each of your type transformations.
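
To make that concrete for the filter pipeline, here is a minimal sketch of the simplest case, where the "box" is the filter itself and the bind-like step is plain composition. Note this is ordinary function composition, not a full monad; LambdaFilter and Then are hypothetical names introduced only for this example, and only IFilter<TIn, TOut> comes from the question.

```csharp
using System;

// Same shape as the IFilter<TIn, TOut> interface from the question.
public interface IFilter<in TIn, out TOut>
{
    TOut Execute(TIn data);
}

// Hypothetical adapter: wraps a delegate so it can be used as a filter.
public sealed class LambdaFilter<TIn, TOut> : IFilter<TIn, TOut>
{
    private readonly Func<TIn, TOut> _run;

    public LambdaFilter(Func<TIn, TOut> run) { _run = run; }

    public TOut Execute(TIn data) { return _run(data); }
}

public static class FilterExt
{
    // Chains two filters into one. TMid ties the output of `first` to the input
    // of `next`, so a mismatched pair is rejected by the compiler.
    public static IFilter<TIn, TOut> Then<TIn, TMid, TOut>(
        this IFilter<TIn, TMid> first, IFilter<TMid, TOut> next)
    {
        return new LambdaFilter<TIn, TOut>(x => next.Execute(first.Execute(x)));
    }
}
```

With this in place, something like `split.Then(count)` (a string-to-string[] filter followed by a string[]-to-int filter) yields a single IFilter<string, int>, and swapping the order fails to compile. A plugin author replacing one stage only has to keep that stage's input and output types the same.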

Unfortunately, that assumption is not accurate. Early in the pipeline, filters will handle lines (either strings, or perhaps structs). — Chris Laplante, Dec 07 '12 at 22:17
Ah; in that case, I'd go with your original plan - now, if you're trying to come up with a robust plan of *composing* Filters in a meaningful way, that's a different matter altogether; chaining free-form type transformations together in a type-safe way is not necc. straightforward... — JerKimball, Dec 07 '12 at 22:22
Could you expand a little on 'meaningful' composition? That might be what I'm looking for. I'm ok with going with my original plan, but I agree, it's kind of free-form madness :). However, I'm not entirely sure how I can make it more "safe" without making it overly complex and restrictive. — Chris Laplante, Dec 07 '12 at 22:24
Absolutely - I'll probably be skewered by any haskell guys out there, but that's *kind of* what Monads/Applicative Functors are for: A way to chain many types of discrete steps together into a cohesive pipeline of state/type/etc transformations. I'll cobble together another answer (or edit my first) — JerKimball, Dec 07 '12 at 22:28

score 2 · Answer 2 · answered Dec 07 '12 at 20:35

This won't answer your question, but a great place to look for inspiration on pipelines in the .NET world is PowerShell. They've implemented the pipeline model in a very clever way, and the objects flowing through the pipeline change type all the time.

I once had to build a database-to-PDF document creation pipeline and implemented it as PowerShell cmdlets. It was so extensible that years later it is still being actively used and developed; it has only been migrated from PowerShell 1 to 2, and now possibly to 3.

You can get great ideas here: http://blogs.technet.com/b/heyscriptingguy/

Loudenvier

Thanks for this. That blog looks interesting – Chris Laplante Dec 07 '12 at 21:58
+1 for PowerShell - see my giant block of text answer for a fun bit of trivia. :) – JerKimball Dec 08 '12 at 00:50