IEnumerable: who disposes of what and when -- Did I get it right?

Question

Here is a hypotetical scenario.

I have very large number of user names (say 10,000,000,000,000,000,000,000. Yes, we are in the intergalactic age :)). Each user has its own database. I need to iterate through the list of users and execute some SQL against each of the databases and print the results.

Because I learned the goodness of functional programming and because I deal with such a vast number of users, I decide to implement this using F# and pure sequences, (aka IEnumerable). And here I go.

// gets the list of user names
let users() : seq<string> = ...

// maps user name to the SqlConnection
let mapUsersToConnections (users: seq<string>) : seq<SqlConnection> = ...

// executes some sql against the given connection and returns some result
let mapConnectionToResult (conn) : seq<string> = ...

// print the result
let print (result) : unit = ...

// and here is the main program
users()
|> mapUsersToConnections
|> Seq.map mapConnectionToResult
|> Seq.iter print

Beautiful? Elegant? Absolutely.

But! Who and at what point disposes of the SqlConnections?

And I don't thing the answer mapConnectionToResult should do it is right, because it knows nothing about the lifetime of the connection it is given. And things may work or not work depending on how mapUsersToConnections is implemented and various other factors.

As mapUsersToConnections is the only other place which has access to connection it must be its responsibility to dispose of SQL connection.

In F#, this can be done like this:

// implementation where we return the same connection for each user
let mapUsersToConnections (users) : seq<SqlConnection> = seq {
    use conn = new SqlConnection()
    for u in users do
        yield conn
}


// implementation where we return new connection for each user
let mapUsersToConnections (users) : seq<SqlConnection> = seq {
    for u in users do
        use conn = new SqlConnection()
        yield conn
}

The C# equivalient would be:

// C# -- same connection for all users
IEnumerable<SqlConnection> mapUsersToConnections(IEnumerable<string> users)
{
    using (var conn = new SqlConnection())
    foreach (var u in users)
    {
        yield return conn;
    }
}

// C# -- new connection for each users
IEnumerable<SqlConnection> mapUsersToConnections(IEnumerable<string> user)
{
    foreach (var u in users)
    using (var conn = new SqlConnection())
    {
        yield return conn;
    }
}

The tests I performed suggest that the objects do get disposed correctly at correct points, even if executing stuff in parallel: once at the end of the whole iteration for the shared connection; and after each iteration cycle for the non-shared connection.

So, THE QUESTION: Did I get this right?

EDIT:

Some answers kindly pointed out some errors in the code, and I made some corrections. The complete working example which compiles is below.
The use of SqlConnection is for example purposes only, it's any IDisposable really.

Example which compiles

open System

// Stand-in for SqlConnection
type SimpeDisposable() =
    member this.getResults() = "Hello"
    interface IDisposable with
        member this.Dispose() = printfn "Disposing"

// Alias SqlConnection to our dummy
type SqlConnection = SimpeDisposable

// gets the list of user names
let users() : seq<string> = seq {
    for i = 0 to 100 do yield i.ToString()
}

// maps user names to the SqlConnections
// this one uses one shared connection for each user
let mapUsersToConnections (users: seq<string>) : seq<SqlConnection> = seq {
    use c = new SimpeDisposable()
    for u in users do
        yield c
}

// maps user names to the SqlConnections
// this one uses new connection per each user
let mapUsersToConnections2 (users: seq<string>) : seq<SqlConnection> = seq {
    for u in users do
        use c = new SimpeDisposable()
        yield c
}

// executes some "sql" against the given connection and returns some result
let mapConnectionToResult (conn:SqlConnection) : string = conn.getResults()

// print the result
let print (result) : unit = printfn "%A" result

// and here is the main program - using shared connection
printfn "Using shared connection"
users()
|> mapUsersToConnections
|> Seq.map mapConnectionToResult
|> Seq.iter print


// and here is the main program - using individual connections
printfn "Using individual connection"
users()
|> mapUsersToConnections2
|> Seq.map mapConnectionToResult
|> Seq.iter print

The results are:

Shared connection: "hello" "hello" ... "Disposing"

Individual connections: "hello" "Disposing" "hello" "Disposing"

10 sextillion users. I think you've got some multi-accounters in your system. — jball, Jul 21 '11 at 16:07
Just a note: do forget the love of functional programming in this context: pure functional programming by definition is coding *without side effects*. IDisposable.Dispose() (and virtually *anything* that returns void) is only executed for its side effects and thus by definition is the *opposite* of pure-functional coding. — Dax Fohl, Jul 23 '11 at 00:43

Dax Fohl · Accepted Answer · 2011-07-21T22:04:45.333

I'd avoid this approach since the structure would fail if an unwitting user of your library did something like

users()
|> Seq.map userToCxn
|> Seq.toList() //oops disposes connections
|> List.map .... // uses disposed cxns 
. . ..

I'm not an expert on the issue, but I'd presume it's a best practice not to have sequences / IEnumerables muck with things after they yield them, for the reason that an intermediary ToList() call will produce different results than just acting on the sequence directly -- DoSomething(GetMyStuff()) will be different from DoSomething(GetMyStuff().ToList()).

Actually, why not just use sequence expressions for the whole thing, as that would get around this issue entirely:

seq{ for user in users do
     use cxn = userToCxn user
     yield cxnToResult cxn }

(Where userToCxn and cxnToResult are both simple one-to-one non-disposing functions). This seems more readable than anything and should yield the desired results, is parallelizable, and works for any disposable. This can be translated into C# LINQ using the following technique: http://solutionizing.net/2009/07/23/using-idisposables-with-linq/

from user in users
from cxn in UserToCxn(user).Use()
select CxnToResult(cxn)

Another take on this though would be to just define your "getSomethingForAUserAndDisposeTheResource" function first, and then use that as your fundamental building block:

let getUserResult selector user = 
    use cxn = userToCxn user
    selector cxn

Once you have this, then you can easily build up from there:

 //first create a selector
let addrSelector cxn = cxn.Address()
//then use it like this:
let user1Address1 = getUserResult addrSelector user1
//or more idiomatically:
let user1Address2 = user1 |> getUserResult addrSelector
//or just query dynamically!
let user1Address3 = user1 |> getUserResult (fun cxn -> cxn.Address())

//it can be used with Seq.map easily too.
let addresses1 = users |> Seq.map (getUserResult (fun cxn -> cxn.Address()))
let addresses2 = users |> Seq.map (getUserResult addrSelector)

//if you are tired of Seq.map everywhere, it's easy to create your own map function
let userCxnMap selector = Seq.map <| getUserResult selector
//use it like this:
let addresses3 = users |> userCxnMap (fun cxn -> cxn.Address())
let addresses4 = users |> userCxnMap addrSelector

That way you aren't committed to retrieving the entire sequence if all you want is one user. I guess the lesson learned here is make your core functions simple and that makes it easier to build abstractions on top of it. And note none of these options will fail if you do a ToList somewhere in the middle.

+1 For wrapping the entire thing in a sequence expression. That keeps it lazy *and* keeps resource lifetimes deterministic. — Daniel, Jul 21 '11 at 17:03
A very good point about Seq.toList(), thanks for pointing this out. It actually answers the original question "Did I get this right". Looks like I didn't. — Philip P., Jul 26 '11 at 13:47

score 4 · Answer 2 · answered Jul 21 '11 at 15:57

// C# -- new connection for each users
IEnumerable<SqlConnection> mapUserToConnection(string user)
{
    while (true)
    using (var conn = new SqlConnection())
    {
        yield return conn;
    }
}

This doesn't look right to me - you would dispose of a connection as soon as a new connection is requested by the next user (the next iteration cycle) - that means these connections can only be used exclusively one after the other - as soon as user B starts working with his connection, the connection of user A gets disposed. Is this really what you want?

score 4 · Answer 3 · answered Jul 21 '11 at 16:17

Your F# sample doesn't type-check (even if you add some dummy implementation to your functions, for example using failwith). I'll assume your userToConnection and connectionToResult functions actually take one user to one connection to one result. (Instead of working with sequences as in your example):

// gets the list of user names
let users() : seq<string> = failwith "!"

// maps user name to the SqlConnection
let userToConnection (user:string) : SqlConnection = failwith "!"

// executes some sql against the given connection and returns some result
let connectionToResult (conn:SqlConnection) : string = failwith "!"

// print the result
let print (result:string) : unit = ()

Now, if you want to keep handling of connection private to userToConnection, you can change it so that it doesn't return a connection SqlConnection. Instead, it can return a higher-order function that provides connection to some function (that will be specified in the next step) and after calling the function, disposes of the connection. Something like:

let userToConnection (user:string) (action:SqlConnection -> 'R) : 'R = 
  use conn = new SqlConnection("...")
  action conn

You can use currying so when you write userToConnection user, you'll get a function that expects a function and returns the result: (SqlConnection -> 'R) -> 'R. Then you can compose your overall function like this:

// and here is the main program
users()
|> Seq.map userToConnection
|> Seq.map (fun f -> 
     // We got a function that we can provide with our specific behavior
     // it runs it (giving it the connection) and then closes connection
     f connectionToResult)
|> Seq.iter print

I'm not quite sure if you want to map single user to single connection etc., but you could use exactly the same principle (with returning functions) even if you're working with collections of collections.

This looks like a very nice technique, the only problem is that you can't call that higher-order function more than once as the second call will be using the disposed object. And the problem is becoming worse if that function gets wrapped and propagated into something else etc until we get strange errors in a completely unrelated parts of the program. So not really an ideal solution. Unless I'm missing something? — Philip P., Jul 26 '11 at 14:06

Daniel · Answer 4 · 2011-07-21T16:30:11.420

I think there's huge room for improvement in this. It doesn't look like your code should compile since mapUserToConnection returns a sequence and mapConnectionToResult accepts a connection (changing your maps to collects would fix this).

I'm not clear on whether a user should map to multiple connections or if there's one connection per user. In the latter case, it seems overkill to return a singleton sequence for each user.

Generally, it's a bad idea to return an IDisposable from a sequence since you can't control when the item is disposed. A better approach is to limit the scope of the IDisposable to one function. This "controlling" function could accept a callback that uses the resource and, after the callback is fired, it could dispose of the resource (the using function is an example of this). In your case, combining mapUserToConnection and mapConnectionToResult may avoid the problem completely, since the function could control the connection's lifetime.

You would end up with something like this:

users
|> Seq.map mapUserToResult
|> Seq.iter print

where mapUserToResult is string -> string (accepting a user and returning a result, thereby controlling the lifetime of each connection).

Yes you are right, the functions should be returning SqlConnection and string, not seq and seq. Doh, what was I thinking. I guess the further samples don't quite work anymore... Will correct the post. Thanks! — Philip P., Jul 21 '11 at 16:43

score 1 · Answer 5 · answered Jul 21 '11 at 16:12

None of this looks quite right to me - for example why are you returning a sequence of connections for a single username? Did your signature not want to look like this (written as an extension method for Linq-ness):

IEnumerable<SqlConnection> mapUserToConnection(this IEnumerable<string> Usernames)

Anayway, moving on - in the first example:

using (var conn = new SqlConnection())
{
    while (true)
    {
        yield return conn;
    }
}

This will work, but only if the entire collection is enumerated. If (for example) only the first item is iterated through then the connection will not be disposed of (at least in C#) see Yield and usings - your Dispose may not be called!.

The second example seems to work fine for me, but I've had problems with code that did a similar thing and resulted in connections being disposed of when they shouldn't have.

Generally spealing I've found that combining dispose and yield return is a tricky business and I tend to avoid it in favour of implementing my own enumerator that explicitly implements both IDisposable and IEnumerable. That way you can be sure of when objects will be disposed of.

For the suggestion of implementing an `IEnumerable` for disposable resources. That's asking for trouble...and I can't think of a case where there isn't a better solution. I think the "correct" answer is to tell the OP this is a serious code smell. — Daniel, Jul 21 '11 at 17:00
@kragen: I've made an edit to correct the code. Please have a look. — Philip P., Jul 21 '11 at 17:26
@Daniel: But how about functional programming and use of map and friends? And sometimes there is no choice either. — Philip P., Jul 21 '11 at 17:28
@Komrade: I don't think you should ever return `IDisposable`s from a sequence. There are several viable alternatives mentioned in the answers to this question. — Daniel, Jul 21 '11 at 18:52

score 1 · Answer 6 · answered Jul 21 '11 at 17:09

Dispose should be called by whoever can guarantee the object is no longer being used. If you can make that guarantee (say the only time the object is used is within your method) then it is your job to dispose of it. If you can't guarantee that the object is done (say you are exposing an iterator with the objects) then it is your job to not worry about it, and let them deal with it.

On potential design decision you can follow is what the CLR does for Stream instances. Many constructors that except a Stream also accept a bool. If that bool is true, then the object knows it is in charge of disposing of the Stream once it is done with it. If you are returning an iterator you can return a tuple instead, of type Disposable,bool.

However I would look deeper at the actual problem you are facing. Maybe instead of worrying about things like this you need to change your architecture to avoid these issues. For instance instead of having a database per user have one database. Or maybe you need to use connection pooling to reduce the burden of live but inactive connections (I am not 100% about this last one do research about such options).

Good points. Though I disagree that using `bool` values to control who disposes of what like in `Stream` example is the demonstration of the best way to approach this. Yes this works, but to me this highlights the problem rather than solves it. Plus of course here we are talking about objects wrapping other objects, not pure independent functions. — Philip P., Jul 26 '11 at 13:56
@Komrade P: I don't think it is the best demonstration, but I also think that if disposal is necessary then you need to figure out from an architectural viewpoint who is responsible for disposing of the object, rather than attempting to leave that fact undefined. Another option would be to use a proxy that ignores the `Dispose` call in instances where the callee isn't responsible. — Guvante, Jul 26 '11 at 16:13

score 0 · Answer 7 · answered Jul 21 '11 at 18:12

Trying to solve this problem with only functional constructs is IMO a good example of a F# pitfall. Purely functional languages typically use immutable data structures. F# being based on .NET often doesn't which can occasionally be very good for things like performance.

My solution to this problem is to isolate the imperative bit of creating and destroying the SqlConnection object in its own function. In this case, we'll use useUserConnection for that:

let users() : seq<string> = // ...

/// Takes a function that uses a user's connection to the database
let useUserConnection connectionUser user =
    use conn = // ...
    connectionUser conn

let mapConnectionToResult conn = 
    // ... *conn is not disposed of here* 

// Function currying is used here
let mapUserToResult = useUserConnection mapConnectionToResult

let print result = // ...

// Main program 
users() 
    |> Seq.map mapUserToResult 
    |> Seq.iter print

score 0 · Answer 8 · answered Jul 22 '11 at 06:17

I believe there is a design issue here. If you look at the problem statement it is about getting some info about a user. User is represented as a string and the info is also represented as a string. So what we need is a function like:

let getUserInfo (u:string) : string = <some code here>

Usage of this is as simple as:

users() |> Seq.map getUserInfo

Now how this function gets the user info depends on this function, whether it uses SqlConnection, File stream or any other object which may be disposable or not, this function have the responsibility to create the connection and do proper resource handling. In your code you have completely separated the connection creation and fetching info parts which is causing this confusion about who dispose connection.

Now in case you want to use a single connection to be used by all getUserInfo method then you can make this method as

let getUserInfoFromConn (c:SqlConnection) (u:string) : string = <some code here>

Now this function takes a connection (or can take any other disposable object). In this case this function wont dispose off the connection object rather the caller of this function will dispose it off. We can use this like:

use conn = new SqlConnection()
users() |> Seq.map (conn |> getUserInfoFromConn)

All this makes it clear about who handles the resources.

Yes you are right, in this particular example your suggestion is probably the right thing to do. My question was about a more general use of `IDisposable` and in particular in the context of iterators and independent functions which produce/consume the iterators. — Philip P., Jul 26 '11 at 13:43

IEnumerable: who disposes of what and when -- Did I get it right?

8 Answers8