
In the following code, Seq.generateUnique is constrained to the type ((Assembly -> seq<Assembly>) -> seq<Assembly> -> seq<Assembly>).

open System
open System.Collections.Generic
open System.Reflection

module Seq =
  let generateUnique =
    let known = HashSet()
    fun f initial ->
      let rec loop items = 
        seq {
          let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
          if not (cachedSeq |> Seq.isEmpty) then
            yield! cachedSeq
            yield! loop (cachedSeq |> Seq.collect f)
        }
      loop initial

let discoverAssemblies() =
  AppDomain.CurrentDomain.GetAssemblies() :> seq<_>
  |> Seq.generateUnique (fun asm -> asm.GetReferencedAssemblies() |> Seq.map Assembly.Load)

let test() = printfn "%A" (discoverAssemblies() |> Seq.truncate 2 |> Seq.map (fun asm -> asm.GetName().Name) |> Seq.toList)
for _ in 1 .. 5 do test()
System.Console.Read() |> ignore

I'd like it to be generic, but putting it into a file apart from its usage yields a value restriction error:

Value restriction. The value 'generateUnique' has been inferred to have generic type
    val generateUnique : (('_a -> '_b) -> '_c -> seq<'_a>) when '_b :> seq<'_a> and '_c :> seq<'_a>
Either make the arguments to 'generateUnique' explicit or, if you do not intend for it to be generic, add a type annotation.

Adding an explicit type parameter (let generateUnique<'T> = ...) eliminates the error, but now it returns different results.

Output without type parameter (desired/correct behavior):

["mscorlib"; "TEST"]
["FSharp.Core"; "System"]
["System.Core"; "System.Security"]
[]
[]

And with:

["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]

Why does the behavior change? How could I make the function generic and achieve the desired behavior?

Daniel
  • @Huusom: There's a bit more going on here. It's like `distinct` + recursive `collect` + memoization, with subtle interdependencies between them. – Daniel Jul 08 '11 at 14:18

2 Answers


I don't think that your definition is quite correct: it seems to me that f needs to be a syntactic argument to generateUnique (that is, I don't believe that it makes sense to use the same HashSet for different fs). Therefore, a simple fix is:

let generateUnique f =
    let known = HashSet()
    fun initial ->
        let rec loop items =
            seq {
                let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
                if not (cachedSeq |> Seq.isEmpty) then
                    yield! cachedSeq
                    yield! loop (cachedSeq |> Seq.collect f)
            }
        loop initial
kvb
  • That produces the latter, incorrect output with and without the type parameter. I expect `f` to be non-deterministic, therefore I'm passing it to the inner function (not sure if that's a good reason). – Daniel Jul 07 '11 at 19:58

generateUnique is a lot like the standard memoize pattern: it should be used to calculate memoized functions from normal functions, not do the actual caching itself.

@kvb was right about the change in the definition required for this shift, but then you need to change the definition of discoverAssemblies as follows:

let discoverAssemblies =
  //"memoize"
  let generator = Seq.generateUnique (fun (asm:Assembly) -> asm.GetReferencedAssemblies() |> Seq.map Assembly.Load)

  fun () ->
      AppDomain.CurrentDomain.GetAssemblies() :> seq<_>
      |> generator
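
To see why it matters where the generator is created, here is a minimal sketch that swaps assemblies for plain integers. The `successors` function, the `seq { 0 .. 5 }` input, and the `discoverFresh`/`discoverShared` names are made up for illustration; `generateUnique` is kvb's definition with type annotations added.

```fsharp
open System.Collections.Generic

module Seq =
    // kvb's definition: each application to `f` creates its own HashSet.
    let generateUnique (f: 'T -> seq<'T>) =
        let known = HashSet<'T>()
        fun (initial: seq<'T>) ->
            let rec loop items =
                seq {
                    let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
                    if not (cachedSeq |> Seq.isEmpty) then
                        yield! cachedSeq
                        yield! loop (cachedSeq |> Seq.collect f)
                }
            loop initial

// Hypothetical stand-in for "referenced assemblies": n refers to n + 10.
let successors n =
    if n < 10 then Seq.singleton (n + 10) else Seq.empty

// The generator is rebuilt on every call, so the HashSet starts out empty each time.
let discoverFresh () =
    seq { 0 .. 5 } |> Seq.generateUnique successors |> Seq.truncate 2 |> Seq.toList

// The generator is built once; its HashSet persists across calls.
let discoverShared =
    let generator = Seq.generateUnique successors
    fun () -> seq { 0 .. 5 } |> generator |> Seq.truncate 2 |> Seq.toList

for _ in 1 .. 3 do printfn "%A" (discoverFresh ())   // [0; 1] every time
for _ in 1 .. 4 do printfn "%A" (discoverShared ())  // [0; 1], [2; 3], [4; 5], []
```

The first loop mirrors the repeated ["mscorlib"; "TEST"] output from the question, and the second mirrors the desired output, where each call continues with items that earlier calls have not yet seen.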
Stephen Swensen
  • This works, and actually fixes the version with the explicit type parameter, negating the need for kvb's change. – Daniel Jul 07 '11 at 20:16
  • Cool, but I think you should still use @kvb's version of the function, since it "memoizes" `f` (a fresh `HashSet` for each `f`), whereas I think the version with the explicit type parameter only gives one `HashSet` per type! – Stephen Swensen Jul 07 '11 at 20:40
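
On the explicit type parameter: as far as I understand it, a value declared as `let name<'T> = ...` is compiled as a type function whose body runs again at every use, rather than once per type. That would explain the repeated output in the question, since each mention of `Seq.generateUnique` then builds a new `HashSet`. The sketch below only illustrates that behaviour; `evaluations` and `freshSet` are made-up names, not part of the original code.

```fsharp
open System.Collections.Generic

// Hypothetical counter, used only to observe how often the body runs.
let mutable evaluations = 0

// With an explicit type parameter this is a type function:
// its body is evaluated again each time `freshSet` is referenced.
let freshSet<'T> =
    evaluations <- evaluations + 1
    HashSet<'T>()

let a = freshSet<int>
let b = freshSet<int>
a.Add(1) |> ignore

// `a` and `b` are different sets, even though both are HashSet<int>.
printfn "evaluations = %d, b contains 1 = %b" evaluations (b.Contains 1)
```

Binding the result once, as `generator` does in the answer above, pins down a single `HashSet` regardless of how `generateUnique` itself is declared.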