37

StringBuilder is a mutable object, whereas F# encourages immutability as much as possible, so one should prefer transformation over mutation. Does this apply to StringBuilder when it comes to building a string in F#? Is there an immutable F# alternative to it? If so, is this alternative as efficient?

A snippet

Trident D'Gao
  • 1
    You could use a DList http://book.realworldhaskell.org/read/data-structures.html#data.dlist http://jackfoxy.com/f-data-structures/fsharpx-datastructures/#id35 – Mauricio Scheffer Sep 03 '13 at 17:16
  • I posted [an immutable string builder](http://stackoverflow.com/a/8346765/162396) in response to an earlier question. Tomas' test runs in 18ms using it (our machines must be similar because I get the same timings for the other versions). – Daniel Sep 03 '13 at 22:08
  • @MauricioScheffer I would be pretty interested to know what would be the comparison of DList and simple list with reversing. I suspect the function calls in DList may have some cost too... – Tomas Petricek Sep 05 '13 at 04:59
  • @TomasPetricek FSharpx's DList is slower than reversing a list. A simple function-based DList is about the same, but overflows the stack with a large number of elements. But yeah, anyway the real benefit of the DList is the efficient append, which may not be very relevant here. https://gist.github.com/mausch/6459715 – Mauricio Scheffer Sep 06 '13 at 05:00
  • @MauricioScheffer Interesting! Yeah, append is certainly the important thing about DList.. – Tomas Petricek Sep 07 '13 at 04:46

2 Answers

54

I think that using StringBuilder in F# is perfectly fine - the fact that sb.Append returns the current instance of StringBuilder means that it can be easily used with the fold function. Even though this is still imperative (the object is mutated), it fits reasonably well with the functional style when you do not expose references to StringBuilder.

But equally, you can just construct a list of strings and concatenate them using String.concat. This is almost as efficient as using StringBuilder: it is slower, but not by much, and it is significantly faster than concatenating strings using +.

So, lists give you similar performance, but they are immutable (and work well with concurrency etc.). They would be a good fit if you were building a string algorithmically, because you can just prepend strings to the front of the list, which is a very efficient operation on lists, and then reverse the list before concatenating. Also, using list expressions gives you a very convenient syntax:

// Concatenating strings using + (2.3 seconds)
let s1 = [ for i in 0 .. 25000 -> "Hello " ] |> Seq.reduce (+)
s1.Length

// Creating immutable list and using String.concat (5 ms)
let s2 = [ for i in 0 .. 25000 -> "Hello " ] |> String.concat ""
s2.Length

// Creating a lazy sequence and concatenating using StringBuilder & fold (5 ms)
let s3 = 
  seq { for i in 0 .. 25000 -> "Hello " }
  |> Seq.fold(fun (sb:System.Text.StringBuilder) s -> 
      sb.Append(s)) (new System.Text.StringBuilder())
  |> fun x -> x.ToString()
s3.Length

// Imperative solution using StringBuilder and for loop (1 ms)
let s4 = 
  ( let sb = new System.Text.StringBuilder()
    for i in 0 .. 25000 do sb.Append("Hello ") |> ignore
    sb.ToString() )
s4.Length

The times were measured on my fairly fast work machine using #time in F# Interactive. It is quite likely that the code would be faster in a release build, but I think the timings are fairly representative.
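The prepend-then-reverse idea mentioned above can be sketched roughly as follows. This is only an illustration: `renderNumbers` is a hypothetical helper that does not appear in the answer, and it assumes the pieces are produced one at a time by some algorithm.

```fsharp
// Hypothetical helper (not from the answer): renders 1..n as "1, 2, ..., n".
let renderNumbers (n: int) : string =
    // Tail-recursive loop that prepends each piece to an immutable accumulator list.
    let rec loop i acc =
        if i > n then acc
        else loop (i + 1) (string i :: acc)   // prepending to a list is O(1)
    loop 1 []
    |> List.rev             // restore the order in which the pieces were produced
    |> String.concat ", "   // build the final string in a single pass

printfn "%s" (renderNumbers 5)   // prints "1, 2, 3, 4, 5"
```

No mutable state is visible anywhere here, so the intermediate list can be shared freely; the cost is the extra `List.rev` pass compared with appending to a StringBuilder directly.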

Tomas Petricek
  • Should s2 have `List.rev` in it before the `String.concat`? As you noted above, the list will probably be constructed such that the items are in the reverse order of how you would want them concatenated. – N_A Sep 06 '13 at 17:05
9

If you need high-performance string concatenation, then the string builder is probably the right way to go. However, there are ways to make the string builder more functional. Generally speaking, if you need mutability in a functional program, the appropriate way to do this is to create a functional wrapper for it. In F# this is typically expressed as a computation expression. There is an example of a string builder computation expression here.

Example Usage:

//Create a function which builds a string from an array of bytes
let bytes2hex (bytes : byte []) =
    string {
        for byte in bytes -> sprintf "%02x" byte
    } |> build

//builds a string from four strings
string {
        yield "one"
        yield "two"
        yield "three"
        yield "four"
    } |> build

Edit: I made a new implementation of the above computation expression and then ran a release version of Tomas' four solutions plus my computation expression and the computation expression I previously linked.

s1 elapsed Time: 128150 ms  //concatenation
s2 elapsed Time: 459 ms     //immutable list + String.concat
s3 elapsed Time: 354 ms     //lazy sequence and concatenating using StringBuilder & fold 
s4 elapsed Time: 39 ms      //imperative
s5 elapsed Time: 235 ms     //my computation expression
s6 elapsed Time: 334 ms     //the linked computation expression

Notice that s3 takes 9 times as long as the imperative while s5 only takes 6 times as long.

Here is my implementation of the string builder computation expression:

open System.Text

type StringBuilderUnion =
| Builder of StringBuilder
| StringItem of string

let build = function | Builder(x) -> string x | StringItem(x) -> string x

type StringBuilderCE () =
    member __.Yield (txt : string) = StringItem(txt)
    member __.Yield (c : char) = StringItem(c.ToString())
    member __.Combine(f,g) = Builder(match f,g with
                                     | Builder(F),   Builder(G)   ->F.Append(G.ToString())
                                     | Builder(F),   StringItem(G)->F.Append(G)
                                     | StringItem(F),Builder(G)   ->G.Insert(0, F)
                                     | StringItem(F),StringItem(G)->StringBuilder(F).Append(G))
    member __.Delay f = f()
    member __.Zero () = StringItem("")
    member __.For (xs : 'a seq, f : 'a -> StringBuilderUnion) =
                    let sb = StringBuilder()
                    for item in xs do
                        match f item with
                        | StringItem(s)-> sb.Append(s)|>ignore
                        | Builder(b)-> sb.Append(b.ToString())|>ignore
                    Builder(sb)

let builder1 = new StringBuilderCE ()
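
As a rough sketch of how this builder might be used, the block below repeats a condensed copy of the definitions (the `char` overload of `Yield` is left out) so that it runs on its own. The names `onetwo` and `bytes2hex'` are introduced here only for illustration.

```fsharp
open System.Text

// Condensed copy of the definitions above, so the sketch is self-contained.
type StringBuilderUnion =
    | Builder of StringBuilder
    | StringItem of string

let build = function
    | Builder sb -> sb.ToString()
    | StringItem s -> s

type StringBuilderCE () =
    member __.Yield (txt : string) = StringItem txt
    member __.Combine (f, g) =
        Builder(match f, g with
                | Builder a,    Builder b    -> a.Append(b.ToString())
                | Builder a,    StringItem s -> a.Append(s)
                | StringItem s, Builder b    -> b.Insert(0, s)
                | StringItem a, StringItem b -> StringBuilder(a).Append(b))
    member __.Delay f = f ()
    member __.Zero () = StringItem ""
    member __.For (xs : 'a seq, f : 'a -> StringBuilderUnion) =
        let sb = StringBuilder()
        for item in xs do
            match f item with
            | StringItem s -> sb.Append(s) |> ignore
            | Builder b -> sb.Append(b.ToString()) |> ignore
        Builder sb

let builder1 = StringBuilderCE ()

// Mixing individual yields with a loop inside one expression.
let onetwo =
    builder1 {
        yield "one"
        yield "two"
        for i in 1 .. 3 do
            yield string i
    } |> build

// Hypothetical variant of bytes2hex written against builder1.
let bytes2hex' (bytes : byte []) =
    builder1 {
        for b in bytes do
            yield sprintf "%02x" b
    } |> build

printfn "%s" onetwo                           // prints "onetwo123"
printfn "%s" (bytes2hex' [| 0xdeuy; 0xaduy |]) // prints "dead"
```

The StringBuilder instances are created and mutated only inside `Combine` and `For`, so callers see nothing but the final string returned by `build`.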

Timer function (note that each test is run 100 times):

let duration f = 
    System.GC.Collect()
    let timer = new System.Diagnostics.Stopwatch()
    timer.Start()
    for _ in 1..100 do
        f() |> ignore
    printfn "elapsed Time: %i ms" timer.ElapsedMilliseconds
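
A measurement along these lines could be driven as in the sketch below. It is an assumption about how the tests were wired up, not the code that produced the numbers above: `durationMs` is a variant of the timer that returns the elapsed time instead of printing it, and `viaConcat` and `imperative` are illustrative names for the s2 and s4 approaches.

```fsharp
open System.Diagnostics
open System.Text

// Variant of the timer above: runs f the given number of times
// and returns the total elapsed milliseconds.
let durationMs (runs : int) (f : unit -> 'a) : int64 =
    System.GC.Collect()
    let timer = Stopwatch.StartNew()
    for _ in 1 .. runs do
        f () |> ignore
    timer.ElapsedMilliseconds

// s2-style: immutable list + String.concat
let viaConcat () =
    [ for _ in 0 .. 25000 -> "Hello " ] |> String.concat ""

// s4-style: imperative StringBuilder loop
let imperative () =
    let sb = StringBuilder()
    for _ in 0 .. 25000 do
        sb.Append("Hello ") |> ignore
    sb.ToString()

printfn "s2 elapsed Time: %i ms" (durationMs 100 viaConcat)
printfn "s4 elapsed Time: %i ms" (durationMs 100 imperative)
```

Absolute timings depend heavily on the machine and on whether the code runs in F# Interactive or as a release build, so only the ratios between the variants are worth comparing.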
N_A
  • 1
    @MauricioScheffer On looking through the actual implementation, I think this is a poor way to implement a string builder since it loses all of the performance benefits of the `StringBuilder` class. I'm looking at creating a version which has performance characteristics close to using a raw `StringBuilder`. – N_A Sep 03 '13 at 17:25
  • The only performance hit is using `sprintf`, which is entirely optional. I don't see any other performance issue (talking about Reader) – Mauricio Scheffer Sep 03 '13 at 17:45
  • In doing a performance test of the above-mentioned computation expression, it performed similarly to the `s3` example by Tomas, which is about a 5x performance decrease from the imperative solution. I imagine there is a way to improve that at least some. – N_A Sep 03 '13 at 18:12
  • @MauricioScheffer I added a faster version of the linked computation expression. The reader in fsharpx is great, but the whole point of using the `StringBuilder` is for better performance. I'm guessing the computation expression I posted has better performance than one which relies on lambdas because DU are cheaper than lambdas. – N_A Sep 04 '13 at 16:51
  • Oh, oops! Looks like I unintentionally reused a variable name there. Fixed. Thanks for pointing that out. – N_A Sep 04 '13 at 18:02
  • If performance is critical to the application, you should probably drop expression builders altogether. – eirik Sep 06 '13 at 15:26
  • @eirik It's always a tradeoff right? Simply using `+` to concatenate strings is 500x slower than any of the other options. Once in that range it's a question of whether or not you need that added 6x performance increase or if maintainability is more important. I find the computation expression syntax the easiest to read of these options and it has the added bonus of being the fastest of the immutable options. – N_A Sep 06 '13 at 16:25
  • @mydogisbox agreed, I still use it myself occasionally, but in some cases I simply had to refactor my code to use the tried and true methodology. I guess what I'm trying to say is that direct use of the StringBuilder API, if done correctly, can enjoy certain "functional qualities" such as composability without sacrificing any performance or readability. – eirik Sep 07 '13 at 13:49