4

Today I encountered an implementation challenge with a large DU (sum-type) in F#, where I basically wanted to be able to map a set of functions over the values of the DU without having to repeat the matched case on the right hand side. The motivation being, that it would be quite easy to accidently return a wrong case, which the type checker obviously wouldn't be able to catch. To illustrate the problem, here is one candidate solution 'pattern' I came up with for F# using reflection

open FSharp.Reflection
type MyData =
    | Foo of int
    | Bar of bool
    | Baz of string
    | Hello of int
    override this.ToString () =
        let (case, _ ) = FSharpValue.GetUnionFields(this, typeof<MyData>)
        case.Name
    static member map (f : int -> int) (g : bool -> bool) (h : string -> string) x : MyData =
        let f' = box << f
        let g' = box << g
        let h' = box << h
        let cases = FSharpType.GetUnionCases typeof<MyData>
        let matchCase = cases |> Array.find (fun case -> case.Name = x.ToString())
        let res =
            match x with
            | Foo v -> f' v
            | Bar v -> g' v
            | Baz v -> h' v
            | Hello v -> f' v
        FSharpValue.MakeUnion(matchCase, [| res |]) :?> MyData

let mapMyData : MyData -> MyData = MyData.map ((+) 1) not (fun s -> s.ToUpper())
let foo : MyData = Foo 42
let newFoo : MyData = mapMyData foo
let bar : MyData = Bar true
let newBar : MyData = mapMyData bar

And now I am curious about what other solutions exists for this problem, in F# as well as in other languages that has sum-types. I can easily see this being a general problem in for example Haskell, Purescript, Idris, Rust.

The problem can be distilled down to:

  1. Mapping over the values of a sum-type without repeating the case names on the right-hand side.
  2. Leverage the type checker as much as possible for safety.
Michelrandahl
  • 3,365
  • 2
  • 26
  • 41
  • 3
    I'm not fluent in F#, does this boild down to wanting to be able to write `Foo v -> f' v` and `Hello v -> f' v` instead of `Foo v -> Foo (f v)` and `Hello v -> Hello (f v)` because you might accidentally write `Hello v -> Foo (f v)` which the type checker wouldn't catch? – isaactfa Jun 21 '23 at 15:59
  • 3
    In general, a function that can walk a sum type like this is called a "catamorphism". You might find [this article](https://fsharpforfunandprofit.com/posts/recursive-types-and-folds/#catamorphisms) helpful. – Brian Berns Jun 21 '23 at 16:28
  • that is correct @isaactfa – Michelrandahl Jun 22 '23 at 10:36

4 Answers4

4

The motivation being, that it would be quite easy to accidently return a wrong case, which the type checker obviously wouldn't be able to catch.

If you make things polymorphic enough, the type checker actually can catch such errors. I'll speak in Haskell because it's my native tongue, but I'm sure there's an F# analog.

{-# LANGUAGE LambdaCase #-}

data MyDataParameterized a b c d
  = Foo a
  | Bar b
  | Baz c
  | Hello d

-- there is only *one* well-typed implementation of this function that doesn't loop forever
quadramapParameterized :: (a -> a') -> (b -> b') -> (c -> c') -> (d -> d') ->
    MyDataParameterized a b c d -> MyDataParameterized a' b' c' d'
quadramapParameterized fa fb fc fd = \case
    Foo a -> Foo (fa a)
    Bar b -> Bar (fb b)
    Baz c -> Baz (fc c)
    Hello d -> Hello (fd d)

type MyData = MyDataParameterized Int Bool String Int

-- there are many, many "wrong" implementations of this type, but
-- the right one is just so obviously right
quadramap :: (Int -> Int) -> (Bool -> Bool) -> (String -> String) ->
    MyData -> MyData
quadramap fInt fBool fString = quadramapParameterized fInt fBool fString fInt

Of course, doing things this way is wildly painful, so practically speaking mostly people don't. They just take on the burden of manually ensuring the same constructor is used on both sides in one top-level fold, then use that fold to avoid manual pattern matching in as many places as that makes sense. Compared to other burdens facing the workaday programmer, "same constructor left and right" is very light -- it's a very local property of your code, and usually blatantly obvious when you screw up, to the point that in the rare situation where you intended to return a different constructor it warrants a comment explaining why to assuage the reader.

K. A. Buhr
  • 45,621
  • 3
  • 45
  • 71
Daniel Wagner
  • 145,880
  • 9
  • 220
  • 380
  • please forgive my poor explanation in my original question, but in your solution the 'case-constructors' are still used on the right side -which was what I wanted to avoid. The practical real-life case I was looking at had 30+ cases (with potential for more future cases) where some of the names 'looked' quite similar to the human eye and thus high risk of mis-typing something without the type-checker noticing. – Michelrandahl Jun 22 '23 at 14:40
  • 1
    While this solution doesn't save on typing, the point of it is that it's not possible to mistype something without the type-checker noticing. If you mistype `Baz` as `Bar` on the right-hand side of the relevant case, the function `fc :: c -> c'` is applied to an argument of type `b`, which is a type error. – K. A. Buhr Jun 22 '23 at 16:18
  • @Michelrandahl "The case I was looking at had 30+ cases" -- which means typing 30+ words. Now, how many words did you type making this question and associated comments? "high risk of mis-typing something without the type-checker noticing" -- the type-checker will notice typos if you do things in the way this answer's code sample does. – Daniel Wagner Jun 22 '23 at 16:53
4

I think the usual approach for uniform processing of small parts of large data structures is to use some kind of generics. For your specific example, you can write a fairly straightforward Haskell solution using "Scrap Your Boilerplate" generics from the syb package:

import Data.Data               -- from `base`
import Data.Generics.Aliases   -- from `syb`

data MyData
  = Foo Int
  | Bar Bool
  | Baz String
  | Hello Int
  deriving (Show, Data)

myMap :: (Int -> Int) -> (Bool -> Bool) -> (String -> String) -> MyData -> MyData
myMap f g h x = gmapT (mkT f `extT` g `extT` h) x

which works as expected:

λ> map (myMap (*2) not ("hello "++)) [Foo 42, Bar True, Baz "world", Hello 5]
[Foo 84,Bar False,Baz "hello world",Hello 10]

Unlike "normal" functional language facilities, generics tend to have syntax and design that differs widely between languages, or even within a language, and a given solution may be very case-specific.

K. A. Buhr
  • 45,621
  • 3
  • 45
  • 71
  • 1
    Haskell even has the honor ("honor") of supporting *two* generic programming systems. – HTNW Jun 22 '23 at 21:22
  • 1
    Ahhh, excellent. I wanted to write a bit about generics, but I couldn't see how to deal with how monomorphic the desired operation was (i.e. how many function arguments there were), and this handles that cleanly. Nice! – Daniel Wagner Jun 28 '23 at 18:48
2

In Rust at least (I can't speak for the other languages you mention), one can mutate the value in-place rather than taking & returning ownership:

enum MyData {
    Foo(i32),
    Bar(bool),
    Baz(String),
    Hello(i32),
}

impl MyData {
    fn map(&mut self, f: fn(&mut i32), g: fn(&mut bool), h: fn(&mut str)) {
        use MyData::*;
        match self {
            Foo(i) => f(i),
            Bar(b) => g(b),
            Baz(s) => h(s),
            Hello(i) => f(i),
        }
    }
}

Playgroud.

eggyal
  • 122,705
  • 18
  • 212
  • 237
  • 4
    Seeing how all the other languages are pure functional languages (barring F# itself), this probably doesn't translate too well. But certainly a great solution for Rust. – isaactfa Jun 21 '23 at 16:03
1

If you want each case in the union to be typesafe you can make it typesafe

type Foo = Foo' of int
type Bar = Bar' of bool

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : int -> int) (g : bool -> bool) x : MyData =
        match x with
        | Foo (Foo' v) -> Foo (Foo' (f v))
        | Bar (Bar' v) -> Bar (Bar' (g v))

let foo = Foo (Foo' 42)
let newFoo = MyData.bimapish ((+) 1) not foo

OO Visitors effectively do this, they encode each case of the union with a unique type, so you can probably do it that way too.

or have i missed the point? (I accept I have repeated the case name, but your motivation was type safety, not brevity)?


someone put a comment in and deleted it, I think they objected that I was allowing this error...fair enough...but there's a cost.

type Foo = Foo' of int
type Bar = Bar' of int

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : int -> int) (g : int -> int) x : MyData =
        match x with
        | Foo (Foo' v) -> Foo (Foo' (f v))
        // WRONG this is applying an 'f' to a Bar!!! but not type error
        | Bar (Bar' v) -> Bar (Bar' (f v)) 

let foo = Foo (Foo' 42)
let newFoo = MyData.bimapish ((+) 1) ((+) 1) foo

in which case you would have to effectively make the 'map' functions typesafe...but this is becoming verbose to the caller.

type Foo = Foo' of int
type Bar = Bar' of int

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : Foo -> Foo) (g : Bar -> Bar) x : MyData =
        match x with
        | Foo v -> Foo (f v)
        //| Bar v -> Bar (f v) // now get a type error!
        | Bar v -> Bar (g v) // this works though

let foo = Foo (Foo' 42)
let newFoo = 
    MyData.bimapish 
        (fun (Foo' x) -> Foo' (x + 1)) 
        (fun (Bar' x) -> Bar' (x + 2)) 
        foo

If you are happy for the type checker to give you a warning rather than an explicit error if someone gets it wrong, you can dispense with much of the boiler plate by simply making it generic

type MyData<'a,'b> =
    | Foo of 'a
    | Bar of 'b
    static member bimapish (f : 'a -> 'a) (g : 'b -> 'b) x : MyData<'a,'b> =
        match x with
        | Foo v -> Foo (f v)
        // this DOESNT cause an error but does cause a warning, 'a has to be the same as 'b...which is enough for me to know its wrong
        //| Bar v -> Bar (f v)
        | Bar v -> Bar (g v)

let foo = Foo 42
let newFoo = 
    MyData.bimapish 
        ((+) 1) 
        ((+) 2) 
        foo

ok, and the caller CAN potentially pass the wrong functions. but for me I'd probably go with this simply for simplicity.

MrD at KookerellaLtd
  • 2,412
  • 1
  • 15
  • 17