22

A common design principle in functional programming is making illegal states unrepresentable. I always see this accomplished through the structure of types, but what about the values a type can hold?

What if I have a string called Email and I only want it to hold a valid email address (checked against some Regex)? How can I do this in a functional way (without using OOP)?

Cœur
Kurren
  • 5
    Make an immutable type whose constructor throws if the data is invalid. – ildjarn Feb 20 '17 at 10:08
  • 3
    In Haskell, you can write down the type `RegExp` of regular expressions: switch on `DataKinds`, and now you can write down the type of data valid for a given regexp `RegData :: RegExp -> *`, and the *singleton* types for regexps `REGEXP :: RegExp -> *`. You'll probably also need a validating parser, `regParse :: REGEXP r -> String -> Maybe (RegData r)`, so you can go from strings to validated data, given the singleton value for the regexp you want. If you can write a regexp recognizer, then `regParse` is an evidence-producing refinement of the same idea. – pigworker Feb 20 '17 at 10:25
  • 6
    "a valid email address (checked against some Regex)" Not a very good example considering it's really hard to create a regex that can properly match the full set of valid email addresses. – JAB Feb 20 '17 at 16:35
  • 7
    Please don't validate email addresses with regex. – mike3996 Feb 20 '17 at 18:27
  • @JAB it was just for illustration. You could replace it with a FirstName string that cannot contain any numeric digits or any variable where only certain values are allowed. – Kurren Feb 20 '17 at 18:31
  • 7
    @Kurren Don't suppose that [people's names don't contain numbers](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/), either :-). – Blackhole Feb 20 '17 at 19:57
  • And see also [Is it possible to define a function on a subset of an existing type?](http://stackoverflow.com/questions/19730031/is-it-possible-to-define-a-function-on-a-subset-of-an-existing-type). – Daniel Wagner Feb 20 '17 at 22:14

5 Answers

22

A common idiom is to use a smart constructor.

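module Email (email, fromEmail, Email()) where

-- export the type, but not the constructor
newtype Email = Email String

-- export this: the only way to build an Email from outside the module
email :: String -> Maybe Email
email s | validEmail s = Just (Email s)
        | otherwise    = Nothing

-- and this
fromEmail :: Email -> String
fromEmail (Email s) = s

-- not exported; placeholder check (an assumption), replace with real validation
validEmail :: String -> Bool
validEmail s = '@' `elem` s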

This will verify emails at runtime, not compile time.
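As a rough illustration of how the smart constructor is used downstream, here is a single-file sketch. The `validEmail` check, `domainOf`, and `parseAll` are hypothetical names invented for this example, and the check is deliberately simplistic rather than a real email validator.

```haskell
import Data.Maybe (mapMaybe)

-- In a real project this newtype would live in its own module that
-- exports the type but not its constructor, as in the code above.
newtype Email = Email String
  deriving (Eq, Show)

-- Placeholder check (an assumption for illustration only):
-- exactly one '@' with non-empty text on both sides.
validEmail :: String -> Bool
validEmail s =
  case break (== '@') s of
    (local, '@' : domain) ->
      not (null local) && not (null domain) && '@' `notElem` domain
    _ -> False

-- Smart constructor: the only intended way to obtain an Email.
email :: String -> Maybe Email
email s
  | validEmail s = Just (Email s)
  | otherwise    = Nothing

fromEmail :: Email -> String
fromEmail (Email s) = s

-- Downstream code takes an Email, so it does not need to re-validate.
domainOf :: Email -> String
domainOf = drop 1 . dropWhile (/= '@') . fromEmail

-- Validation happens once, at the boundary where raw strings enter;
-- inputs that fail the check are dropped.
parseAll :: [String] -> [Email]
parseAll = mapMaybe email
```

The point of the design is that `domainOf` cannot be called with an unchecked `String`, so the run-time check is concentrated in `email`.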

For compile-time verification, one would need to exploit a GADT-heavy variant of String, or use Template Haskell (metaprogramming) to do the checks (provided the email value is a literal).

Dependent types can also ensure values are of the right form, for those languages that support them (e.g. Agda, Idris, Coq). F* (F-star) is an ML-family language with F#-like syntax that can verify preconditions/postconditions and perform some advanced static checks.

chi
  • 1
    If possible, I would love to see the GADT way of handling it. :) – Sibi Feb 20 '17 at 10:57
  • @Sibi I'm not really doing that! :-P It would probably involve something like a singleton `data C (c::Char) where A :: C 'a' ; ...` and then `data S (s::String) where Nil :: S "" ; Cons :: C c -> S s -> S (c:s)`. Then you have `data Email where Email :: S s -> Proof s Start -> Email` where `Proof s Start` painfully encodes a DFA at the type level for checking that the string is really an email. For final states you have `Pi :: Proof "" End` and for transitions "q1-c->q2" middle `Pj :: C c -> Proof s q2 -> Proof (c:s) q1` etc. Painful and not convenient at all, IMO :-P – chi Feb 20 '17 at 11:06
  • Ah, okay! I will see if I can come up with that implementation tonight, just for fun. :-) – Sibi Feb 20 '17 at 11:34
20

I'd posit, the same way you do all your runtime error handling?

If you did it "with classes and properties for encapsulation", you'd throw an exception (i.e. in the setter) that some code, somewhere higher up in the call chain, would have to watch out for. It's not your "classes and properties" that magically solve this; it's your discipline in throwing and catching exceptions. In almost any FP language you have a wide arsenal of representations for signaling erroneous values/inputs, from a simple Maybe or a more detailed Either (or whatever these are called in F# ;), to full-blown exceptions, to an immediate halt with a message on stderr, as suits the current app/lib context.
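A minimal sketch of the "more detailed Either" option could look like the following. The `EmailError` type, `mkEmail`, and `describe` are hypothetical names made up for this illustration, and the three checks are placeholders, not real email validation.

```haskell
import Data.Char (isSpace)

-- Hypothetical error type: each constructor names one reason for rejection.
data EmailError
  = EmptyInput
  | MissingAt
  | ContainsWhitespace
  deriving (Eq, Show)

newtype Email = Email String
  deriving (Eq, Show)

-- Deliberately simplistic checks, chosen only to show the shape.
-- Guards are tried top to bottom, so the first failing check wins.
mkEmail :: String -> Either EmailError Email
mkEmail s
  | null s          = Left EmptyInput
  | any isSpace s   = Left ContainsWhitespace
  | '@' `notElem` s = Left MissingAt
  | otherwise       = Right (Email s)

-- The caller decides how to react to each failure.
describe :: Either EmailError Email -> String
describe (Right (Email s))         = "accepted: " ++ s
describe (Left EmptyInput)         = "rejected: empty input"
describe (Left MissingAt)          = "rejected: no '@' found"
describe (Left ContainsWhitespace) = "rejected: contains whitespace"
```

Compared with `Maybe`, the `Left` side tells the caller why the input was refused, which is usually what you want when the error has to be reported back to a user.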

"Making illegal states unrepresentable" in types is for pre-empting as many easy-to-make-in-the-heat-of-the-moment developer mistakes as the type system / compiler can understand, not user error.

There are, of course, academic explorations and research into how we can shift the handling of ever more classes of bugs to the static (compilation) side; in Haskell, LiquidHaskell is a prominent example. But until you have a time machine, you can't retroactively prevent compilation of your program if the input it reads once compiled is erroneous :D In other words, the only way to prevent wrong email addresses is to impose a GUI that cannot possibly let one through.

Fyodor Soikin
metaleap
11

I usually do it the way @chi has. As he stated, you can also use Template Haskell to check the provided email at compile time. An example of doing that:

#!/usr/bin/env stack
{- stack
     --resolver lts-8.2
     exec ghci
     --package email-validate
     --package bytestring
-}

{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DeriveLift #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE QuasiQuotes #-}

import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Language.Haskell.TH.Syntax
import Data.ByteString.Char8
import Text.Email.Validate

instance Lift ByteString where
  lift b = [|pack $(lift $ unpack b)|]

instance Lift EmailAddress where
  lift email = lift (toByteString email)

email :: QuasiQuoter
email =
  QuasiQuoter
  { quoteExp =
    \str ->
       let (item :: EmailAddress) =
             case (validate (pack str)) of
               Left msg -> error msg
               Right email -> email
       in [|item|]
  }

Now, if you load this up in ghci:

> :set -XQuasiQuotes
> [email|sibi@mydomain.in|]
"sibi@mydomain.in"
> [email|invalidemail|]

<interactive>:6:1: error:
    • Exception when trying to run compile-time code:
        @: not enough input
CallStack (from HasCallStack):
  error, called at EmailV.hs:36:28 in main:EmailV
      Code: quoteExp email "invalidemail"
    • In the quasi-quotation: [email|invalidemail|]

You can see that the invalid input produces a compile-time error.

duplode
Sibi
  • And that can all be done in one file for experimentation? Very cool. – Dogweather Mar 29 '17 at 18:28
  • Yes. I use the `exec ghci` option specifically from stack in the script: https://docs.haskellstack.org/en/stable/GUIDE/#loading-scripts-in-ghci – Sibi Mar 29 '17 at 19:20
5

It appears that both @chi's and @Sibi's answers are what refinement types are about, i.e. types which wrap other types while restricting the range of supported values with a validator. The validation can be done at run time or at compile time, depending on the use case.
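To make the idea concrete without depending on any library, here is a minimal hand-rolled sketch covering only the run-time half. The names `Refined`, `Predicate`, `check`, `refine`, `unrefine`, `HasAt`, and `EmailString` are defined locally for illustration; they are not the API of any published package, and the predicate is a placeholder.

```haskell
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables   #-}

import Data.Proxy (Proxy (..))

-- A value of type `a` tagged with a predicate `p` it is known to satisfy.
-- In a real module the `Refined` constructor would not be exported.
newtype Refined p a = Refined a
  deriving (Eq, Show)

-- A predicate is a type-level name plus a run-time check.
-- `Nothing` means the value passes; `Just msg` explains the failure.
class Predicate p a where
  check :: Proxy p -> a -> Maybe String

-- Run-time refinement: wrap the value only if the check passes.
refine :: forall p a. Predicate p a => a -> Either String (Refined p a)
refine x =
  case check (Proxy :: Proxy p) x of
    Nothing  -> Right (Refined x)
    Just err -> Left err

-- Forget the refinement and recover the underlying value.
unrefine :: Refined p a -> a
unrefine (Refined x) = x

-- Example predicate (a simplistic stand-in for real email validation).
data HasAt

instance Predicate HasAt String where
  check _ s
    | '@' `elem` s = Nothing
    | otherwise    = Just "no '@' in input"

type EmailString = Refined HasAt String
```

Because the predicate lives in the type, a function taking `EmailString` documents and enforces its precondition in its signature, and new predicates can be added without writing a new wrapper type each time.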

It just so happens that I've authored "refined", a library which provides abstractions for both cases. Follow the link for an extensive introduction.

To apply this library in your scenario, in one module define the predicate:

import Refined
import Data.ByteString (ByteString)

data IsEmail

instance Predicate IsEmail ByteString where
  validate _ value = 
    if isEmail value
      then Nothing
      else Just "ByteString forms an invalid Email"
    where
      isEmail =
        error "TODO: Define me"

-- | An alias for convenience, so that there's less to type.
type EmailBytes =
  Refined IsEmail ByteString

Then use it from a different module (Template Haskell's stage restriction requires the predicate instance to be defined in a module other than the one containing the splice).

You can construct the values both at compile-time and run-time:

-- * Constructing
-------------------------

{-|
Validates your input at run-time.

Abstracts over the Smart Constructor pattern.
-}
dynamicallyCheckedEmailLiteral :: Either String EmailBytes
dynamicallyCheckedEmailLiteral =
  refine "email@example.info"

{-|
Validates your input at compile-time with zero overhead.

Abstracts over the solution involving Lift and QuasiQuotes.
-}
staticallyCheckedEmailLiteral :: EmailBytes
staticallyCheckedEmailLiteral =
  $$(refineTH "email@example.info")


-- * Using
-------------------------

aFunctionWhichImpliesThatTheInputRepresentsAValidEmail :: EmailBytes -> IO ()
aFunctionWhichImpliesThatTheInputRepresentsAValidEmail emailBytes =
  error "TODO: Define me"
  where
    {-
    Shows how you can extract the "refined" value at zero cost.

    It makes sense to do so in an enclosed setting.
    E.g., here you can see `bytes` defined as a local value,
    and we can be sure that the value is correct.
    -}
    bytes :: ByteString
    bytes =
      unrefine emailBytes

Also, please be aware that this only scratches the surface of what refinement types can cover. They have many more useful properties.

Nikita Volkov
3

This was answered for me recently.

Here's the post.

Your question is about validating an email address. The overall structure of the code leverages active patterns:

module File1 =

    type EmailAddress = 
        private
        | Valid   of string 
        | Invalid of string

    let createEmailAddress (address:System.String) =
        if address.Length > 0
        then Valid    address 
        else Invalid  address

    // Exposed patterns go here
    let (|Valid|Invalid|) (input : EmailAddress) : Choice<string, string>  = 
        match input with
        | Valid str -> Valid str
        | Invalid str -> Invalid str

module File2 =

    open File1

    let validEmail = Valid "" // Compiler error

    let isValid = createEmailAddress "" // works

    let result = // also works
        match isValid with
        | Valid x -> true
        | _       -> false
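For readers following the Haskell answers, a rough analogue of "private constructor plus exposed pattern" is a unidirectional pattern synonym. This is only a sketch: `MkEmail`, `mkEmail`, and `render` are hypothetical names, the validity check is a placeholder, and the `COMPLETE` pragma assumes GHC 8.2 or later.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- In a real module the export list would expose the type, the pattern,
-- and the smart constructor, but not `MkEmail`, e.g.
--   module Email (Email, pattern Email, mkEmail) where
newtype Email = MkEmail String

-- Unidirectional pattern synonym: it can be used to match on an Email,
-- but not as an expression to build one.
pattern Email :: String -> Email
pattern Email s <- MkEmail s

-- Tell the exhaustiveness checker that matching on `Email` covers all cases.
{-# COMPLETE Email #-}

-- Smart constructor with a placeholder check (an assumption).
mkEmail :: String -> Maybe Email
mkEmail s
  | '@' `elem` s = Just (MkEmail s)
  | otherwise    = Nothing

-- Client code can deconstruct via the pattern without access to `MkEmail`.
render :: Maybe Email -> String
render (Just (Email s)) = "valid: " ++ s
render Nothing          = "invalid"
```

Unlike the F# version, this sketch does not keep an `Invalid` case around; rejected input is simply `Nothing`.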
Community
Scott Nimrod