These are the four types of error handling that are standard and
widely used in the Haskell world, as of 2014.
There exist some other libraries like attempt
(like Either
, but where you don’t care or know the type of the exception) and control-monad-exception
which implements a checked exception monad, etc. but the following ones are the standard ones seen in the wild.
Contents
- 1 Exception
- 2 Error (pure code)
- 3 Error using the Either type
- 4 No value using the Maybe type
Exception
An unexpected code path, one that rarely but can happen and can be handled if needs be. Typically caused by IO going wrong in some way, like the machine running out of swap and your program terminating, a file not existing, etc. The most basic functions are:
throw :: Exception e => e -> a
try :: Exception e => IO a -> IO (Either e a)
from Control.Exception.
Say you were writing a library to do things on reddit, you would define an exception type in your API:
data RedditException = Couldn'tUpvote | CommentFailed | LoginFailed !Text | ConnectFailure !HttpError deriving (Show,Typeable) instance Exception RedditException login :: Details -> IO () login details = do code <- tryLogin details case code of (200,val) -> setLoginContext val (_,err) -> throw (LoginFailed err)
Then later you might write try (login …)
or catch (login …) ((e :: RedditException) -> …)
to handle the exception, if needed. Another exception might be a connection failure.
See Control.Exception for more detail and related functions.
Error (pure code)
Some pure functions contain calls to error, causing the evaluation to stop and crash:
head :: [a] -> a head (x:_) = x head [] = error "empty list"
Now suppose someone writes head ages
and unexpectedly, ages
is an empty list. If you are trying to take the head of an empty list your program logic is simply broken.
A solution here is to avoid the head
function and use listToMaybe from Data.Maybe.
case listToMaybe ages of Nothing -> defaultAge Just first -> first
Alternatively, these errors can be caught from IO monad by using evaluate
and try
from Control.Exception. Ideally you should avoid partial functions like head
, but sometimes this is not an option (e.g. when using an external library)
Error using the Either type
An expected return value: Either SomeError a
The type indicates that an error is common, but doesn’t mean your program is broke. Rather that some input value wasn’t right. Typically used by parsers, consumers, that are pure and often error out.
data ParseError = ParseError !Pos !Text
So this type describes exactly what is going on:
runParser :: Parser a -> Text -> Either ParseError a
Take a parser of a
, some text to parse, and return either a parser error or a parsed a
. Typical usage would be:
main = do line <- getLine case runParser emailParser line of Right (user,domain) -> print ("The email is OK.",user,domain) Left (pos,err) -> putStrLn ("Parse error on " <> pos <> ": " <> err)
Or depending on the code one might opt instead to use a deconstructing function:
main = do line <- getLine either (putStrLn . ("Parse error: " <>) . show) (print . ("The email is OK.",)) (runParser emailParser line)
No value using the Maybe type
There is simply no value there. This isn’t a problem in the system. It means you don’t care why there isn’t a value, or you already know. Maybe a
Typical example:
lookup :: Eq a => a -> [(a,b)] -> Maybe b
That is, take some key a
that can be compared for equality, and a list of pairs where the first is the same type of the key a
and maybe return the b
of the pair, or nothing.
So one might pattern match on this:
case lookup name person of Nothing -> "no name specified" Just name -> "Name: " <> name
Or use a deconstructing function:
maybe "no name specified" ("Name: " <>) (lookup name person)
Again, depends on the code and the person writing it whether an explicit case is used. Often monads like Maybe
are composed to make a chain of possibly-nothing values:
lookup "height" profile >>= parseInt >>= flip lookup recommendedSizes
So lookup a height from a person’s profile (might not exist), parse it as integer (might not parse), then use that as a key to lookup from a mapping list of age to clothes size: [(Int,Text)]
*** Exception: Prelude.head: empty list
In Error versus Exception, Henning Thielemann makes a clear
distinction between errors and exceptions in Haskell. Even
though not all Haskellers make such distinction, it’s useful to do so
in order to talk about the most basic ways to handle computations that
can go wrong and to discuss unsafe functions such as head
,
fromJust
, and (!!)
.
On the one hand, an error is a programming mistake such as a division
by zero, the head of an empty list, or a negative index. If we
identify an error, we remove it. Thus, we don’t handle errors, we
simply fix them. In Haskell, we have error
and undefined
to cause
such errors and terminate execution of the program.
On the other hand, an exception is something that can go wrong or
that simply doesn’t follow a rule but can happen. For example, reading
a file that does not exist or that is not readable. If we identify a
possible exception, we handle it, and not doing so would be an
error. In Haskell, we have pure (Maybe
and Either
, for instance)
and impure ways to handle exceptions.
A basic example of an error in Haskell is trying to get the head of an
empty list using the head
function as defined in GHC.List
:
head :: [a] -> a
head [] = error "head: empty list"
head (x:_) = x
One way to distinguish an error from an exception is to think in terms
of contracts and preconditions. In this case, there’s a precondition
in the documentation of the head
function: the list must be
nonempty. This means that the first equation of head
is supposed to
be dead or unreachable code. This way, if we are sure that a list has
at least one element, we can extract its head:
ghci> head [104,97,115,107,101,108,108]
104
Of course, the type signature of the head
function says nothing
about such contract, which means that there’s nothing stopping us from
applying it to an empty list and therefore breaking the rules:
ghci> head []
*** Exception: head: empty list
As a comment in the definition of the fromJust
function in the
Data.Maybe
module says, «yuck.»
Even if trying to get the head of an empty list using head
is an
error, it’s unsafe to do so. We can certainly treat it as an exception
and handle it with the Maybe
data type:
data Maybe a = Nothing | Just a
In terms of exceptions, the Maybe
type represents a computation that
can fail (in the case of a Nothing
).
Let’s define a safe version of the head
function:
maybeHead :: [a] -> Maybe a
maybeHead [] = Nothing
maybeHead (x:_) = Just x
The safety of the maybeHead
function relies on its type
signature. We know that applying the function to a list can succeed:
ghci> maybeHead [104,97,115,107,101,108,108]
Just 104
Or fail:
ghci> maybeHead []
Nothing
A similar example is the fromJust
function, which
extracts the element out of a Just
:
fromJust :: Maybe a -> a
fromJust Nothing = error "Maybe.fromJust: Nothing" -- yuck
fromJust (Just x) = x
Again, it’s an error to apply fromJust
to a Nothing
, but there’s
nothing stopping us from doing it. It’s best if we use a safe function
such as fromMaybe
, which takes a default value:
fromMaybe :: a -> Maybe a -> a
fromMaybe d mx =
case mx of
Nothing -> d
Just x -> x
Yet another example is the lookup
function, which looks up a key in
an association list or dictionary:
lookup :: Eq a => a -> [(a,b)] -> Maybe b
lookup _ [] = Nothing
lookup key ((x,y):xys)
| key == x = Just y
| otherwise = lookup key xys
In this case, applying lookup
to an empty list or to a list which
doesn’t contain the key we’re looking for is not an error but an
exception, and the type signature of the function clearly specifies
that it can go wrong:
ghci> lookup 1 (zip [1..] [104,97,115,107,101,108,108])
Just 104
ghci> lookup 1 []
Nothing
ghci> lookup (-1) (zip [1..] [104,97,115,107,101,108,108])
Nothing
Now, let’s consider the elemAt
(or (!!)
) function, which is a list
index operator:
elemAt :: [a] -> Int -> a
elemAt xs n | n < 0 = error "elemAt: negative index"
elemAt [] _ = error "elemAt: index too large"
elemAt (x:_) 0 = x
elemAt (_:xs) n = elemAt xs (n - 1)
This function has two preconditions: the index must be nonnegative and
less than the length of the list. For example:
ghci> elemAt [104,97,115,107,101,108,108] 0
104
ghci> elemAt [104,97,115,107,101,108,108] (-8)
*** Exception: elemAt: negative index
ghci> elemAt [104,97,115,107,101,108,108] 8
*** Exception: elemAt: index too large
The elemAt
function is as unsafe as head
and fromJust
in that
its type signature tells us nothing about the possibility of
failure. We could define a safe version using Maybe
, but now we have
two different errors and it would be nice to provide additional
information about what went wrong, which we can accomplish with the
Either
data type:
data Either a b = Left a | Right b
In terms of exceptions, a Left
represents failure and a Right
represents success.
Here’s a safe version of elemAt
using strings for exceptions:
eitherElemAt :: [a] -> Int -> Either String a
eitherElemAt _ n | n < 0 = Left "elemAt: negative index"
eitherElemAt [] _ = Left "elemAt: index too large"
eitherElemAt (x:_) 0 = Right x
eitherElemAt (_:xs) n = eitherElemAt xs (n - 1)
We can safely apply this version of elemAt
to the lists and indexes
we used before:
ghci> eitherElemAt [104,97,115,107,101,108,108] 0
Right 104
ghci> eitherElemAt [104,97,115,107,101,108,108] 8
Left "elemAt: index too large"
ghci> eitherElemAt [104,97,115,107,101,108,108] (-8)
Left "elemAt: negative index"
We know that there are only two things that can go wrong with elemAt
and that means that a String
is too general for representing failure
in this case. We can be more specific by defining our own data type
and moving the error strings to the Show
instance:
data ElemAtError
= IndexTooLarge
| NegativeIndex
instance Show ElemAtError where
show IndexTooLarge = "elemAt: index too large"
show NegativeIndex = "elemAt: negative index"
And we can use this data type for exceptions in another safe version
of elemAt
:
errorElemAt :: [a] -> Int -> Either ElemAtError a
errorElemAt _ n | n < 0 = Left NegativeIndex
errorElemAt [] _ = Left IndexTooLarge
errorElemAt (x:_) 0 = Right x
errorElemAt (_:xs) n = errorElemAt xs (n - 1)
Which we can safely apply to the same lists and indexes as before:
ghci> errorElemAt [104,97,115,107,101,108,108] 0
Right 104
ghci> errorElemAt [104,97,115,107,101,108,108] 8
Left elemAt: index too large
ghci> errorElemAt [104,97,115,107,101,108,108] (-8)
Left elemAt: negative index
It might be confusing to call our custom data type ElemAtError
instead of ElemAtException
, but perhaps it’s a better name for
reflecting that we’re treating errors as exceptions for the sake of
safety.
It’s even more confusing once we figure out that the implementation of
error
is actually raising an exception, but a very general one. Even
then, we can be more specific about the exception that gets thrown by
making our ElemAtError
type an instance of Exception
, as follows:
instance Exception ElemAtError
Instead of calling error
, we can now throw the constructors of our
ElemAtError
data type if there’s a problem with the index:
exceptionElemAt :: [a] -> Int -> a
exceptionElemAt _ n | n < 0 = throw NegativeIndex
exceptionElemAt [] _ = throw IndexTooLarge
exceptionElemAt (x:_) 0 = x
exceptionElemAt (_:xs) n = exceptionElemAt xs (n - 1)
Which is very similar to what we had with the original elemAt
function:
ghci> exceptionElemAt [104,97,115,107,101,108,108] 0
104
ghci> exceptionElemAt [104,97,115,107,101,108,108] 8
*** Exception: elemAt: index too large
ghci> exceptionElemAt [104,97,115,107,101,108,108] (-8)
*** Exception: elemAt: negative index
But this time we can use the try
function from Control.Exception
,
which takes an action and returns either the result of that action or
an exception:
try :: Exception e => IO a -> IO (Either e a)
Since we’re using a very specific type to represent things that can go
wrong with the elemAt
function, we can also be very specific about
what to do in case that something actually goes wrong:
tryExceptionElemAt :: Show a => [a] -> Int -> IO ()
tryExceptionElemAt xs n = do
eitherExceptionElemAt <- try (evaluate (exceptionElemAt xs n))
case eitherExceptionElemAt of
Left IndexTooLarge -> print IndexTooLarge
Left NegativeIndex -> print NegativeIndex
Right elemAt -> print elemAt
Given a list xs
and an index n
, we try to get the element at that
position using exceptionElemAt
, and then use a case expression to
pattern match against the Either
returned by try
. In this case,
we’re simply printing the error or the result, which is not very
useful.
For now, we can try our lists and indexes, and see that we succesfully
handled everything that could go wrong:
ghci> tryExceptionElemAt [104,97,115,107,101,108,108] 0
104
ghci> tryExceptionElemAt [104,97,115,107,101,108,108] 8
elemAt: index too large
ghci> tryExceptionElemAt [104,97,115,107,101,108,108] (-8)
elemAt: negative index
It’s obviously better to use a safe function such as eitherElemAt
or
errorElemAt
, but exceptionElemAt
gives us a good idea of how to
raise and catch exceptions in Haskell.
Finally, let’s consider reading a file using the readFile
function,
which could fail for two reasons: the file doesn’t exist or the user
doesn’t have enough permissions to read it. We’ll use the tryJust
function, which is like try
but takes a handler that allows us to
select which exceptions are caught:
tryJust :: Exception e => (e -> Maybe b) -> IO a -> IO (Either b a)
Here’s a function that tries to read a given file:
tryJustReadFile :: FilePath -> IO ()
tryJustReadFile filePath = do
eitherExceptionFile <- tryJust handleReadFile (readFile filePath)
case eitherExceptionFile of
Left er -> putStrLn er
Right file -> putStrLn file
where
handleReadFile :: IOError -> Maybe String
handleReadFile er
| isDoesNotExistError er = Just "readFile: does not exist"
| isPermissionError er = Just "readFile: permission denied"
| otherwise = Nothing
Given a file name, we try to read it with readFile
and choose the
exceptions we’re going to handle with the handleReadFile
function. If the result of trying to read the file is a Left
, we
print the exception message. If it’s a Right
, we print the contents
of the file. The handleReadFile
function returns appropriate
messages for errors that satisfy ioDoesNotExistError
or
isPermissionError
(which are exceptions in System.IO.Error
), and
ignores any other exception.
Let’s try to read the contents of a file called haskell
before
creating it:
ghci> tryJustReadFile "haskell"
readFile: does not exist
We don’t get an *** Exception
because we handled the exception and
decided to simply print the exception message.
If we create the file and add something to it, and then try to read
its contents, we get the expected result:
$ echo [104,97,115,107,101,108,108] > haskell
ghci> tryJustReadFile "haskell"
[104,97,115,107,101,108,108]
And if we don’t have permissions to read the file, we get the expected
exception message:
ghci> tryJustReadFile "haskell"
readFile: permission denied
For more information about errors and exceptions in Haskell, see the
Error Handling chapter in Real World Haskell or the
Control.Exception
module in the base
package.
Haskell is a marvellous language, but there are some things I don’t like about it. My least favorite: Haskell has no fewer than 8
different APIs for reporting errors.
To make a bad situation worse, the choice of API varies between
popular libraries. To give a particularly unfortunate example,
Network.URI.parseURI and Network.HTTP.simpleHTTP report
errors in entirely different ways, turning a “download this URL” program into a page of code,
nearly half of which is devoted to dealing with various kinds of errors. (The rest is boilerplate that could be refactored into a nice wrapper.)
Let’s begin with a toy function, the simplest possible program that could
actually fail:
As every algebra student knows, we can’t divide by zero. Using this
function as our example, let’s take a look at all the different ways we
can implement error-reporting in Haskell.
1. Use error
The most popular way to report errors in Haskell is error
,
which works as follows:
myDiv1 :: Float -> Float -> Float
myDiv1 x 0 = error "Division by zero"
myDiv1 x y = x / y
(This is similar to the error-reporting that’s built into integer division, actually.)
We can catch the error using Control.Exception.catch
:
import qualified Control.Exception as E
example1 :: Float -> Float -> IO ()
example1 x y =
E.catch (putStrLn (show (myDiv1 x y)))
(err -> putStrLn (show err))
There are two limitations here: Our error is a free-form string, and we can
only catch errors from within the IO
monad. So this only
works for smaller, informal programs.
2. Use Maybe a
What if we don’t have access to the IO
monad? Well, we can
always use Haskell’s Maybe
type to represent a computation
that might fail:
myDiv2 :: Float -> Float -> Maybe Float
myDiv2 x 0 = Nothing
myDiv2 x y = Just (x / y)
example2 x y =
case myDiv2 x y of
Nothing -> putStrLn "Division by zero"
Just q -> putStrLn (show q)
And thanks to the magic of monads, we can actually string together calls to
myDiv2
quite nicely:
divSum2 :: Float -> Float -> Float ->
Maybe Float
divSum2 x y z = do
xdy <- myDiv2 x y
xdz <- myDiv2 x z
return (xdy + xdz)
This approach to error-reporting is used by Network.URI.parseURI,
which is included with most Haskell compilers.
3. Use Either String a
But what if we want to have different error messages for different errors?
In that case, we can use Either
to represent computations
which might return either an error message or a value:
myDiv3 :: Float -> Float ->
Either String Float
myDiv3 x 0 = Left "Divison by zero"
myDiv3 x y = Right (x / y)
example3 x y =
case myDiv3 x y of
Left msg -> putStrLn msg
Right q -> putStrLn (show q)
Once again, we can treat Either String
as a monad, allowing us
to combine these computations with a minimum of fuss:
divSum3 :: Float -> Float -> Float ->
Either String Float
divSum3 x y z = do
xdy <- myDiv3 x y
xdz <- myDiv3 x z
return (xdy + xdz)
This approach is used by many small programs that need to recover from
multiple kinds of non-IO errors, but I don’t think it appears in the
standard Haskell libraries.
4. Use Monad
and fail
to generalize 1–3
But what if we don’t care what
monad our caller is using? In that case, we can rewrite our code to work
in any monad m
, and use fail
to report the error.
myDiv4 :: (Monad m) => Float -> Float ->
m Float
myDiv4 x 0 = fail "Divison by zero"
myDiv4 x y = return (x / y)
This will do the right thing if our caller is expecting
Maybe
or Either
:
example4a x y =
case myDiv4 x y of
Nothing -> putStrLn "Division by zero"
Just q -> putStrLn (show q)
example4b x y =
case myDiv4 x y of
Left msg -> putStrLn msg
Right q -> putStrLn (show q)
You can even use it with the IO
monad!
example4c x y =
E.catch (do q <- myDiv4 x y
putStrLn (show q))
(err -> putStrLn (show err))
This style of error-reporting is used widely in the standard libraries,
because it’s so flexible. You can find several examples in
Data.Map.
If you’re writing new Haskell libraries for public consumption, and all
your errors are strings, please consider using this error-reporting
method.
5. Use MonadError
and a custom error type
What if we want to keep track of specific types of errors? In that case,
we could use the error Error
type class:
import Control.Monad.Error
data CustomError = DivByZero
| OutOfCheese
| MiscError String
instance Show CustomError where
show DivByZero = "Division by zero"
show OutOfCheese = "Out of cheese"
show (MiscError str) = str
instance Error CustomError where
noMsg = MiscError "Unknown error"
strMsg str = MiscError str
This works like the fail
example, but instead of using error
messages, we use error values:
myDiv5 :: (MonadError CustomError m) =>
Float -> Float -> m Float
myDiv5 x 0 = throwError DivByZero
myDiv5 x y = return (x / y)
example5 :: Float -> Float ->
Either CustomError String
example5 x y =
catchError (do q <- myDiv5 x y
return (show q))
(err -> return (show err))
Note that this approach will work in almost any monad except the IO
monad. This approach will also fail if we start mixing libraries,
because each library will define its own set of errors, and we’ll need to
write code which converts them all to our preferred error type.
This approach is used by many popular libraries, including parsec. An
unusual variant of this approach is used by Network.HTTP, which
returns values of type IO (Either ConnError a)
, but doesn’t
make ConnError
an instance of Error
.
6. Use throwDyn
in the IO monad
We can also use our custom error type in the IO
monad, thanks
to throwDyn
and catchDyn
from
Control.Exception.
import Data.Typeable
data CustomError = DivByZero
| OutOfCheese
| MiscError String
deriving (Typeable)
myDiv6 :: Float -> Float -> IO Float
myDiv6 x 0 = E.throwDyn DivByZero
myDiv6 x y = return (x / y)
example6 x y =
E.catchDyn (do q <- myDiv6 x y
putStrLn (show q))
handler
where handler :: CustomError -> IO ()
handler err = putStrLn (show err)
This relies on the fact that Exception
is extensible, thanks
to its DynException
constructor. If you’re working in the IO
monad, this approach is almost ideal for production code: You get support
for custom exception types, it’s easy to make libraries compatible, and
it’s compatible will all the other IO
-based examples we’ve
seen.
Note that this very flexible approach could generalized to
non-IO
monads by making Exception
an instance of
Error
, and writing appropriate versions of
throwDyn
and catchDyn
for
MonadError
. This would actually be very convenient for people
who have to work with many libraries at once. But I’ll refrain from
actually providing code, because there’s too many error-reporting
conventions already!
7. Use ioError
and catch
This is a close cousin to the throwDyn
example above. It also
relies on Exception
.
myDiv7 :: Float -> Float -> IO Float
myDiv7 x 0 = ioError (userError "Division by zero")
myDiv7 x y = return (x / y)
example7 :: Float -> Float -> IO ()
example7 x y =
catch (do q <- myDiv7 x y
putStrLn (show q))
(err -> putStrLn (show err))
This one is pretty rare, as far as I can tell.
8. Go nuts with monad transformers
Several of the error-reporting approaches we’ve seen are based on
non-IO
monads. Most of these can can be generalized to the
equivalent monad transformers. For example, Either CustomError
becomes:
a
type ErrIO = ErrorT CustomError IO
myDiv8 :: Float -> Float -> ErrIO Float
myDiv8 x 0 = throwError DivByZero
myDiv8 x y = return (x / y)
example8 x y = do
result <- runErrorT (myDiv8 x y)
case result of
Left err -> putStrLn (show err)
Right q -> putStrLn (show q)
In the IO monad, this is usually a bad idea (though, again, I’ve seen it).
Instead, consider using throwDyn
. But if you’re working with
a base monad other than IO
, this can occasionally be useful.
A plea for consistency
Several of these error-reporting approaches offer interesting insights into
Haskell. And most of them have legitimate uses.
But I’d be just as happy if we could standardize on two or three of the above whenever possible!
Update: Don Stewart has started a thread about Haskell error-handling conventions on the Haskell library list.
It is one of
the most useful functions to report errors, error function takes a string
message and display the message to the console.
Following is
the signature of error function.
error :: [Char]
-> a
As you observe,
it has return type of a, so we can call this from any function, it always has
right type. When you call error function from your code, it immediately aborts
the execution and prints the error message.
errorUtil.hs
divide x 0 = error "You can't divide a number by 0" divide x y = x / y
When you call the divide method by
passing second argument 0, it throws an Exception immediately.
*Main> :load errorUtil.hs [1 of 1] Compiling Main ( errorUtil.hs, interpreted ) Ok, modules loaded: Main. *Main> *Main> divide 10 0 *** Exception: You can't divide a number by 0 *Main>
We can rewrite
above program using Maybe data type like below.
errorUtil.hs
divide x 0 = Nothing divide x y = Just (x / y)
*Main> :load errorUtil.hs [1 of 1] Compiling Main ( errorUtil.hs, interpreted ) Ok, modules loaded: Main. *Main> *Main> divide 10 0 Nothing *Main> *Main> divide 10 123456 Just 8.100051840331778e-5 *Main>
Whenever you want to throw an error, try
to use Maybe.
Another
example
Suppose, you car going to implement
factorial function, usually factorial os not applicable for -ve numbers, so
when user call factorial function with –ve numbers, your function should throw
an error.
Sample.hs
factorial :: Int -> Int factorial num | (num == 0) = 1 | (num > 0) = num * factorial(num-1) | otherwise = error "factorial is only defined for +ve integers"
Prelude> :load Sample.hs [1 of 1] Compiling Main ( Sample.hs, interpreted ) Ok, modules loaded: Main. *Main> *Main> factorial 0 1 *Main> factorial 5 120 *Main> factorial (-9) *** Exception: factorial is only defined for +ve integers
Previous
Next
Home
Exceptions
Creative Commons image by gynti
Most languages make a distinction between values that represent failure
(errors) and the mechanism to abort computations and unwind the stack
(exceptions.) Haskell is unique in that the type system makes it safe
and easy to build failure into types instead of lumping everything into
something like NULL
or -1
.
It also stands out by supporting exceptions through a library of
functions and types instead of directly in the syntax of the language.
The fact that there’s no dedicated keywords for exceptions might seem
weird until you discover how flexible and expressive Haskell is.
This presentation aims to show how closely related errors and exceptions
are, and how to keep them separate.
Haskell Exceptions
-
No dedicated syntax
-
Very limited in Haskell 2010
-
Expanded by GHC
Type Inhabitants
In order to understand how exceptions work we first need to talk about
type inhabitants and bottom.
The Bool
type is a very simple type that doesn’t use any type
variables and only has 2 data constructors. This means that there can
only be 2 unique values for this type. Or does it?
Bottom (⊥)
All types in Haskell support a value called bottom. This means that the
Bool
type actually has 3 possible values. Exceptions and
non-termination are examples of bottom values.
The list below illustrates that bottom values aren’t a problem until
they’re evaluated.
bools :: [Bool]
bools = [False, True, undefined]
Creating ⊥
Haskell includes 2 functions for creating bottom values: undefined
and
error
. In GHC undefined
is implemented using error
and error
throws an exception.
You can create a bottom value directly by writing a non-terminating
function.
-- Raises exceptions in GHC:
undefined :: a
error :: String -> a
-- Non-termination:
badBoy :: a
badBoy = badBoy
Catching Exceptions (Inline)
Catching exceptions is straight forward as long as you remember that you
can only catch exceptions in the IO
monad.
inline :: Int -> IO Int
inline x =
catch (shortFuse x)
((_ex :: StupidException) -> return 0)
The second argument to catch
is a function to handle a caught
exception. GHC uses the type of the function to determine if it can
handle the caught exception. If GHC can’t infer the type of the function
you’ll need to add a type annotation like in the example above. This
requires the ScopedTypeVariables
extension.
If you want to handle more than one exception type you’ll need to use
something like the catches
function. To catch all possible exceptions
you can catch the SomeException
type since it’s at the top of the
exception type hierarchy. This isn’t generally wise and instead you
should use something like the bracket
or finally
functions.
One interesting thing to note is that GHC differs from Haskell 2010 with
regards to catch
. Haskell 2010 states that catch
should catch all
exceptions regardless of their type. Probably because those exceptions
would all be IOError
s.
Catching Exceptions (w/ a Helper)
Below is another example of catching exceptions. This time a helper
function with an explicit type signature is used to handle the
exception. This allows us to avoid inline type annotations and the
ScopedTypeVariables
extension.
helper :: Int -> IO Int
helper x =
catch (shortFuse x)
handler
where
handler :: StupidException -> IO Int
handler _ = return 0
Throwing Exceptions
Throwing exceptions is really easy, although you must be in the IO
monad to do so. Haskell 2010 provides a set of functions for creating
and raising exceptions.
Haskell 2010:
-- Create an exception.
userError :: String -> IOError
-- Raise an exception.
ioError :: IOError -> IO a
-- fail from the IO Monad is both.
fail = ioError . userError :: String -> IO a
Throwing Exceptions
GHC adds on to Haskell 2010 with functions like throwIO
and throw
.
The throw
function allows you to raise an exception in pure code and
is considered to be a misfeature.
GHC:
shortFuse :: Int -> IO Int
shortFuse x =
if x > 0
then return (x - 1)
else throwIO StupidException
Throwing from Pure Code
As mentioned above, GHC adds a throw
function that allows you to raise
an exception from pure code. Unfortunately this makes it very difficult
to catch.
naughtyFunction :: Int -> Int
naughtyFunction x =
if x > 0
then x - 1
else throw StupidException
Catching Exceptions From throw
You need to ensure that values are evaluated because they might contain
unevaluated exceptions.
In the example below you’ll notice the use of the «$!
» operator. This
forces evaluation to WHNF so exceptions don’t sneak out of the catch
function as unevaluated thunks.
forced :: Int -> IO Int
forced x =
catch (return $! naughtyFunction x)
((_ex :: StupidException) -> return 0)
Creating Custom Exceptions
Any type can be used as an exception as long as it’s an instance of the
Exception
type class. Deriving from the Typeable
class makes
creating the Exception
instance trivial. However, using Typeable
means you need to enable the DeriveDataTypeable
GHC extension.
You can also automatically derive the Show
instance as with most other
types, but creating one manually allows you to write a more descriptive
message for the custom exception.
data StupidException = StupidException
deriving (Typeable)
instance Show StupidException where
show StupidException =
"StupidException: you did something stupid"
instance Exception StupidException
Threads and Exceptions
Concurrency greatly complicates exception handling. The GHC runtime uses
exceptions to send various signals to threads. You also need to be very
careful with unevaluated thunks exiting from a thread when it
terminates.
Additional problems created by concurrency:
-
Exceptions are used to kill threads
-
Exceptions are asynchronous
-
Need to mask exceptions in critical code
-
Probably don’t want unevaluated exceptions leaking out
There’s a Package For That
Just use the async package.
Errors (Instead of Exceptions)
-
Explicit
-
Checked by the compiler
-
Way better than
NULL
or-1
Stupid
Haskell is great about forcing programmers to deal with problems at
compile time. That said, it’s still possible to write code which may not
work at runtime. Especially with partial functions.
The function below will throw an exception at runtime if it’s given an
empty list. This is because head
is a partial function and only works
with non-empty lists.
stupid :: [Int] -> Int
stupid xs = head xs + 1
Better
Prefer errors to exceptions.
A better approach is to avoid the use of head
and pattern match the
list directly. The function below is total since it can handle lists
of any length (including infinite lists).
Of course, if the list or its head is bottom (⊥) then this function will
throw an exception when the patterns are evaluated.
better :: [Int] -> Maybe Int
better [] = Nothing
better (x:_) = Just (x + 1)
Reusing Existing Functions
This is the version I like most because it reuses existing functions
that are well tested.
The listToMaybe
function comes with the Haskell Platform. It takes a
list and returns its head in a Just
. If the list is empty it returns
Nothing
. Alternatively you can use the headMay
function from the
Safe package.
reuse :: [Int] -> Maybe Int
reuse = fmap (+1) . listToMaybe
Providing Error Messages
Another popular type when dealing with failure is Either
which allows
you to return a value with an error. It’s common to include an error
message using the Left
constructor.
Beyond Maybe
and Either
it’s also common to define your own type
that indicates success or failure. We won’t discuss this further.
withError :: [Int] -> Either String Int
withError [] = Left "this is awkward"
withError (x:_) = Right (x + 1)
Maybe and Either
Maybe
and Either
are also monads!
If you have several functions that return one of these types you can use
do
notation to sequence them and abort the entire block on the first
failure. This allows you to write short code that implicitly checks the
return value of every function.
Things tend to get a bit messy when you mix monads though…
Maybe and IO
The code below demonstrates mixing two monads, IO
and Maybe
. Clearly
we want to be able to perform I/O but we also want to use the Maybe
type to signal when a file doesn’t exist. This isn’t too complicated,
but what happens when we want to use the power of the Maybe
monad to
short circuit a computation when we encounter a Nothing
?
size :: FilePath -> IO (Maybe Integer)
size f = do
exist <- fileExist f
if exist
then Just <$> fileSize f
else return Nothing
Maybe and IO
Because IO
is the outer monad and we can’t do without it, we sort of
lose the superpowers of the Maybe
monad.
add :: FilePath -> FilePath -> IO (Maybe Integer)
add f1 f2 = do
s1 <- size f1
case s1 of
Nothing -> return Nothing
Just x -> size f2 >>= s2 ->
case s2 of
Nothing -> return Nothing
Just y -> return . Just $ x + y
MaybeT
Using the MaybeT
monad transformer we can make IO
the inner monad
and restore the Maybe
goodness. We don’t really see the benefit in the
sizeT
function but note that its complexity remains about the same.
sizeT :: FilePath -> MaybeT IO Integer
sizeT f = do
exist <- lift (fileExist f)
if exist
then lift (fileSize f)
else mzero
MaybeT
The real payoff comes in the addT
function. Compare with the add
function above.
addT :: FilePath -> FilePath -> IO (Maybe Integer)
addT f1 f2 = runMaybeT $ do
s1 <- sizeT f1
s2 <- sizeT f2
return (s1 + s2)
Either and IO
This version using Either
is nearly identical to the Maybe
version
above. The only difference is that we can now report the name of the
file which doesn’t exist.
size :: FilePath -> IO (Either String Integer)
size f = do
exist <- fileExist f
if exist
then Right <$> fileSize f
else return . Left $ "no such file: " ++ f
Either and IO
To truly abort the add
function when one of the files doesn’t exist
we’d need to replicate the nested case
code from the Maybe
example.
Here I’m cheating and using Either
‘s applicative instance. However,
this doesn’t short circuit the second file test if the first fails.
add :: FilePath -> FilePath -> IO (Either String Integer)
add f1 f2 = do
s1 <- size f1
s2 <- size f2
return ((+) <$> s1 <*> s2)
ErrorT
The ErrorT
monad transformer is to Either
what MaybeT
is to
Maybe
. Again, changing size
to work with a transformer isn’t that
big of a deal.
sizeT :: FilePath -> ErrorT String IO Integer
sizeT f = do
exist <- lift $ fileExist f
if exist
then lift $ fileSize f
else fail $ "no such file: " ++ f
ErrorT
But it makes a big difference in the addT
function.
addT :: FilePath -> FilePath -> IO (Either String Integer)
addT f1 f2 = runErrorT $ do
s1 <- sizeT f1
s2 <- sizeT f2
return (s1 + s2)
Hidden/Internal ErrorT
The really interesting thing is that we didn’t actually have to change
size
at all. We could have retained the non-transformer version and
used the ErrorT
constructor to lift the size
function into the
transformer. The MaybeT
constructor can be used in a similar way.
addT' :: FilePath -> FilePath -> IO (Either String Integer)
addT' f1 f2 = runErrorT $ do
s1 <- ErrorT $ size f1
s2 <- ErrorT $ size f2
return (s1 + s2)
Turning Exceptions into Errors
The try
function allows us to turn exceptions into errors in the form
of IO
and Either
, or as you now know, ErrorT
.
It’s not hard to see how flexible exception handling in Haskell is, in
no small part due to it not being part of the syntax. Non-strict
evaluation is the other major ingredient.
try :: Exception e => IO a -> IO (Either e a)
-- Which is equivalent to:
try :: Exception e => IO a -> ErrorT e IO a
Final Thought
-
Prefer Errors to Exceptions!
-
Don’t Write/Use Partial Functions!
Show me how you handle errors and I’ll tell you what programmer you are. Error handling is fundamental to all programming. Language support for error handling varies from none whatsoever (C) to special language extensions (exceptions in C++, Java, etc.). Haskell is unique in its approach because it’s expressive enough to let you build your own error handling frameworks. Haskell doesn’t need built-in exception support: it implements it in libraries.
We’ve seen one way of dealing with errors: calling the error
function that terminates the program. This works fine for runtime assertions, which alert us to bugs in the program. But many «errors» are actually expected. We’ve seen one such example: Data.Map.lookup
fails when called with a key that’s not present in the map. The possibility of failure is encoded in the Maybe
return type of lookup
. It’s interesting to compare this with similar functions in other languages. In C++, for instance, std::map
defines multiple accessor functions varying only in their failure behavior:
at
throws an exceptionfind
returns an empty iteratoroperator[]
inserts a dummy value using a default constructor for it.
The last one is the most bizarre of the three. Since the array access operator must return a reference, even if the key is not found, it has to create a dummy value. The behavior of at
is potentially dangerous if the client forgets to catch the exception. Of the three, find
is the safest, since the return type suggests to the client iteration rather than straight dereference; and iteration normally starts with checking for termination.
In functional programming, failure is another way of saying that the computation is partial; that is, not defined for all values of arguments. In Haskell we always try to use total functions — functions defined for all values of their arguments. If the domain of a computation is known at compile time, we can often define a restricted data type to be used for its arguments; for instance, an enumeration instead of an integer. This is not always possible or feasible, so the other option is to turn a partial function into a total function by changing its return type. C++ method find
does this trick by returning an iterator (it always returns an iterator for any value of its argument); Haskell lookup
does it by returning a Maybe
.
This trick of returning a different type in order to turn a non-functional computation into a pure function is used extensively in Haskell and finds its full expression in monads.
«Computation» or «notion of computation» often describes what we want to do in terms that may or may not have immediate pure function implementation: that is values in, values out. Transforming computations into functions is what functional programming is all about.
Either May Be Better than Maybe
The trick with Maybe
is a bit limited. All we know about the failure is that it occurred. In practice, we’d like to know more. So the next step in error handling is to use the Either
data structure defined in the Prelude:
data Either a b = Left a | Right b
Either
is parameterized by two types, not one. A value of the Either
type either contains a value of type a
or of type b
. We can discriminate between the two possibilities by pattern matching on either constructor. Either
is mostly used as a generalization of Maybe
in which Left
not only encodes failure but is accompanied by an error message. Right
encodes success and the accompanying value. So a
is often fixed to be a String
.
Here’s how we can use Either
to encode the failure of a lookup without terminating the whole program:
lookUp :: String -> SymTab -> Either String (Double, SymTab)
lookUp str symTab =
case M.lookup str symTab of
Just v -> Right (v, symTab)
Nothing -> Left ("Undefined variable " ++ str)
Now the burden is on the caller to pattern match the result of any such call and either continue with the successful result, or handle the failure. More often than not, the error is just passed unchanged to the caller, and so on; and somewhere at the top of the call tree, displayed to the user (remember, the very top is always an IO
function).
It’s a know phenomenon that error propagation spreads like a disease throughout the code. You could have seen that in C and sometimes even in C++, when exceptions are not an option. And indeed, you can create the same mess in Haskell. Here’s a naive implementation of the evaluator of SumNode
, which checks for errors that can happen in the evaluation of its children, and propagates them if necessary:
evaluate :: Tree -> SymTab -> Either String (Double, SymTab)
evaluate (SumNode op left right) symTab =
case evaluate left symTab of
Left msg -> Left msg
Right (lft, symTab') ->
case evaluate right symTab' of
Left msg -> Left msg
Right (rgt, symTab'') ->
case op of
Plus -> Right (lft + rgt, symTab'')
Minus -> Right (lft - rgt, symTab'')
Notice the creeping indentation. I’m showing you this code so that you know what awaits you if you refuse to learn about monads. However ugly this code is, it works, so the principle is right. We just need some monadic sugar to make it palatable.
Errors are propagated up, until they are finally dealt with in the main IO loop:
loop symTab = do
str <- getLine
if null str
then
return ()
else
let toks = tokenize str
tree = parse toks
in
case evaluate tree symTab of
Left msg -> do
putStrLn $ "Error: " ++ msg
loop symTab -- use old symTab
Right (v, symTab') -> do
print v
loop symTab'
The nice thing is that, since we keep around the old copy of the symbol table, we can start the next iteration after a failure as if nothing happened. It’s as if one transaction had been aborted and another started from the the same state.
Abstracting the Either Pattern
Looking at the code above, you can see a pattern arising. First of all, we have all these functions that return their results wrapped in the Either
type. Every time we get an Either
value, we pattern match it and fork the computation: When the result is Right
we make the value in it available to the rest of the computation. If the result is Left
, we skip the rest of the computation and propagate the error. We would like to capture this pattern. We’d like to isolate the boilerplate code, leaving «holes» for the client-provided variables. One such hole is to be filled by the initial Either
value. The tricky part is the «skip the rest of the computation» part of the pattern. This can be done if the pattern has another hole for «the rest of the computation» so that it can either execute it or not.
Let’s first identify this pattern in the evaluation of a unary node:
The first pattern filler is the value returned by evaluate tree symTab
. The «rest of the calculation» filler is the code:
case op of
Plus -> Right ( x, symTab')
Minus -> Right (-x, symTab')
We can abstract this code into a lambda function. Notice that this code uses x
and symTab'
, which were extracted from the first filler. So when we abstract it, the resulting lambda function should take this tuple as an argument:
(x, symTab') -> case op of
Plus -> Right ( x, symTab')
Minus -> Right (-x, symTab')
The parameter op
is captured by the lambda from the outer environment — remember, lambdas are closures: they can capture values from the environment in which they are defined.
Let’s call this new pattern bindE
and implement it as a function of two arguments. Here’s how we would call it from the evaluator:
evaluate (UnaryNode op tree) symTab =
bindE (evaluate tree symTab)
((x, symTab') ->
case op of
Plus -> Right ( x, symTab')
Minus -> Right (-x, symTab'))
This is just a straight call to bindE
with two arguments — the second one being a multi-line lambda function.
The implementation of bindE
is pretty straightforward. It just picks the common parts of the pattern and combines them in one function. It deals with the tedium of pattern matching the first argument and propagating the error. In case of success, it calls the continuation with the right arguments:
bindE :: Either String (Double, SymTab)
-> ((Double, SymTab) -> Either String (Double, SymTab))
-> Either String (Double, SymTab)
bindE ev k =
case ev of
Left msg -> Left msg
Right (x, symTab') -> k (x, symTab')
Have a good look at the signature of bindE
. Again, the second argument to bindE
is the function that encapsulates the rest of the computation. It takes a pair (Double, SymTab)
and returns another pair encapsulated in Either
. This function argument is often called a continuation, because it continues the computation. The corresponding argument is often named k
(think kontinuation; c
would be too generic a name).
This pattern can be further abstracted by parameterizing it on the type of the contents of Right
. In fact we need this if we want to include addSymbol
in our scheme (see the unit in the return type):
addSymbol :: String -> Double -> SymTab -> Either String ((), SymTab)
If we want to bind the result of, say, UnaryNode
evaluator to addSymbol
, the input will be of type Either String (Double, SymTab)
and the result of type Either String ((), SymTab)
, so bindE
really needs two type parameters, a
and b
:
bindE :: Either String a
-> (a -> Either String b)
-> Either String b
bindE ev k =
case ev of
Left msg -> Left msg
Right v -> k v
Notice that as long as the client only uses bindE
to deal with Either
values, they don’t have to deal with them directly (e.g., pattern match), except when they have to create them; and in the final unpacking, when they want to display the error message. The creation of Either
values can also be abstracted by providing two more functions, return
and fail
:
return :: a -> Either String a
return x = Right x
fail :: String -> Either a
fail msg = Left msg
We can use these functions in the implementation of lookUp
lookUp :: String -> SymTab -> Either String (Double, SymTab)
lookUpb str symTab =
case M.lookup str symTab of
Just v -> return (v, symTab)
Nothing -> fail ("Undefined variable " ++ str)
We can also replace all Right
constructors that are used to return values by calls to return
. This way the client code can be written without any knowledge of the fact that values are encapsulated in the Either
type.
Here’s, for instance, an improved version of the UnaryNode
evaluator, this time with no mention of Either
or any of its constructors:
evaluate (UnaryNode op tree) symTab =
bindE (evaluate tree symTab)
((x, symTab') ->
case op of
Plus -> return ( x, symTab')
Minus -> return (-x, symTab'))
Maybe it doesn’t seem like these transformations buy us much in code size or readability, but they are a step in the direction of hiding the details of error handling.
But we can do even better, once we realize that we have just defined a monad.
The Either Monad
A monad in Haskell is defined by a type constructor (a type parameterized by another type) and two functions, bind and return (optionally, fail). In our case, the type constructor is based on the Either a b
type, with the first type variable fixed to String
(yes, it’s exactly like currying a type function). Let’s formalize it by defining a new (paremeterized) type:
newtype Evaluator a = Ev (Either String a)
The newtype
declaration is a compromise between type
alias and a full-blown data
declaration. It can be used to define data types that have only one data constructor that takes only one argument — Ev
in this case. (newtype
is preferred over similar data
declaration because of slightly better performance characteristics.)
The first change to our code will be to replace the type signature of evaluate
from:
evaluate :: Tree -> SymTab -> Either String (Double, SymTab)
to:
evaluate :: Tree -> SymTab -> Evaluator (Double, SymTab)
We can now redefine bindE
, return
, and fail
in terms of Evaluator
. Then we have to tell Haskell that we are defining a monad. Why does Haskell need to know about our monad? Because with this knowledge it will allow us to use the do
notation — that’s the sugar we’ve been craving.
Monad
is a typeclass — I’ll talk more about typeclasses soon. For now, it’s enough to know that, in order to tell Haskell that we are defining a monad, we need to instantiate Monad
with our Evaluator
. When instantiating a Monad
we have to provide the appropriate definitions of bind, return and, optionally, fail. The only tricky one is bind, since Monad
defines it as an infix operator, >>=
(pronounced bind). Operators are just functions with funny names composed of special characters. An infix operator is a function that takes two arguments, one on the left and one on the right. Without further ado, here’s the instance declaration for our first monad:
instance Monad Evaluator where
(Ev ev) >>= k =
case ev of
Left msg -> Ev (Left msg)
Right v -> k v
return v = Ev (Right v)
fail msg = Ev (Left msg)
These are exactly the definitions we’ve seen before, except for the additional layer of the Ev
constructor. Notice that I used infix notation in defining operator >>=
. Its left argument is the pattern (Ev ev)
, and its right argument is the continuation k
.
Let’s see how this new monad works in the evaluator of SumNode
— first without the do
notation:
evaluate (SumNode op left right) symTab =
evaluate left symTab >>= (lft, symTab') ->
evaluate right symTab' >>= (rgt, symTab'') ->
case op of
Plus -> return (lft + rgt, symTab'')
Minus -> return (lft - rgt, symTab'')
Notice that the second argument to the first >>=
is a lambda that continues up to the end of the function. Inside its body there is another >>=
with its own lambda. Notice also that the innermost lambda has access not only to rgt
— which is its argument — but also to the external lft
and op
, which it captures from its environment.
Here’s the same code in do
notation:
evaluate (SumNode op left right) symTab = do
(lft, symTab') <- evaluate left symTab
(rgt, symTab'') <- evaluate right symTab'
case op of
Plus -> return (lft + rgt, symTab'')
Minus -> return (lft - rgt, symTab'')
As you can see, the do
block hides the binding >>=
between the lines, it automatically converts «the rest of the code» to continuations, and it lets you treat the arguments to lambdas as if they were local variables to be assigned to. It’s this syntactic sugar that makes do
blocks look so convincingly imperative, even though they are purely functional in nature.
Compare this with our starting point:
evaluate (SumNode op left right) symTab =
case evaluate left symTab of
Left msg -> Left msg
Right (lft, symTab') ->
case evaluate right symTab' of
Left msg -> Left msg
Right (rgt, symTab'') ->
case op of
Plus -> Right (lft + rgt, symTab'')
Minus -> Right (lft - rgt, symTab'')
That’s definitely progress. If we could only get rid of this noise of threading symTab
all over the place. Oh, wait, that’s the next tutorial and another monad.
Let’s bask in the monadic sunshine some more. Did you notice that we have essentially implemented exceptions? fail
behaves very much like throw
. It shortcuts the execution of the current function, propagates the error to its caller, who will in turn shortcut its execution and so on, until the «exception is caught.» Catching the exception means unpacking the Evaluator
returned by a function, and either retrieving the result or doing something with the error.
Can you forget to «catch» the exception? Not really — it’s part of the return type. You can’t even access the value of your computation without unpacking it. And if your unpacking matches all possible Either
patterns, as it should, you’ll be forced to deal with the error case anyway.
Do you remember exception specifications in Java or C++ (currently obsoleted)? In Haskell there’s no need for this ad hoc feature because «exception specification» is encoded in the return type of a function. You can’t ignore types in Haskell, so you essentially leave it to the compiler to enforce exception safety.
There is a full-blown exception library Control.Exception
in Haskell, complete with throw
, catch
, and a long list of predefined exception types. The Prelude also defines the Monad
instance for Either
, so we could have used it directly. But I thought «inventing» the monad from scratch would give you a better learning experience.
Type Classes
I said that Monad
is a typeclass, but I haven’t explained what a typeclass is. It’s not exactly like a class in OO languages, but there are some similarities.
If you’re familiar with Java or C++, you know that a class may define a set of virtual functions. When you have a polymorphic reference to one of the descendants of such a class, and call one of these virtual functions on it, the function that is called depends on the actual type of the object, which is determined at runtime. This kind of late binding is possible because a polymorphic object carries with it a vtable: an array of function pointers.
Similarly, in Haskell, a type class let’s you define the signatures of a set of functions. You may think of a typeclass as an interface. (Although it is possible for a typeclass to provide default implementations for some of the functions — the monadic fail
, for instance, has such implementation: it calls error
by default.) There’s even an analog of a vtable that is (invisibly) passed to polymorphic functions.
The actual implementation of typeclass functions is provided by the client every time they declare some type to be an instance
of a typeclass. In Java or C++ this would happen when the client defines concrete classes that implement an interface. In Haskell you define instances instead. The interesting thing is that, in Haskell, the client is able to connect any typeclass with any type (including built-in types) «after the fact.» (In most OO languages you can’t make an int
a subclass of IFoo
— in Haskell you can.)
Let’s work through an example:
class Valuable a where
evaluate :: a -> Double
data Expr = Const Double | Add Expr Expr
instance Valuable Expr where
evaluate (Const x) = x
evaluate (Add lft rgt) = evaluate lft + evaluate rgt
instance Valuable Bool where
evaluate True = 1
evaluate False = 0
test :: Valuable a => a -> IO ()
test v = print $ evaluate v
expr :: Expr
expr = Add (Const 2) (Add (Const 1.5) (Const 2.5))
main = do
test expr
test True
I have defined a typeclass Valuable
with one function evaluate
, which takes an instance of Valuable
, a
, and returns a Double
. Separately I have defined a data type Expr
, which has no knowledge of the class Valuable
. Then I realized that Expr
is Valuable
, so I wrote an instance
declaration that makes this connection and provides the «witness» — the actual implementation of the function evaluate
.
To make things even more exciting, I decided that the built-in type Bool
is also Valuable
and provided and instance
declaration to prove it. Now I was able to write a polymorphic function test
that calls evaluate
on its argument. But unlike in previous examples of polymorphism, the generic argument to test
is constrained: it has to be an instance of the class Valuable
. The type expression before the double arrow =>
defines class constraints for what follows. In this case we have one constraint, Valuable a
, meaning a
must be an instance of Valuable
. In general there may be several such constraints listed between a set of parentheses and separated by commas.
Hadn’t I provided the type signature for test
, the compiler would have figured it out by analyzing the body of test
and seeing that I called evaluate
on its argument.
This is an example of a more general mechanism: If you define a function that adds its arguments, the compiler will deduce the Num
constraint for them. If you compare arguments for (in-) equality, it will deduce Eq
constraints. If you print
them, it will deduce Show
, etc.
The C++ Standards Committee almost voted in the concepts proposal in 2011, which would have added constrained polymorphism to C++ templates. But then they gave up because of the tremendous complexity imposed by the requirement of backwards compatibility.
Solution to the Expression Problem
In the previous tutorial I described the expression problem: How can you create a library that would be open to adding new data and new functions. We’ve seen that, in Haskell, extending a library by adding new functions was easy, but the addition of new varieties of data required modifications to the library. Clever use of typeclasses will let us have the cake and eat it too.
Let’s work the magic on the previous example. First, instead of having one type Expr
with many constructors, let’s replace it with one typeclass Expr
and many individual data types (by many I mean two in this case). We then glue these data types together by making them instances of Expr
. Expr
itself is empty — its only purpose is to unify all expression types.
class Expr a
data Const = Const Double
data Add a b = Add a b
instance Expr Const
instance (Expr a, Expr b) => Expr (Add a b)
Notice that I used the names Const
and Add
both to name data types and their constructors. These two namespaces are separate, so if you’re running out of names, it’s a common practice to reuse them this way. Also, notice the use of two class constraints in the instance definition for Add
. Both a
and b
must be Expr
for Add a b
to be Expr
.
Now we would like to define the function evaluate
for both Const
and Add
nodes. But these are no longer just two constructors — they are two different data types. The only way to define evaluate
for both is to overload it. This is possible, but only if evaluate
is part of a typeclass. The name Valuable
for such a class seems natural. Let’s make it work for Const
and Add
:
class (Expr e) => Valuable e where
evaluate :: e -> Double
instance Valuable Const where
evaluate (Const x) = x
instance (Valuable a, Valuable b) => Valuable (Add a b) where
evaluate (Add lft rgt) = evaluate lft + evaluate rgt
This time we made sure that only expressions can be evaluated: That’s the constraint Expr e
in the definition of Valuable
.
So that’s our library. Let’s now see how a client may extend it by, for instance, adding a new expression type: Mul
.
data Mul a b = Mul a b
instance (Expr a, Expr b) => Expr (Mul a b)
instance (Valuable a, Valuable b) => Valuable (Mul a b) where
evaluate (Mul lft rgt) = evaluate lft * evaluate rgt
Plugging Mul
into the library required only making it the instance of Expr
and Valuable
. Perfect! Our library is now open to adding new data.
Let’s check if it’s still open to new functions. Let’s add a pretty printing function to expressions, including the newly defined Mul
. The trick is to make pretty
a member of a new class Pretty
and then make all the expression types its instances:
class (Expr e) => Pretty e where
pretty :: e -> String
instance Pretty Const where
pretty (Const x) = show x
instance (Pretty a, Pretty b) => Pretty (Add a b) where
pretty (Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
instance (Pretty a, Pretty b) => Pretty (Mul a b) where
pretty (Mul x y) = pretty x ++ " * " ++ pretty y
Granted, there is a bit of syntactic noise here, but we have accomplished quite a feat: We have overcome the expression problem!
The Monad Typeclass
Let’s go back to the Monad
typeclass. This is how it’s defined in the Prelude:
class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
(>>) :: m a -> m b -> m b
return :: a -> m a
fail :: String -> m a
mv >> k = mv >>= _ -> k
fail s = error s
There’s something different about this typeclass: it’s not a typeclass that unifies types; rather it unifies type constructors. The parameter m
is a type constructor. It requires another type parameter to become a type. See how m
always acts on either a
or b
, which are type variables. Going back to the Evaluator
in our calculator, it was a type constructor too, with the type parameter a
:
newtype Evaluator a = Ev (Either String a)
That’s why we were able to make Evaluator
an instance of Monad
:
instance Monad Evaluator where
Specifically, notice that we used Evaluator
, not Evaluator a
in this definition. You always supply a type constructor to the instance definition of Monad
.
Monad
also defines operator >>
, which ignores the value from its first argument (see its default implementation). You can bind two monadic functions using >>
if you’re only calling the first one for its «side effects.» This will make more sense in the next tutorial, when we talk about the state monad. Operator >>
is often used with the IO
monad to sequence output functions, as in:
main :: IO ()
main = putStrLn "Hello " >> putStrLn "World"
which is the same as:
main :: IO ()
main = do
putStrLn "Hello "
putStrLn "World"
Exercises
Ex 1. Define the WhyNot
monad:
data WhyNot a = Nah | Sure a
deriving Show
instance Monad WhyNot where
...
safeRoot :: Double -> WhyNot Double
safeRoot x =
if x >= 0 then
return (sqrt x)
else
fail "Boo!"
test :: Double -> WhyNot Double
test x = do
y <- safeRoot x
z <- safeRoot (y - 4)
w <- safeRoot z
return w
main = do
print $ test 9
print $ test 400
Ex 2. Define a monad instance for Trace
(no need to override fail
). The idea is to create a trace of execution by sprinkling you code with calls to put
. The result of executing this code should look something like this:
["fact 3","fact 2","fact 1","fact 0"]
6
Hint: List concatenation is done using ++
(we’ve seen it used for string concatenation, because String
is just a list of Char
).
newtype Trace a = Trace ([String], a)
instance Monad Trace where
...
put :: Show a => String -> a -> Trace ()
put msg v = Trace ([msg ++ " " ++ show v], ())
fact :: Integer -> Trace Integer
fact n = do
put "fact" n
if n == 0
then return 1
else do
m <- fact (n - 1)
return (n * m)
main = let Trace (lst, m) = fact 3
in do
print lst
print m
Ex 3. Instead of deriving Show
, define explicit instances of the Show
typeclass for Operator
and Tree
such that expr
is displayed as:
x = (13.0 - 1.0) / y
It’s enough that you provide the implementation of the show
function in the instance declaration. This function should take an Operator
(or a Tree
) and return a string.
data Operator = Plus | Minus | Times | Div
data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
instance Show Operator where
show Plus = " + "
...
instance Show Tree where
show = undefined
expr = AssignNode "x" (ProdNode Div (SumNode Minus (NumNode 13) (NumNode 1)) (VarNode "y"))
main = print expr
Ex 4 This is an example that mimics elements of OO programming. Chess pieces are implemented as separate data types: here, for simplicity, just one, Pawn
. The constructor of Pawn
takes the Color
of the piece and its position on the board (0-7 in both dimensions). Pieces are instances of the class Piece
, which declares the following functions: color
, pos
, and moves
. The moves
function takes a piece and returns a list of possible future positions after one move (without regard to other pieces, but respecting the boundaries of the board). Define both the typeclass and the instance, so that the following program works:
data Color = White | Black
deriving (Show, Eq)
data Pawn = Pawn Color (Int, Int)
class Piece a where
...
instance Piece Pawn where
...
pieces = [Pawn White (3, 1), Pawn Black (4, 1), Pawn White (0, 7), Pawn Black (5, 0)]
main = print $ map moves pieces
The Symbolic Calculator So Far
Below is the source code for the current state of the project. Notice how lookUp
and addSymbol
nicely fit into the monadic scheme. I haven’t made any changes to the lexer and parser files.
{-# START_FILE Main.hs #-}
module Main where
import qualified Data.Map as M
import Lexer (tokenize)
import Parser (parse)
import Evaluator
main = do
loop (M.fromList [("pi", pi), ("e", exp 1.0)])
loop symTab = do
str <- getLine
if null str
then
return ()
else
let toks = tokenize str
tree = parse toks
Ev ev = evaluate tree symTab
in
case ev of
Left msg -> do
putStrLn $ "Error: " ++ msg
loop symTab -- use old symTab
Right (v, symTab') -> do
print v
loop symTab'
{-# START_FILE Lexer.hs #-}
-- show
module Lexer (Operator(..), Token(..), tokenize, lookAhead, accept) where
-- /show
import Data.Char
data Operator = Plus | Minus | Times | Div
deriving (Show, Eq)
data Token = TokOp Operator
| TokAssign
| TokLParen
| TokRParen
| TokIdent String
| TokNum Double
| TokEnd
deriving (Show, Eq)
lookAhead :: [Token] -> Token
lookAhead [] = TokEnd
lookAhead (t:ts) = t
accept :: [Token] -> [Token]
accept [] = error "Nothing to accept"
accept (t:ts) = ts
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
| elem c "+-*/" = TokOp (operator c) : tokenize cs
| c == '=' = TokAssign : tokenize cs
| c == '(' = TokLParen : tokenize cs
| c == ')' = TokRParen : tokenize cs
| isDigit c = number c cs
| isAlpha c = identifier c cs
| isSpace c = tokenize cs
| otherwise = error $ "Cannot tokenize " ++ [c]
identifier :: Char -> String -> [Token]
identifier c cs = let (name, cs') = span isAlphaNum cs in
TokIdent (c:name) : tokenize cs'
number :: Char -> String -> [Token]
number c cs =
let (digs, cs') = span isDigit cs in
TokNum (read (c : digs)) : tokenize cs'
operator :: Char -> Operator
operator c | c == '+' = Plus
| c == '-' = Minus
| c == '*' = Times
| c == '/' = Div
{-# START_FILE Parser.hs #-}
-- show
module Parser (Tree(..), parse) where
-- /show
import Lexer
data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
deriving Show
parse :: [Token] -> Tree
parse toks = let (tree, toks') = expression toks
in
if null toks'
then tree
else error $ "Leftover tokens: " ++ show toks'
expression :: [Token] -> (Tree, [Token])
expression toks =
let (termTree, toks') = term toks
in
case lookAhead toks' of
(TokOp op) | elem op [Plus, Minus] ->
let (exTree, toks'') = expression (accept toks')
in (SumNode op termTree exTree, toks'')
TokAssign ->
case termTree of
VarNode str ->
let (exTree, toks'') = expression (accept toks')
in (AssignNode str exTree, toks'')
_ -> error "Only variables can be assigned to"
_ -> (termTree, toks')
term :: [Token] -> (Tree, [Token])
term toks =
let (facTree, toks') = factor toks
in
case lookAhead toks' of
(TokOp op) | elem op [Times, Div] ->
let (termTree, toks'') = term (accept toks')
in (ProdNode op facTree termTree, toks'')
_ -> (facTree, toks')
factor :: [Token] -> (Tree, [Token])
factor toks =
case lookAhead toks of
(TokNum x) -> (NumNode x, accept toks)
(TokIdent str) -> (VarNode str, accept toks)
(TokOp op) | elem op [Plus, Minus] ->
let (facTree, toks') = factor (accept toks)
in (UnaryNode op facTree, toks')
TokLParen ->
let (expTree, toks') = expression (accept toks)
in
if lookAhead toks' /= TokRParen
then error "Missing right parenthesis"
else (expTree, accept toks')
_ -> error $ "Parse error on token: " ++ show toks
{-# START_FILE Evaluator.hs #-}
module Evaluator (evaluate, Evaluator(..)) where
import Lexer
import Parser
import qualified Data.Map as M
newtype Evaluator a = Ev (Either String a)
instance Monad Evaluator where
(Ev ev) >>= k =
case ev of
Left msg -> Ev (Left msg)
Right v -> k v
return v = Ev (Right v)
fail msg = Ev (Left msg)
type SymTab = M.Map String Double
evaluate :: Tree -> SymTab -> Evaluator (Double, SymTab)
evaluate (SumNode op left right) symTab = do
(lft, symTab') <- evaluate left symTab
(rgt, symTab'') <- evaluate right symTab'
case op of
Plus -> return (lft + rgt, symTab'')
Minus -> return (lft - rgt, symTab'')
evaluate (ProdNode op left right) symTab = do
(lft, symTab') <- evaluate left symTab
(rgt, symTab'') <- evaluate right symTab'
case op of
Times -> return (lft * rgt, symTab)
Div -> return (lft / rgt, symTab)
evaluate (UnaryNode op tree) symTab = do
(x, symTab') <- evaluate tree symTab
case op of
Plus -> return ( x, symTab')
Minus -> return (-x, symTab')
evaluate (NumNode x) symTab = return (x, symTab)
evaluate (VarNode str) symTab = lookUp str symTab
evaluate (AssignNode str tree) symTab = do
(v, symTab') <- evaluate tree symTab
(_, symTab'') <- addSymbol str v symTab'
return (v, symTab'')
lookUp :: String -> SymTab -> Evaluator (Double, SymTab)
lookUp str symTab =
case M.lookup str symTab of
Just v -> return (v, symTab)
Nothing -> fail ("Undefined variable " ++ str)
addSymbol :: String -> Double -> SymTab -> Evaluator ((), SymTab)
addSymbol str val symTab =
let symTab' = M.insert str val symTab
in return ((), symTab')