Preferences

I like using GADTs in Haskell. However, I find the explanation in the article nearly impossible to understand. This annoys me, so I will attempt to give my own explanation of what GADTs are. (Because everyone else explains it incorrectly and my explanation is of course the best, right? /s)

So, a bit of Haskell first, starting with algebraic data types. (Skip this if you already know about ADTs.) If you want to define a data type in most languages, there are basically two things you can do. Either you can define a record, with multiple values wrapped up as a single type:

    data MyRecord = ConstructorName Int String Bool
Or, you can define an enumeration, where a type has a limited set of allowed values:

    data TrafficLight = Red | Amber | Green
Haskell calls the former ‘product types’ and the latter ‘sum types’.

The thing is, there’s nothing stopping you from combining the two: making an enumeration where each value of the enumeration is a record. This is called an algebraic data type (because it combines sums and products, geddit?) For instance, this lets me make an AST for a simple language:

    data AST
        = Integer Int
        | Boolean Bool
        | Addition AST AST
        | Multiplication AST AST
        | Equals AST AST
        | Not AST
So now I can make values like `Not (Equals (Integer 1) (Integer 2)) :: AST`. (In Haskell, read ‘::’ as ‘has-type’.)

Often, you want to make data types polymorphic over the record field type. For instance, I might have a type expressing either an integer or an error:

    data IntErr = Error String | IntResult Int
And I might have expressing either a string or an error:

    data StrErr = Error String | StrResult Int
And eventually I might get tired of writing out all these types. In that case I can add a type variable representing any type:

    data WithError a = Error String | Result a
So now I can have `Result 3 :: WithError Int`, `Result True :: WithError Bool`, and so on. (This is basically the same as generics or templates in languages which have them.)

The thing is, no-one says a type parameter needs to correspond to anything concrete. There’s nothing in Haskell which stops me from doing this:

    data Weird a b = IgnoreTheType Int
This is a bit weird. If you have a value of type `Weird String Bool`, say, that value won’t actually have a string or a boolean within it. And `IgnoreTheType 1 :: Weird String Bool`, `IgnoreTheType 2 :: Weird Char Float`, `IgnoreTheType 3 :: Weird Bool Double` are all values of different types. You can basically change the type at will without affecting the value. Variables such as `a` and `b` are called ‘phantom type variables’.

Now let’s back up a bit and have another look at that `AST` data type I defined earlier. It’s not particularly satisfying: you can easily create meaningless terms like `Addition (Boolean True) (Not (Integer 3))`. As Haskellers, we want to avoid this as much as possible. So let’s add a phantom type parameter to express the type returned by each `AST` constructor:

    data AST r
        = Integer Int
        | Boolean Bool
        | Addition (AST Int) (AST Int)
        | Multiplication (AST Int) (AST Int)
        | Equals (AST r) (AST r)
        | Not (AST Bool)
This says that multiplication, for instance, must take two `AST`s returning an `Int`, as encoded in the type parameter. But wait — how do we restrict the type parameter? We need to be able to say that something constructed with `Multiplication` must have type `AST Int`, but something constructed with `Not` must have type `AST Bool`. As it happens, vanilla Haskell has no way of doing this. There is no way of explicitly controlling the value of `r` based on which constructor is used.

You might have guessed by now that this is exactly what generalised algebraic data types let us do. And indeed, `AST` can be easily implemented as a GADT:

    data AST r where
        Integer        :: Int -> AST Int
        Boolean        :: Bool -> AST Bool
        Addition       :: AST Int -> AST Int -> AST Int
        Multiplication :: AST Int -> AST Int
        Equals         :: AST r -> AST r -> AST Bool
        Not            :: AST Bool -> AST Bool
And now, if we try to construct an incorrect AST, the compiler will give us back an error, because the types don’t match:

    > Addition (Boolean True) (Not (Integer 3))
    
    <interactive>:11:11: error:
        * Couldn't match type `Bool' with `Int'
          Expected type: AST Int
            Actual type: AST Bool
        * In the first argument of `Addition', namely `(Boolean True)'
          In the expression: Addition (Boolean True) (Not (Integer 3))
          In an equation for `it':
              it = Addition (Boolean True) (Not (Integer 3))
    
    <interactive>:11:26: error:
        * Couldn't match type `Bool' with `Int'
          Expected type: AST Int
            Actual type: AST Bool
        * In the second argument of `Addition', namely `(Not (Integer 3))'
          In the expression: Addition (Boolean True) (Not (Integer 3))
          In an equation for `it':
              it = Addition (Boolean True) (Not (Integer 3))
    
    <interactive>:11:31: error:
        * Couldn't match type `Int' with `Bool'
          Expected type: AST Bool
            Actual type: AST Int
        * In the first argument of `Not', namely `(Integer 3)'
          In the second argument of `Addition', namely `(Not (Integer 3))'
          In the expression: Addition (Boolean True) (Not (Integer 3))

One thing that I'd like to add is that using the typed GADT encoding of the AST lets you write a total eval function such as eval :: AST r → r, no need to return a Maybe or Either! In the case without GADTs you couldn't return a boolean in one case and an Int in another. Here you can.
You know, I did very nearly mention this, but decided not to in the end. You are of course entirely right, and this is indeed a major reason why I use GADTs myself.
Thanks for the amazing explanation.

I tried to implement this in C++ where, as usual, everything is possible but nothing is practical:

https://gcc.godbolt.org/z/9WGxcPWGd

Helpful explanation. I think a lot of the trouble for me in understanding GADTs is the Haskell syntax used in a lot of examples.

My static typing experience is limited to TypeScript and Rust, so I'm curious what a GADT would look like if they were added to those languages.

Here's all the code from the parent in Rusty pseudosyntax:

    enum MyRecord {
      ConstructorName(Int, String, Bool)
    }

    enum TrafficLight {
      Red,
      Yellow,
      Green
    }

    enum AST {
      Integer(Int),
      Boolean(Bool),
      Addition(Ast, Ast),
      Multiplication(Ast, Ast),
      Equals(Ast, Ast),
      Not(Ast)
    }
`Not(Equals(Integer(1), Integer(2))) : AST`

    enum IntErr {
      Error(String),
      IntResult(Int)
    }

    enum StringErr {
      Error(String),
      StringResult(String)
    }

    enum WithError<A> {
      Error(String),
      Result(A)
    }
`Result(3) : WithError<Int>` `Result(True) : WithError<Bool>`

    enum Weird<A, B> {
      IgnoreTheType(Int),
    }
`Weird<String, Bool>` `IgnoreTheType(1) : Weird<String, Bool>` `IgnoreTheType(2) : Weird<Char, Float>` `IgnoreTheType(3) : Weird<Bool, Double>`

`Addition(Boolean(True), Not(Integer(3)))`

    enum AST<R> {
      Integer(Int),
      Boolean(Bool),
      Addition(Ast<Int>, Ast<Int>),
      Multiplication(Ast<Int>, Ast<Int>),
      Equals(Ast<R>, Ast<R>),
      Not(Ast<Bool>)
    }
Finally, we get our GADT:

    enum AST<R> {
      Integer(Int) : Ast<Int>,
      Boolean(Bool) : Ast<Bool>,
      Addition(Ast<Int>, Ast<Int>) : Ast<Int>,
      Multiplication(Ast<Int>, Ast<Int>) : Ast<Int>,
      Equals(Ast<R>, Ast<R>) : Ast<Bool>,
      Not(Ast<Bool>) : Ast<Bool>
    }
Thanks lalaithion! I don’t know Rust, so it’s interesting to see how different it is compared to Haskell.

A question: how much of this would actually compile in Rust?

- Assuming `Int` and `Bool` are user-defined types (Rust has `bool` and many types of ints like `usize`, `i64`, etc. instead). If you want to work over ANY primitive integer you can use the `num_traits` crate and make your type generic over `num_traits::PrimInt`, or even `num_integer::Integer` to include support for crates such as `num_bigint`.

- Rust needs indirection for recursive data types (like the AST) using `Box`, references, or another kind of indirection. Otherwise, the size isn't known at compile time (since it's potentially infinite).

- Rust doesn't have GADTs (yet?) so the last AST is purely theoretical.

- `IgnoreTheType` (and `AST<R>`) would require explicit use of `PhantomData` for variants without `R` https://doc.rust-lang.org/std/marker/struct.PhantomData.html

On Rust's playground: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Thank you! So it looks like most of my post does indeed work, except for the GADT itself.

(Also, I had not known about Rust Playground, so thanks for introducing me to that too.)

I really like your explanation, which I find easier to understand. I do appreciate your "real-life" AST example, making clear that your are making a better design through the typing system facilities.
Thank you, your explanation was _indeed_ better :-)

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal