Tuesday, April 3, 2012

Some type systems are much better than others.

As a shortcut, I previously complained about C and Java's type systems leaving bad impressions. Let's broaden that to the realm of type system research, avoiding useless discussion of the relative merits of this or that programming language as a whole. Some type systems let you express more of the set of valid programs, or different subsets. More out of the theoretical realm, but because we are human, different inference engines are better at figuring out types in different situations.

There is a nice graphic in Gödel, Escher, Bach that illustrates this issue: it displays the universe split into the true and the false, and elaborate structures covered with appendages illustrating “proof”. Of all valid programs¹, many are “typeable”, meaning that we can prove that they are sound, type-wise, with a particular type system; conversely, most invalid programs are not typeable.

However, you might imagine that the fingers stretch into the invalid space on occasion, and they don't cover the valid space entirely either; for any given type system, there are valid programs that are rejected, and there are invalid programs that are accepted.

Not so simple math

The goal of type system research is to both accept more valid programs and reject more invalid programs. Let's consider these three programs in three different languages, which implement one step of a divide-and-conquer list reduction strategy.
     def pm(x, y, z):                # Python
         return x + y - z
     
     pm x y z = x + y - z            -- Haskell
     -- inferred as Num a ⇒ a → a → a → a
     
     int pm(int x, int y, int z) {   /* C */
       return x + y - z;
     }
First, the expressive problem in C is easy to spot: you can only add and subtract ints. You have to reimplement this for each number type, even if the code stays the same.

The Haskell expressive problem is a little more obscure, but more obvious given the inferred type: all the numbers must be of the same type. You can see this with some partial application:
     pm (1::Int)      -- Int → Int → Int
     pm (1::Integer)  -- Integer → Integer → Integer
     pm (1::Float)    -- Float → Float → Float
The Python version works on a numeric tower: once you introduce a float, it sticks. This may be good for you, or it may be bad. If you are reducing some data with pm and a float sneaks in there, you won't see it until you get the final output. So with dynamic promotion, pm works with everything, even some things you probably don't want.

There are adjustments you can make to get the Haskell version to be more general, but this all depends on the kind of generalization you mean. This is a matter of continued innovation, even in the standard library; many commonly used libraries provide alternative, more generic versions of built-in Prelude functions, such as fmap for functors, one of many, many generalizations that work with lists, in this case replacing map.


Footnotes

[1] Use whatever meaning you like for “valid”, but if you want something more formal, perhaps “it terminates” will suffice.

No comments: