On the Matter of Function Signatures

Programmers tend to click on function names and read the implementation to understand what a function actually does. Software engineers often consider this one of the main tools of their IDE. Lately I have come to think of that behaviour as a smell. I would like to write my functions so that everything can be told from just looking at the (type) signature.

tl;dr: Total, pure and well-typed functions lead to function signatures that tell the user everything she needs to know.

Bold statement. Let me elaborate on that.

On Functions

Mathematically, a function is a rule that maps elements of one set onto elements of another set. We know that from basic math: the function inc(x) = x + 1 maps every number x to its successor. The signature for an implementation of that function would look something like this:

    inc :: Integer -> Integer
    inc x = x + 1

    {- Haskell

    Throughout this post I will use Haskell as my language of choice. If you
    have never seen Haskell code before, the following prerequisites might be
    useful:

    1. Function signatures look like this:

         fun :: Int -> Int -> Int

       where 'fun' is the function name and the types follow after the '::'.
       Types are read from left to right and map to the arguments of the
       function. The last one is the return type of the function. Example:

         add :: Int -> Int -> Int
         add x y = x + y

       means that x is of type Int, y is of type Int and the return value is
       also of type Int.

    2. Int and Integer differ in their bounds. Int is an instance of the
       typeclass Bounded and has a max and a min value (dependent on your
       processor architecture), whereas Integer has no max or min value.

    3. A function name can contain single quotes, e.g. dvd' is a different
       function from dvd. Haskell developers often use this in a very
       mathematical way to show that a function is some kind of derivation
       of the original one.
    -}

We expect an integer output for every integer put in. We do not expect anything else from that function and would be utterly surprised to get back a string or something completely different. We want to trust that implementation. Have another look at simple math:

    add :: Int -> Int -> Int
    sbt :: Int -> Int -> Int -- subtract
    mlt :: Int -> Int -> Int -- multiply
    dvd :: Int -> Int -> Int -- divide

From the signatures we read that every operation returns an integer output for every pair of integer inputs.

We can guess that dvd will actually give us the quotient and drop the remainder. Easy. But there is one other problem with dvd: It simply lies about its mapping rule.

We know that division by 0 is not defined. In math we would respond with "undefined" or "not permitted". But that very response is a string rather than an integer, and we are surprised. Maybe it is not a string that is returned but an error message thrown through an exception that brings the whole program to a halt. We would be stunned by the sheer magnitude of that effect. And we should be! The signature is lying to us!

What happens in the dvd function? Let's look at the internals:

    dvd :: Int -> Int -> Int
    dvd _ 0 = error "Division by 0 is not defined."
    dvd x y = x `div` y

It’s easy to see that division by zero throws an exception. But we had to look up the implementation. And again, I would rather not have to do that.
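To make the surprise tangible, here is a minimal sketch that calls the partial dvd and observes the failure at runtime. The helper name tryDvd is made up for this illustration; try, evaluate and ErrorCall come from Control.Exception in base.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- The partial version from above: the type promises an Int for every input.
dvd :: Int -> Int -> Int
dvd _ 0 = error "Division by 0 is not defined."
dvd x y = x `div` y

-- Hypothetical helper (not part of the original post): force the result
-- inside IO so that a call to 'error' can be caught and inspected.
tryDvd :: Int -> Int -> IO (Either ErrorCall Int)
tryDvd x y = try (evaluate (dvd x y))
```

Loaded into GHCi, tryDvd 7 2 yields Right 3, while tryDvd 7 0 yields a Left carrying the error message. Nothing in Int -> Int -> Int hinted at that second case.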

Total Functions

This implementation of dvd is called a partial function. Mathematically speaking, the function can only be applied to a part of our set (Integer \ {0}) for the second parameter.

A total function is a function that can be applied to the whole set. To get to a total function we have to reduce the set for the second parameter. We can do that easily by defining a new type:

    newtype IntWithoutZero = IntWithoutZero Int
    -- implementation of (to|from)IntWithoutZero left out for simplicity

With that type in place we can redefine our dvd function:

    dvd :: Int -> IntWithoutZero -> Int
    dvd x (IntWithoutZero y) = x `div` y

At first glance we see that the function got a whole lot simpler. We no longer need to look at the implementation to understand the function in its entirety.
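The conversion functions were left out above. One possible way to fill them in is the following sketch. The signatures of toIntWithoutZero and fromIntWithoutZero are my assumption here, and returning Maybe from the constructor function is just one of several options.

```haskell
-- The wrapper from above. In a real module only the type, not the data
-- constructor, would be exported, so the check below cannot be bypassed.
newtype IntWithoutZero = IntWithoutZero Int

-- Assumed conversion into the restricted type: zero is the one rejected value.
toIntWithoutZero :: Int -> Maybe IntWithoutZero
toIntWithoutZero 0 = Nothing
toIntWithoutZero n = Just (IntWithoutZero n)

-- Assumed conversion back to a plain Int.
fromIntWithoutZero :: IntWithoutZero -> Int
fromIntWithoutZero (IntWithoutZero n) = n

-- The divisor can no longer be zero, so no error case is needed.
dvd :: Int -> IntWithoutZero -> Int
dvd x y = x `div` fromIntWithoutZero y
```

With these definitions, fmap (dvd 7) (toIntWithoutZero 2) evaluates to Just 3, and toIntWithoutZero 0 is Nothing. The caller has to deal with the zero case before dvd is ever called.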

Side note: I use an implementation of dvd that mimics div to make my point with a simple example. Note that there is an ongoing discussion about whether the partial function div is valuable for compiler optimisation. Since my point is to gain total knowledge of a function's implementation through its signature, I consider all partial functions of no value for that purpose.

Pure Functions

Let’s go back to our inc function for a moment and have a look at an alternative possible implementation:

    -- Pseudo code
    inc x = do
      putStrLn "incrementing!"
      return (x + 1)

I had to use pseudo code here since Haskell enforces purity, and the side effect would therefore not be possible in a function of type Int -> Int. But bear with me for a second and try to grasp the underlying problem: the function signature is lying again about what is actually happening. We give it an integer value and get an integer value back. That works for all integer values within the bounds of Int and therefore fulfils the requirement of totality.

But it is impure.

A function is considered pure if it has no side effects. It just provides an (input-dependent) value. A pure function can always be written as a table. Consider a pure add function as a table:

    (+) | 0 1 2 3 4 5 …
    ----+--------------
     0  | 0 1 2 3 4 5 …
     1  | 1 2 3 4 5 6 …
     2  | 2 3 4 …
     …  | …

There is no way for that table to do anything other than add two numbers.
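The table idea can be taken literally. The following sketch stores a finite excerpt of the addition table and answers by lookup only. The names addTable and addByTable and the bound of 5 are arbitrary choices for this illustration.

```haskell
-- A finite excerpt of the addition table: row x, column y holds x + y.
-- The bound 5 is an arbitrary choice for this illustration.
addTable :: [[Int]]
addTable = [[x + y | y <- [0 .. 5]] | x <- [0 .. 5]]

-- Hypothetical lookup: answers by reading the table.
-- Returns Nothing outside the excerpt instead of failing.
addByTable :: Int -> Int -> Maybe Int
addByTable x y
  | inRange x && inRange y = Just (addTable !! x !! y)
  | otherwise = Nothing
  where
    inRange n = n >= 0 && n <= 5
```

For example, addByTable 2 3 is Just 5. A lookup has no room for printing, logging or anything else, which is exactly what purity promises.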

With pure functions we have to think less about the implementation because we have to think less about the implications of applying them. What matters is the function signature, and we are able to trust it even more.
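In real Haskell, the effectful variant of inc does compile once the effect is admitted in the type. The sketch below puts both versions side by side. The name incLoudly is made up for this post.

```haskell
-- Pure: the type promises a value and nothing else.
inc :: Int -> Int
inc x = x + 1

-- Effectful variant of the pseudo code. The name incLoudly is made up here.
-- The IO in the result type announces that something besides computing a
-- value may happen.
incLoudly :: Int -> IO Int
incLoudly x = do
  putStrLn "incrementing!"
  return (x + 1)
```

The signature Int -> IO Int is no longer lying: a reader sees the effect without opening the implementation.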

Container Types

Let’s go back to our dvd example for a moment. There is another possible solution to that problem: applying container types to extend the set of possible return values instead of reducing the set of possible inputs. As another simple example, we could apply the Maybe type to tell the user that she might not get an Int value back:

    dvd' :: Int -> Int -> Maybe Int
    dvd' _ 0 = Nothing
    dvd' x y = Just (x `div` y)

We could also have used Either, or defined some type IntOrUndefined, to make the possibility of failure explicit.

Either way, we get a function signature that we can trust to tell us what may happen. No more surprises here.
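As a rough sketch of the Either variant: the error type DivError, its constructor and the names dvdE and describe are hypothetical and only serve this illustration.

```haskell
-- Hypothetical error type; the constructor name is made up for this sketch.
data DivError = DivisionByZero
  deriving (Eq, Show)

-- Either variant: Left explains why there is no result.
dvdE :: Int -> Int -> Either DivError Int
dvdE _ 0 = Left DivisionByZero
dvdE x y = Right (x `div` y)

-- The result type makes both outcomes visible, so the caller handles each.
describe :: Either DivError Int -> String
describe (Left DivisionByZero) = "undefined: division by zero"
describe (Right q) = "quotient: " ++ show q
```

Here describe (dvdE 7 2) gives "quotient: 3" and describe (dvdE 7 0) gives "undefined: division by zero". Compared to Maybe, the Left value can carry the reason for the failure.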

Capstone: It's all about Communication

Having solid function signatures is a matter of communication and precision. When I create a function, I want to express my intention explicitly. And on the other hand, I prefer knowing what a function does by just looking at its signature. A statically typed language like Haskell or Java can help with that by providing very precise type signatures.

I would love to see more developers build trust in the information they get from just looking at function signatures, and stop looking at the implementation details.