Computer aided programming

by Dan Cutting

To me, test-driven development (TDD) sometimes feels like a game of chess between red and green pieces:

  1. Red’s turn: write a failing test.
  2. Green’s turn: counter the move by writing code that passes the test.
  3. Goto 1.

Just like chess, there are an overwhelming number of potential moves and games, and the same game will never be played twice. The alchemy is knowing what test to write, and what code to write that passes the test.

Many see this as innate or experiential.

Chess was once seen as a bulwark of human intelligence but over the decades, heuristics and algorithms have been developed that lead to strong positions. Although not “solved”, computers now regularly defeat even the greatest human players.

Maybe writing tests and code is not such a black art either.

Uncle Bob’s Transformation Priority Premise (TPP) goes some way to unravelling the magic. The TPP is a catalog of transformations that can be applied to code to move it from a state that fails a test to one that passes. It attempts to order the transformations from simplest to most complex and suggests we apply the simplest possible one at any given time.

As an example, let’s write a function that echoes a string. Here’s our first test:

require 'test/unit'

class TestEcho < Test::Unit::TestCase
  def test_echo_hello
    assert_equal("hello", echo("hello"))

We start by simply returning a constant which satisfies our test:

def echo(s)
  return "hello"

Let’s add another case:

class TestEcho < Test::Unit::TestCase
  def test_echo_helloworld
    assert_equal("hello world", echo("hello world"))

We can now transform our code using the constant → scalar rule (where scalar in this context refers to a variable or argument):

def echo(s)
  return s

The gist of the TPP is that we can build up an algorithm in small steps using simple tests and simple transformations. There is something intriguing at play here: some sort of guided, emergent complexity. If you’re not familiar with the TPP, I highly recommend reading Uncle Bob’s original blog post which has some in-depth examples and discussion.

Towards the end of the post Uncle Bob suggests some future directions including tool support for suggesting and applying transformations. Indeed, this feels like an algorithm we could automate:

  1. Write a failing test.
  2. Computer applies a transformation which passes the test.
  3. Goto 1.

Of course, the hard part is designing an algorithm that can suggest transformations based on an existing body of code and a failing test. Selecting simple transformations over complex ones is a good starting point, but it’s easy to imagine such a tool producing a pathological series of if statements that mirror the tests:

def echo(s)
  if s == "hello"
  elsif s == "hello world"
    "hello world"

In fact, this is also an issue for newcomers to TDD. Once again, Uncle Bob has an answer: “As the tests get more specific, the code gets more generic.”

Although this is a good heuristic, and perhaps intuitive enough for human programmers, such a rule is difficult to implement in an automated tool. How can an algorithm analyse a piece of code to determine if it is generic or specific?

We need some sort of computable metric.

Micah Martin has suggested the Absolute Priority Premise as a way of scoring code (video part 1 and part 2). In particular, he describes a Code Mass Metric (CMM) which assigns scores to all the parts of a piece of code such as if statements, variable assignments and so on. Summing them up gives an indication of how “massive” the code is.

In essence, the CMM says some statements, like recursive function calls, are “more massive” than other statements, such as using a constant. Intuitively this favours simple code over complex code. Crucially, however, a summation of “simple” code such as our pathological accumulation of if statements, would be more massive than a single, moderately massive statement that returns a scalar, and the latter would therefore be considered better.

So perhaps our automated tool could use the CMM to lead us down the path that increases the code mass by the least amount after we write each test.

As we’re stumbling through this massive search space of tests, code and transformations, we’re likely to end up in a local minimum, where the final product is functional but far from optimal. Perhaps some of our tests required us to use expensive transformations when there may have been other simpler routes.

TDD already has a guideline for this problem. If the simplest transformation for any given step seems to require too many changes to the code, we should backtrack and write a different test that permits a simpler transformation.

This also feels like something we could automate.

Instead of looping through the semi-automatic process of writing a failing test, then applying a simple automated transformation, perhaps we could do the following:

  1. Write a suite of tests that cover “everything”.
  2. Automatically resolve these tests into code.

To elaborate: rather than entering tests one at a time, we provide a set of unit tests that describe all the scenarios we’re interested in. The tool then iterates through all permutations of these tests and consequent transformations to find the sequence that leads to the lowest possible code mass.

In principle, this is similar to how a chess algorithm tries all possible moves and evaluates the score for each terminal position. It’s likely a very big search space, much bigger even than the mind-boggling chess landscape. Well-worn search tree algorithms might be one way to approach this, but extremely sophisticated heuristics would probably also be necessary to make it tractable, if it’s even possible.

The application of the TPP, CMM and a search tree feels like a very “symbolic” approach to computer aided programming. Expert systems are another technology that use symbolic techniques to “learn” vast amounts of information that can then be used to make future decisions. (Interestingly, they are often represented as massive series of if statements.)

Evolutionary algorithms, and genetic programming have long been used to optimise specialised engineering and programming problems. Non-symbolic alternatives like neural networks are also capable of mapping inputs (tests?) to output (code that passes the tests?).

The problem with any of these approaches is that they tend to work best for continuous problems without discontinuities. Code is an inherently symbolic construct used to solve any computable problem, continuous or otherwise. Generated results may work correctly for all test inputs but in many cases produce unmaintainable code where tweaks are difficult. Non-symbolic approaches are often inscrutable to human programmers and impossible to adjust without retraining the system.

A TPP/CMM-driven approach to computer aided programming has the potential to produce clean, maintainable code, using much the same process that experienced programmers use today. But it does feel like we’ve managed to squash all the creative parts of coding out of the process, relegating the programmer to just declaring a suite of unit tests.

Maybe we should keep the fun bits to ourselves, and make the computer write tests for our code instead.