Mutagenesis - Technical Notes

As promised earlier, here are some notes about what's going on inside Mutagenesis. This is mainly for the curious -- you don't have to know about any of this to play the game, although having some idea of what's behind it may help you to make a bit more sense of the outcomes.

DNA and its Interpretation

The DNA of each plant is an L-system (Lindenmayer system), which is a
kind of grammar. It consists of a collection of productions, each of
which has a left hand side and a right hand side. Here's an example
of the DNA for a simple plant:

1:    A --> wwwwB
2:    B --> [A][A]
3:    B --> [L][L]A
4:    L --> {llwrwrw}

The structure of a plant is represented by a string of characters. In
my system, upper case letters are nonterminal symbols, and lower case
letters and digits are terminal symbols. Some other characters such as
brackets and braces are also used, with meanings that will be described
below.

The seed of a new plant is the string "A". At each growth step, every
nonterminal symbol is simultaneously replaced according to a production.
If more than one production applies, one of them is chosen at random
for each replacement.

The first few steps of growth for the above DNA might look like
this:

  A
  wwwwB
  wwww[A][A]
  wwww[wwwwB][wwwwB]
  wwww[wwww[L][L]A][wwww[A][A]]
  wwww[wwww[{llwrwrw}][{llwrwrw}]wwwwB][wwww[wwwwB][wwwwB]]
 
To turn the string into a graphical representation, a scheme based on
turtle graphics is used. The turtle starts off at the root of the plant,
facing upwards. Terminal characters in the string are then interpreted
as a sequence of commands for the turtle:

  w - move forward a short distance and draw a line
  l - turn left by a predetermined angle
  r - turn right by a predetermined angle

The 'w' command currently always draws a brown line, representing a piece
of stem ('w' stands for 'wood'), and the turning angle for 'l' and 'w'
is fixed at 10 degrees.

Square brackets are used to enclose branches. When a branch is encountered,
a new turtle is created with the same initial state as the current one,
turned to a new angle, and used to interpret the characters inside the
brackets. Interpretation of the main string then resumes using the original
turtle.

A series of branches without intervening commands are automatically
distributed at increasing angles either side of the main axis, currently
at fixed 20 degree increments.

Curly braces represent filled shapes. The commands inside the braces are
interpreted by a new turtle, and the points that it traces out are used as
the control points of a Bezier curve. This curve is then mirrored twice
(left-right and forward-backward) and the resulting shape filled in.

In the above example, a filled shape is used to represent a leaf.

So we can paraphrase the meaning of this DNA as follows: A plant begins by growing
a piece of stem. The tip of the stem can then turn into either a new pair of
stems branching to either side, or a pair of leaves and another stem continuing
straight ahead. Each new stem develops in the same way as the original plant.

This simple plant is rather boring, since it as leaves but no flowers or
fruit. The full DNA of the initial plant used in the game is more complex,
and uses some additional commands. It looks like this:

1:    A --> wwwwB
2:    B --> [A][A]
3:    B --> [L][L]A
4:    B --> [C][C]
5:    C --> wwwwD
6:    D --> [C][C]
7:    D --> [L][L]C
8:    D --> E
9:    E --> [G][C]
10:   E --> [C][G]
11:   G --> wO
12:   G --> wwF
13:   L --> {7cllwrwrw}
14:   O --> 1c5do
15:   F --> 5b[P]S
16:   P --> {6clllwrw}
17:   S --> 3c6do

The new commands are:

    1-9     Establishes a parameter value. Which parameter is set depends on the
            next parameter-setting command encountered, as described below.
         
    c       Sets the colour used for filling shapes.
   
    d       Sets the diameter of circles.
   
    b       Sets the branch multiple (see below).
   
    o       Draws a filled circle with colour and diameter determined by the most
            recent 'c' and 'd' commands.

Parameter values corresond to colours as follows:

    1  red 
    2  orange 
    3  yellow 
    4  light green 
    5  light blue 
    6  white 
    7  dark green 
    8  dark blue 
    9  violet 

There are two examples of the 'o' (circle) command here. Production 14 uses it for
the centre of a flower, and production 17 uses it for a berry.

There are also two filled shapes. Production 13 is the leaf we saw earlier, with the
addition of a colour-setting command. Production 16 uses a similar but slightly
different shape with a different colour for a flower petal.

Production 15 generates a flower using another new feature: branch multiplication.
If a branch is encountered while a branch multiplier is in effect (having been set
by a previous 'b' command) the branch is repeated that many times, with the copies
distributed evenly around a full 360 degrees. Generating the flowers this way,
instead of using a separate branch for each petal, allows the number of petals and
the shape of the petals to mutate without disturbing the symmetry of the flower.

As well as the flowers and berries, there is another difference from the first
example: there are two nearly-identical copies of the first few productions. The
reason for this is to ensure that the plant grows for a couple of steps before
starting to produce flowers and berries, producing a more life-like distribution
of plant features.

Mutation

Mutating the DNA involves making a series of random decisions. The first is
whether to add a new production or change an existing production.

If a new production is being added, it is created by duplicating one of the
existing productions. Then there is a chance of replacing its left hand side
with a different nonterminal, either a new one or one already appearing
somewhere in the existing productions. If a new nonterminal is being added,
it is also inserted at a randomly chosen place in the right hand side of one
of the existing productions.

If an existing production is being mutated, either the left or right hand
side can be changed. The left hand side is mutated in a similar way to that
of a new production.

To mutate the right hand side of a production, a point in it is randomly
chosen. What happens next depends on the symbol found there:

- If it is '[' (the beginning of a branch), the whole branch may be
  duplicated or deleted.
 
- If it is ']' (the end of a branch), either a random nonterminal or a new
  branch containing a random nonterminal may be inserted after it.

- If it is a nonterminal, it may be replaced by another nonterminal or
  terminal, or deleted.

- If it is a terminal, it may be replaced by another terminal, a new
  terminal may be inserted before it, or it may be deleted. If it is a
  digit, the most likely mutation is to replace it by a different digit.

It's currently assumed that a separate production is used for each filled
shape, so that '{' and '}' only appear surrounding the whole right hand
side of a production, and that only terminal symbols occur inside the
braces. The initial DNA observes these constraints, and the mutation rules
preserve them.

The special treatment of nonterminal symbols is an attempt to prevent
unnatural-looking growth patterns from arising. In most plants, apart from
thickening of branches, growth only occurs at the tips. Only inserting
nonterminals at the ends of branches, and never after an existing
nonterminal, ensures that mutations can't cause growth to occur in the
middle of a branch. It also helps to limit the growth rate, so that the
number of symbols in the plant's structure doesn't explode too rapidly
at each step.

Cross-breeding

To cross-breed two sets of DNA, their lists of productions are laid out
side by side, a crossing-over procedure is applied to each pair, and the
result becomes the production list of the new plant. If one parent has
more productions than the other, the unpaired productions are appended
unchanged.

Crossing over a pair of productions involves choosing one of the left
hand sides at random, and crossing over the right hand side strings.

To cross over the right hand sides, they are first classified as either
filled shapes (surrounded by '{' and '}') or not. If they are both
filled shapes, their contents are crossed over and the result is a new
filled shape. If one is a filled shape and the other not, one of them is
chosen at random. If neither are filled shapes, they are crossed over as
described below.

To cross over a pair of strings, they are first divided into segments,
where each segment is either a complete branch (possibly containing
sub-branches) or a non-branch sequence of terminals. The segments are
then paired up.

If both members of a pair are branches, the contents of the two branches are
recursively crossed over. If a pair consists of a branch and a non-branch, one
of them is chosen at random.

If both members of a pair are non-branches, there is a chance that they
will be divided in two at randomly-chosen points and one piece chosen
from each half of each pair. The longer the segments, the greater the chance
that this will happen. (Long pieces of DNA are fragile and break easily!)

If there are more segments in one string than the other, the unpaired
segments are appended unchanged to the new string.

Further Reading

The Algorithmic Beauty of Plants, by Przemyslaw Prusinkiewicz and Aristid
Lindenmayer, available online at http://algorithmicbotany.org/papers/#abop

Most of my ideas for interpreting the L-systems using turtle graphics came
from here. The mutation and cross-breeding stuff I just made up as I went
along, though, so don't blame them for it.