March 2006


First, apologies for the late appearance of this issue. I had problems with my laptop's Internet connection, which meant that I ended up having to remotely edit a backup copy of the Newsletter held on my Web server machine.

Anyway, over the past month, I've been looking into holes. Actually, I started by learning some immunology, because I'd intended to write about Artificial Immune Systems.

Our immune systems learn. They sometimes learn extremely effectively, as when they protect us from reinfection by childhood diseases such as measles and mumps. Because of this, some researchers have tried emulating the immune system to find out whether immunological principles can help us solve computational problems. The most obvious application is detecting threats to computer systems such as spam, viruses, and malicious network connections. But this is not the only possibility; indeed, one theory of immunology, so-called immune network theory, ought to interest those working on dynamical models of cognition.

While reading about immunology and Artificial Immune Systems, I came across the topics of negative selection and negative databases. These looked intriguing, and I decided to investigate them in more detail. So that is what I have ended up writing about.

There is a lot of other work on Artificial Immune Systems, some of which you'll be able to find from my links. I hope, in a future issue, to write a more general feature about the subject. Let me go on then to negative representations of information.

Immunology and Negative Representations of Information

I'm going to talk about two very different applications of negative representations of data. The first is the detection of intruders and anomalous behaviour generally. The second is about ensuring privacy of databases. This unusual idea entails representing a database by storing all the records it doesn't contain: a "negative database". Suitably constructed, the result is provably hard to re-invert. You can, therefore, give away a negative database on which a suitable query program will be able to perform certain kinds of search, yet ensure that the database as a whole cannot be decoded.

Despite these different objectives, both applications share intellectual history, originating in the study of Artificial Immune Systems. Negative databases were invented by Fernando Esponda only a year or three ago. In his dissertation, Negative Representations of Information, he states that the source of his inspiration was negative selection in the immune system. So let's look at that next.

Negative detection and selection

Negative selection is closely connected with the "self versus non-self" theory of immunology. Introduced by Macfarlane Burnet in the 1950s, this is the theory that the immune system recognises pathogens - disease-causing organisms such as viruses and bacteria - by detecting molecules present in them ("non-self") but not in our own selves. How does it do this? Here's Burnet's answer, from his 1960 Nobel Lecture:

How can an immunized animal recognize the difference between an injected material like insulin or serum albumin from another species and its own corresponding substance?

Clearly this is a problem of information. It is conceivable that a substance could be recognized as foreign if it was built up of chemical configurations insusceptible to enzymic breakdown by the available mechanisms of the animal involved. This may have some relevance to micro-organismal antigens but not to the substances of vertebrate origin that are our present concern. Their recognition in the sense in which we are using the word, requires that there be available in the body a large volume of accessible "information" with some superficial analogies to a dictionary. In other words, there must be something against which a configuration can be compared and a decision made whether it corresponds or not. We find somewhere a combination of letters RAXE and we use an English dictionary to find that there is no such word in English. If the body is to differentiate between self and not-self configurations, the only general form of a solution that has so far been thought of requires the presence of a complete set of complementary steric patterns in some accessible form which correspond to

  • either (a) all configurations not present in body components,
  • or (b) all configurations present in body components,
  • or (c) all configurations but in two categories corresponding to (a) and (b) above.

Of these alternatives the first is obviously the most attractive as providing a positive recognition of any configuration against which reaction will be necessary. It is the only one which will be elaborated here - none of the others have been seriously considered by anyone. I should agree with Jerne that the information needed may be compared to a "purged xenotypic dictionary".

A purged xenotypic dictionary

I don't believe Burnet's resounding phrase is used any more, but it's striking, and it expresses well what, at an abstract level, is going on. The idea is that the immune system deploys a vast number of detector cells which it has generated at random. Arrayed around their surface, these cells have detector molecules: "receptors". All receptors on any one cell have the same shape; but different cells or groups of cells have receptors of different shape. The randomly-generated repertoire of detector cells therefore contains an equally varied repertoire of receptor molecules.

Shape is important, because it's how the immune system implements pattern-matching. Immunologists talk about "antigens": substances capable of stimulating the immune system to produce antibodies. If a receptor has a shape complementary to part of an antigen, receptor and antigen will bind, setting off a cascade of cellular events which will, we hope, eventually eliminate whatever type of organism was carrying the antigen. The analogy often used is of a key fitting into a lock, or a hand into a glove.

Given that the immune system generates detectors at random, and that therefore some might by chance match molecules belonging to our own "self", how does it prevent these self-matchers from causing trouble? The answer is that it eliminates them. The detectors go through a training process in which they are exposed to a selection of "self" molecules; those detectors that match them are promptly destroyed. The immune system therefore ends up with a repertoire of detectors that match randomly generated instances of non-self. It's this we want to emulate on the computer.
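This censoring process is simple enough to sketch in a few lines of Python. The sketch below is my own toy illustration, not code from any AIS paper: it uses exact equality as the matching rule and a binary alphabet, both simplifications.

```python
import random

def generate_detectors(self_set, n, length, alphabet="01"):
    """Negative selection: generate random candidate detectors, censoring
    (discarding) any that match a 'self' string. Here matching is just
    exact equality, the crudest possible rule."""
    detectors = []
    while len(detectors) < n:
        candidate = "".join(random.choice(alphabet) for _ in range(length))
        if candidate not in self_set:   # the censoring step
            detectors.append(candidate)
    return detectors

random.seed(0)
self_strings = {"0011", "1100", "0101"}
repertoire = generate_detectors(self_strings, n=5, length=4)
# Every surviving detector matches only non-self.
```

The surviving repertoire is exactly the computational analogue of Burnet's purged dictionary: random detectors minus those that recognise self.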

Uses of negative selection

What might we use computerised negative selection for? The most compelling analogy is with intruders that, like pathogens, we wish to destroy; it's therefore not surprising that there has been a fair amount of Artificial Immune Systems research into computer security. But, as I mentioned earlier, we might want to detect other types of anomalous behaviour, and some of my links are about doing just that.

In some cases, we shall want to keep up with a "self" that changes over time. This is something that probably doesn't happen much in our bodies, but can happen in data processing when chasing slowly varying signals, or monitoring computer networks to which new hardware is frequently added.

This connects up with the idea that we may have no explicit definition of non-self, so that the only way to learn one is by accumulating sample instances of it. In other words, negative selection is really about the instance-based learning of non-self. For an animation which well illustrates this, try Peter Worth's selection applet. You can also download its Java source code.

Jason Brownlee, in the Negative Selection - Modelling in the complement space page of his extremely useful blog on Artificial Immune Systems, discusses this, and notes that these ideas are typically applied to domains with the following characteristics:

  • They are online, and the system must specialise over time to the patterns in the data stream.
  • The volume of good patterns outweighs the volume of bad patterns in terms of observed data.
  • The observed distinct patterns that are valid outnumber the number of distinct patterns that are invalid.
  • The size of the distinct valid pattern space is small in comparison to the size of its complement.

Rapid evolutionary learning in immunology: clonal selection, clonal expansion, somatic hypermutation and affinity maturation

Much Artificial Immune Systems research combines negative selection with "clonal selection", "clonal expansion", "somatic hypermutation" and "affinity maturation". In biological immune systems, when a non-self detector matches an antigen, it is stimulated to reproduce; as it does so, the immune system mutates the receptors on the resulting offspring. These new, mutated, detectors compete to bind to antigens; those that bind most strongly survive and reproduce further. The end result is a set of detectors that match the antigen very closely, generated by an extremely rapid form of evolutionary reinforcement learning. The start of this process, selection of the original detector for reproduction, is clonal selection. Clonal expansion refers to the entire process of generating these variants; affinity maturation to the reinforcement learning; and somatic hypermutation to mutating the receptors.
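As a toy illustration of that clone-mutate-select cycle (this is not CLONALG itself, and affinity is crudely modelled as the number of positions at which a detector string agrees with an antigen string, all my own simplifications):

```python
import random

def affinity(detector, antigen):
    """Affinity crudely modelled as the number of agreeing positions."""
    return sum(a == b for a, b in zip(detector, antigen))

def mutate(detector, rate=0.2, alphabet="01"):
    """Somatic hypermutation: rewrite each symbol with probability 'rate'."""
    return "".join(random.choice(alphabet) if random.random() < rate else c
                   for c in detector)

def clonal_expansion(detector, antigen, clones=20, generations=30):
    """Repeatedly clone and mutate, keeping the best-binding variant
    (the original is kept in the pool, so affinity never decreases)."""
    best = detector
    for _ in range(generations):
        pool = [mutate(best) for _ in range(clones)] + [best]
        best = max(pool, key=lambda d: affinity(d, antigen))
    return best

random.seed(1)
antigen = "1101001110"
start = "0000000000"
matured = clonal_expansion(start, antigen)
```

After a few dozen generations the matured detector binds the antigen far more closely than the starting one: a miniature of the evolutionary reinforcement learning described above.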

It's not surprising that AIS researchers have implemented learning algorithms based on this process. One popular algorithm is CLONALG: in Improved Pattern Recognition with Artificial Clonal Selection?, Jennifer White and Simon Garrett explain its workings and explore how it performs as a pattern recogniser. You can try it yourself by running Peter Worth's Clonal Selection applet.

Some general immunological points; or, the leukocyte and Raquel Welch

Before giving an example of computerised negative selection, I should say a bit about our understanding of immunology. In the first place, it's interesting that in his lecture, Burnet suggests negative selection may have originated with the need to detect and destroy, not pathogens, but malfunctioning self cells such as tumour cells. I presume that immunology has taken the idea further since then; and there are certainly analogues in AIS research. These include Bradley and Tyrrell's Immunotronics - Novel Finite-State-Machine Architectures With Built-In Self-Test Using Self-Nonself Differentiation, which demonstrates this approach by creating an immunised decade counter that can detect faults in real time. Related to this is Canham and Tyrrell's A Multilayered Immune System for Hardware Fault Tolerance within an Embryonic Array. This is part of the POEtic project on fault-tolerant biologically-inspired hardware.

In the second place, I should add that by "immune system", I mean the mammalian immune system. Other classes of animal and plant also have immune systems, but these don't work in exactly the same way as the mammalian, and as far as I know, are not so versatile.

Furthermore, immunologists distinguish between the "adaptive" immune system, which learns responses against pathogens, and the "innate", whose responses we are born with. An example of the latter is the macrophage, a type of white blood cell. To some, macrophages are best-known for trying to engulf Raquel Welch. Admittedly, she was in a submarine at the time: a micro-miniaturised blood-vessel-cruising sub in the film Fantastic Voyage. At this point, there may be some readers contemplating reincarnation as a macrophage. However, macrophages normally engulf and destroy certain types of bacteria. They have evolved this behaviour over eons as a response to certain molecules carried by the bacterium, and it can't be relearnt. The innate and adaptive immune systems are very closely linked, and though most AIS work has concentrated on the adaptive, some of the more recent research emulates features of the innate too, as extra protection.

My third point is that I have described only one aspect of the immune system, and I have done so at a very abstract level to show quickly its connection with computing. My links point at some Web-based resources; but if you want to experiment with AIS, there's no substitute for a good immunology textbook; recent, with clear diagrams. Copious supplies of scrap paper are also recommended.

Allied to this is the fact that we understand much more about the immune system and its cell-level mechanisms than when Burnet received his Nobel Prize. With this have come a number of competing theories and subtheories. For example, self versus non-self is often held not to be the complete picture. The body doesn't try to eliminate the bacterial flora living in our guts, for example, even though these are non-self.

Of these theories, I find "immune network theory" particularly inspiring, because it shows how memories (about which antibodies to deploy against a particular antigen) can be stored in the dynamics of a network of interactions between cells. See for example The New Immunology, Chapter 2 from Ben Goertzel's 1993 book The Evolving Mind. Such work ought to interest those working on dynamical models of cognition.

This variety of theories is not surprising. The immune system is intricately and bizarrely complicated, involving a plethora of chemical signalling pathways, innate and learnt responses, and threat-detection mechanisms. There is even a kind of dual-key "co-stimulation" system for threat confirmation. Some idea of this complexity can be gained by counting the pages of, and the size of the colour-coded cell-type keys in, any good immunology textbook. As the latter will demonstrate, the number of career options open to an immature immune-system cell would put those available to an Oxford graduate to shame.

An example of negative selection

It must be admitted that many of the immune system's cellular career options terminate in death. This brings me to an example of negative selection in AIS. I'll take this from A Change-Detection Algorithm Inspired by the Immune System, by Stephanie Forrest, Alan Perelson, Lawrence Allen, and Rajesh Cherukuri. Including a simple mathematical analysis of the effectiveness of negative detection, plus experiments to validate this, it's a good starting point.

Here's the example. Let's suppose that we have a computer prone to virus infection, and that there's a crucial file on it that must be protected. We first split the file into a collection of equal-sized substrings. Call this S, for "self".

Thus, if our file were the text of "Mary had a little lamb", and we split it into substrings of length 5, S would become:

{ "had a", "Mary ", "erywh", "o go.", " litt", "le la", etc. }

The authors did not remove duplicate substrings, making S a bag or multiset rather than a set. They don't record the order of the substrings, but say that for sufficiently long substrings, this loss of order information is not a problem.

We next generate another collection of strings, our detectors or receptors. Call this R. It's created by generating random strings from the same character set (the same alphabet, formally speaking) as that of the file to be protected. But we match each string against S, and delete it if it matches. That's the negative selection.

For strings of any interesting size, it would be very rare for two to be exactly equal. So the authors use a weak matching rule which they call "r-contiguous symbols". Two strings match if they agree in r contiguous character positions. Thus the two (illustrative) strings below, which agree only in the three contiguous positions holding dcb, match when r ≤ 3:

  abdcba
  cxdcby

Later research has experimented with other rules.
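Here is one way the r-contiguous rule might be implemented. This is my own sketch, not the authors' code, and the example strings are illustrative:

```python
def r_contiguous_match(s, t, r):
    """True if s and t agree in at least r contiguous character positions."""
    run = 0
    for a, b in zip(s, t):
        run = run + 1 if a == b else 0   # extend or reset the current run
        if run >= r:
            return True
    return False

# "abdcba" and "cxdcby" agree only in the contiguous run "dcb", so they
# match for r <= 3 but not for r = 4.
```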

With R generated, we are ready. We continually monitor the file under surveillance by splitting it into substrings as we did when constructing S. Match each of these substrings against all strings in R. We know R won't match the original file, because when constructing R, we threw away all strings that did. So if any match is detected, something or someone has changed the file.
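Putting the pieces together, the whole monitoring scheme might look like this in Python. Everything here (the chunk size, the alphabet, the padding of the final short chunk, the parameter values) is my own illustrative choice rather than the paper's setup:

```python
import random

def chunks(text, size):
    """Split the file's text into equal-sized substrings (padding the end)."""
    padded = text + " " * (-len(text) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

def r_match(s, t, r):
    """The r-contiguous matching rule."""
    run = 0
    for a, b in zip(s, t):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False

def build_detectors(self_chunks, n, size, r, alphabet):
    """Generate random detectors, censoring any that match self."""
    detectors = []
    while len(detectors) < n:
        d = "".join(random.choice(alphabet) for _ in range(size))
        if not any(r_match(d, s, r) for s in self_chunks):  # negative selection
            detectors.append(d)
    return detectors

def changed(text, detectors, size, r):
    """True if any chunk of the current file triggers any detector."""
    return any(r_match(c, d, r) for c in chunks(text, size) for d in detectors)

random.seed(0)
original = "Mary had a little lamb, its fleece was white as snow."
alphabet = "".join(sorted(set(original)))
S = chunks(original, 5)                              # "self"
R = build_detectors(S, n=50, size=5, r=3, alphabet=alphabet)
# By construction, the unmodified file never raises an alarm.
```

Note the guarantee in the final comment: since every detector surviving censoring matches nothing in S, monitoring the untouched file can never produce a false alarm; only a change can trigger a detector.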

It is, the authors say with understatement, not immediately obvious that this is a good idea. Surely, for R to be a comprehensive representation of non-self, it would have to be unimaginably vaster than S? Indeed, this is where I started becoming sceptical.

To show the method is practicable, the authors derive formulae for how the probability of detecting a non-self string depends on the number of detectors and on other parameters such as r. By the law of multiplication of probabilities, the probability that every detector misses a given non-self string falls exponentially as the number of detectors grows. And since the probability that any one detector will spot a non-self string is surprisingly high, few detectors are needed. For example, in an experiment where the authors use negative selection to monitor a file that they infect with the boot-sector virus "Kilroy", as few as five detectors detected the virus every time.
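The exponential effect can be seen with a one-line calculation. Suppose, as a simplification, that each detector independently matches a given non-self string with probability p; then the chance that all n detectors miss it is (1 − p)^n. The p value below is illustrative, not taken from the paper:

```python
def failure_probability(p, n):
    """Chance that all n independent detectors miss a given non-self string."""
    return (1 - p) ** n

# Failure falls exponentially in n: with a per-detector hit probability of
# 0.5, five detectors already miss only about 3% of the time.
# failure_probability(0.5, 5) == 0.03125
```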

A big advantage of detecting non-self rather than self, the authors say, is that sets of non-self detectors can be split between several detector sites. Stating it more generally, small sections of an object can be checked for change independently. You couldn't usefully split a set of self-detectors in this way.

For example, if we wanted to detect malicious TCP/IP connections in a local area network, each computer on the network could be equipped with a different set of non-self detectors. These would have been created as above: generate TCP/IP strings at random, and screen out those that are permitted.

This distributability property is crucial, say the authors, because it allows each copy of the algorithm to use a unique set of detectors. Having identical protection algorithms can be a major vulnerability in large networks of computers because an intrusion at one site implies that all sites are vulnerable.

However, there is a problem, which Brownlee explains in Negative Selection - Modelling in the complement space. He remarks that:

Approximation may be efficient, but when false negatives can mean that the network or host has been infiltrated, I question the use of modelling the complement alone. In fact, the efficacy of the system is only going to be as good as the random or neighbourhood guessing for new complement patterns, or as mature as the library of confirmed - observed complement patterns. It seems negligent not to use some kind of confidence measure that the pattern is not normal by using some additional model of what is normal.

I agree, and am therefore sceptical about negative selection used on its own for intrusion detection. But the authors of An Immunological Approach to Change Detection: Algorithms, Analysis and Implications, a follow-up to A Change-Detection Algorithm Inspired by the Immune System do say that negative selection should be used in intrusion detection to supplement other more specific, and therefore more brittle, protection mechanisms. A nice survey of such multi-component systems is given in the 2004 paper Immune System Approaches to Intrusion Detection - A Review by Uwe Aickelin, Julie Greensmith and Jamie Twycross.

And finally, I'll note that although I decided to explore the "purest" applications of negative selection, it is not the only way that biological immune systems generate their detectors. I have already mentioned clonal selection and expansion: this is probably much more important, with negative selection being employed mainly to prune out self-matching detectors generated during clonal expansion - to remove little tendrils of non-self that have accidentally extruded into self, as it were.

An interesting notion here, incidentally, is immunologist David Nemazee's theory of receptor editing, described in the Scripps Institute's A Selection of Editing. The idea is that when the immune system accidentally generates a self-matching detector, it doesn't waste it; instead, it edits the detector, tweaking it to match non-self. I don't know whether this has been applied to AIS.

Instance-based learning and Learning Vector Quantisation

I have already suggested that the generation of non-self detectors is a form of instance-based learning. In Problems in the field of Artificial Immune Systems, Brownlee makes this point more forcefully. AIS researchers, he says, rarely look into work that is related to but outside the field. For negative selection, he suggests that this related work would be instance-based learning and Learning Vector Quantization. It isn't sensible to ignore closely-related fields in which many years of research have been invested and many results generated. So I thought I'd give a few pointers.

Many machine learning tasks can be formulated as function discovery. Given a training set, i.e. some examples to learn from, think of each example as composed of an input and an output - or if you prefer, an independent variable and a dependent variable. The inputs and outputs may be real numbers, or vectors of numbers, or non-numeric data.

Then learning entails finding a function. If we take any example from the training set and call this function on its input, it should return the corresponding output. Once we know this function, we can call it on a new input to compute - predict - the corresponding output. I am ignoring such difficulties as noise in the data and what happens if our vocabulary isn't right to build the function from, but function discovery is the basic idea.

For example, suppose the training set were the pairs (1,3), (2,5), (-1,-1), with the first element of each pair being the input. Then one possible function is just a straight line:

o = 2*i + 1

i.e. the straight line that fits the points.
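That line can be recovered mechanically from the three training pairs by ordinary least squares. This small pure-Python fit is my own illustration, and on this noise-free training set it returns exactly o = 2*i + 1:

```python
def fit_line(points):
    """Ordinary least squares for o = a*i + b, in pure Python."""
    n = len(points)
    sx = sum(i for i, _ in points)
    sy = sum(o for _, o in points)
    sxx = sum(i * i for i, _ in points)
    sxy = sum(i * o for i, o in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = fit_line([(1, 3), (2, 5), (-1, -1)])
# a == 2.0, b == 1.0: the line o = 2*i + 1, which can now predict
# the output for a new input, e.g. i = 5 gives o = 11.
```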

Few AIS papers appear to use the notion of function discovery: one that does is Combining negative selection and classification techniques for Anomaly Detection, by Fabio Gonzalez, Dipankar Dasgupta, and Robert Kozma. The authors use negative selection to generate detectors, but then construct a function from these by using a neural net in one experiment and a genetic algorithm in the other.

At any rate, instance-based learning is where you don't construct an explicit representation of the function being learnt, but use a stored subset of the training set to infer it. For more information on instance-based learning, and on that version of it known as Learning Vector Quantisation, see my links.

Negative databases

I'll now turn from immunology to information-hiding. One of the things that inspired me to write a feature about negative representations was discovering the unsuspected topic of negative databases. Their inventor, Fernando Esponda, was inspired by negative representations of self; but that's where we depart from immunology. The immune system has no need for privacy; humans do, and negative databases may be one way to provide it. This is further explained at the Negative Databases site.

The key idea is that we can, in effect, encrypt a database by converting it to its complement: all the records that it doesn't contain. This can be done in a way that is provably hard to undo; and even should a malicious user see one of these complement records, the information it conveys will be so marginal as to be completely unhelpful.

An obvious objection is: but surely the complement will be unmanageably vast? I'll illustrate with a database I once worked with, but before I continue, I must be precise about the definition of complement. This is easy if we regard a database as a set of bitstrings, all of the same length. Call the original, or "positive", database DB. Assume its records are all l bits long. Then the complement of DB is the set of all length-l bitstrings not in DB.

As an example, I once worked with a collection of databases known as the British Family Expenditure Survey. Each year's data typically contained about 300,000 records, each an average of 200 bytes long: that's 300,000 length-1,600 bitstrings. Its complement is the remaining 2^1,600 − 300,000 bitstrings.
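Python's arbitrary-precision integers make that figure easy to check (assuming, as above, records of l = 1,600 bits):

```python
l = 1_600                              # bits per record
n_records = 300_000
complement_size = 2 ** l - n_records   # all length-l bitstrings not in DB

digits = len(str(complement_size))
# 2**1600 is roughly 4.4 * 10**481, so the complement's size is a
# 482-digit number; subtracting 300,000 barely dents it.
```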

This is vast indeed. It puts one in mind of the following quotation:

High in the North in a land called Svithjod there is a mountain.
It is a hundred miles long and a hundred miles high
and once every thousand years a little bird comes to this mountain to sharpen its beak.
When the mountain has thus been worn away a single day of eternity will have passed.

To reduce this immensity, Esponda notes that many strings in the complement will overlap, having many bits in common. We can look for sets of overlapping strings, and factor out the overlapping regions, writing them once only. Then we insert wildcards to represent the remaining bits, those the strings don't have in common.

It's important to note the difference between the original positive database, its complement, and a compressed representation of the complement. The original and the complement are both sets of bitstrings: formally, they are sets of strings over the alphabet {0,1}. A wildcarded representation of the complement is a set of strings over the alphabet {0,1,*}, * being the wildcard. Esponda denotes the complement by U-DB, the set difference of the set of all length-l strings and DB; he denotes the wildcard-compressed complements by NDB. A database will usually have more than one wildcard-compressed complement.

To generate these compressed complements, Esponda presents the "prefix algorithm". In essence, this is a loop which builds successive approximations to its final result, at each iteration adding successively more specific sets of wildcarded strings. For example, from the positive database

  { "000" }

the prefix algorithm would add first the set { "1**" }, then the set { "01*" }, and finally the set { "001" }, generating the result

  { "1**", "01*", "001" }

(In this case, the sets all have just one element.) In general, the prefix algorithm produces a negative database whose size is of order l×n, where n is the number of bitstrings in the original and l their length.
I prefer to write this and the other algorithms in functional style:

Let w(i,DB) be the set formed by taking the length-i prefix (the first i characters) of each record in DB. 

Let W(i,DB) be the set of all length-i strings s for which s is not in w(i,DB) and s's length-(i-1) prefix is in w(i-1,DB). 

Let P(i,DB) be the set formed by extending every string in W(i,DB) with l-i wildcards.

Then the result of applying the prefix algorithm to DB is: 
P(1,DB) ∪ P(2,DB) ∪ ... ∪ P(l,DB).
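Under these definitions, the prefix algorithm is short enough to transcribe directly into Python. The sketch below (mine, for the binary alphabet) enumerates W(i,DB) by extending each prefix in w(i-1,DB), which is equivalent to the definition above:

```python
def prefix_algorithm(db, l):
    """Esponda's prefix algorithm for a binary alphabet: returns NDB, a
    wildcard-compressed representation of the complement of db."""
    ndb = set()
    for i in range(1, l + 1):
        w_prev = {s[:i - 1] for s in db}   # w(i-1, DB): length-(i-1) prefixes
        w_i = {s[:i] for s in db}          # w(i, DB): length-i prefixes
        for p in w_prev:                   # enumerate W(i, DB) by extension
            for bit in "01":
                if p + bit not in w_i:
                    ndb.add(p + bit + "*" * (l - i))   # P(i, DB)
    return ndb

ndb = prefix_algorithm({"000"}, 3)
# ndb == {"1**", "01*", "001"}
```

Running it on the two-record database {"00", "11"} gives {"01", "10"}, which is indeed the complement.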

What about trying to re-invert the negative database? The prefix algorithm generates a particularly simple class of negative database; but Esponda and colleagues also present a more complicated algorithm, and prove that the time taken to invert its result would rise exponentially with the size of the original database.

The inscrutability of emptiness

The prefix algorithm converts a database to its compressed complement all at once: it assumes we have the entire database ready for it. But can we devise algorithms with which we can incrementally update the complement: "on-line" algorithms for inserting and deleting records?

One interesting point is that if we do this, our starting database is quite likely to be empty, since we may decide all its contents should be added on the fly. Nevertheless, if privacy is to be preserved, even the empty database must become inscrutable when compress-complemented. It must not have some particularly simple form by which a malicious reader could work out that it is empty. I came across a similar notion in cryptography: when sending encrypted messages, you may not wish a malicious interceptor to see, just from its form, that a message is empty. (Hearing Tony Blair on the radio while writing this feature, I had a flash of recognition. Now I understand the principle behind his speeches.)

To construct such an inscrutable emptiness, Esponda observes that if each record has length l, the complement of an empty database DB will contain all possible length-l bitstrings.

We can rewrite the complement in terms of wildcards using the following property: a set of 2^n distinct strings that are equal in all but n positions matches exactly the same set of strings as the single string with those n positions set to the wildcard. For example, the set of four strings {00,01,10,11} will match the same strings that are matched by the single wildcarded string **. We can therefore rewrite the complement as a set of wildcarded strings.
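A helper that expands a wildcarded string back into the concrete strings it matches makes the equivalence easy to verify (again, my own sketch):

```python
from itertools import product

def expand(pattern, alphabet="01"):
    """All concrete strings matched by a wildcarded pattern ('*' = any symbol)."""
    slots = [alphabet if c == "*" else c for c in pattern]
    return {"".join(chars) for chars in product(*slots)}

# expand("**") == {"00", "01", "10", "11"}: one wildcarded string stands in
# for 2**2 concrete ones.
```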

Inversion becomes even harder if we throw in superfluous strings. These will add nothing to the meaning (the complement already matches all strings, so inserting extra ones won't hurt it), but will make the result harder to invert. Indeed, if we choose the position of the wildcards appropriately, we can generate sets of wildcarded strings that are equivalent to certain "SAT" problems. As I explain in the following section, these can be proven to need time exponential in the size of the database to invert. Finally, by using a random-number generator to choose which positions are wildcarded, the create algorithm can create a different compressed representation every time.

How do we update? Deleting a string from the positive database is equivalent to inserting it into the complement. The core of this insert operation is to add to the complement one or more wildcarded strings that cause the inserted string to be matched but which, again, are more complicated than necessary.

To delete a string x from the complement NDB, we first delete from it all strings that match x. (That is, we find all the records that match x and delete them - we don't update strings in place.) Since this might unintentionally remove some information we wish to keep, we next have to calculate strings to insert to undo this and we then insert them.

You can try negative databases at the Inside Negative Databases page. This gives a brief explanation of negative databases, and lets you create, view and query them from positive databases that you input. You can convert your input all at once in "batch" mode; or create an initial database and incrementally update it "on-line".

SAT and the complexity of complementation

Once you have created a negative database, the Inside Negative Databases pages offer you the chance to "Run SAT Solver". What is this about?

In essence, it's estimating how hard the database is to invert. It does so by converting it into a well-known computationally hard problem, and then measuring the time taken by an algorithm written specially to solve that problem. The problem is called SAT.

"SAT" is short for "Boolean satisfiability problem": given a Boolean expression that contains Boolean variables, find truth values for the variables that make the expression true. Here's an example from the Wikipedia entry on the Boolean satisfiability problem: in the expression

E = (x1 or ~x2 or ~x3) and (x1 or x2 or x4)

find values for x1, x2, x3 and x4 that make E true. Here, there are several such answers, for example any assignment of values where x1 is true.
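A brute-force solver for this example makes the claim easy to check. This is illustrative only (real SAT solvers are far cleverer); clauses are written in the usual integer notation, where literal k means xk and −k means ~xk:

```python
from itertools import product

def satisfying_assignments(clauses, n_vars):
    """Brute-force SAT: try all 2**n_vars truth assignments.  A clause is a
    list of integer literals: k means xk is true, -k means xk is false."""
    solutions = []
    for bits in product([False, True], repeat=n_vars):
        value = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            solutions.append(value)
    return solutions

# E = (x1 or ~x2 or ~x3) and (x1 or x2 or x4)
E = [[1, -2, -3], [1, 2, 4]]
sols = satisfying_assignments(E, 4)
# All 8 assignments with x1 true satisfy E, plus 4 others with x1 false.
```

Note that the loop tries all 2^n assignments: exactly the exponential cost that makes SAT a useful yardstick for hardness.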

In general, the time taken to find such a set of values increases exponentially with the size of the Boolean expression, although there are particular classes of simple expression for which it goes up only polynomially - i.e. as the square, cube, or some other power of the size of the expression.

Although programmers struggling to make hairy optimisation problems run in tractable time may not thank me for saying so, there's a sense in which SAT is nice. This is because SAT problems are relatively easy to formulate and reason about mathematically. So if you want to prove that some other algorithm runs in exponential time, a good method is to prove it equivalent to a SAT problem. That is what Helman, one of the negative database researchers, did to prove them hard to invert.

This explains the "Run SAT Solver" links I started with. Presumably, these convert the negative database you have generated into an equivalent SAT problem and run a SAT solver over the result. The time it takes will give an estimate of the difficulty of inverting the database.

Mainly about negative selection

- Negative Representations of Information, by Fernando Esponda. This is Esponda's dissertation: it contains his research on negative databases as well as that on negative selection. He also writes about the motivations for the work, and its connection with AIS research.
- A Change-Detection Algorithm Inspired by the Immune System, by Stephanie Forrest, Alan Perelson, Lawrence Allen, and Rajesh Cherukuri. Submitted to IEEE Transactions on Software Engineering, 1995. Probably the original paper on this topic. Worth reading as a starting point: it presents a mathematical analysis, with experimental validation, of the effectiveness and cost of using negatively-selected detectors.
- An Immunological Approach to Change Detection: Algorithms, Analysis and Implications, by Patrik D'haeseleer, Stephanie Forrest, and Paul Helman, 1996. There are numerous papers which continue the work described in the one above, and I'll only list a selection. This paper introduces more efficient algorithms for generating detectors, running in linear rather than exponential time. One, the "greedy algorithm", tries to generate detectors that are maximally far apart, so that each new detector matches as many not-yet-covered non-self strings as possible. This might be of interest in other areas of machine learning. The paper also looks at the relation between "holes" (undetected regions of non-self) and the probability of detection failure, and gives heuristics for tuning the algorithms and for deciding when to use negative selection at all, depending on the level of security needed.
- Evaluating Negative Selection in an Artificial Immune System for Network Intrusion Detection, by Jungwon Kim and Peter Bentley, Genetic and Evolutionary Computation Conference 2001. The authors have published several papers on negative selection for detecting intrusions. Here, they apply the Forrest et al. algorithm to real network traffic data, and conclude that it is completely infeasible to generate a representation of non-self for this data. They do say, though, that negative selection may still be useful in other tasks - in immunological terms, for filtering out harmful (self-matching) antibodies rather than generating non-self detectors.
- Anomaly detection using negative selection based on the r-contiguous matching rule, by Shantanu Singh, 2002. Extends the greedy algorithm to alphabets of more than two symbols, and examines its use for detecting anomalies in files of assembler instructions and sequences of system calls (both of which an intruder could affect) and in time series.
- Combining negative selection and classification techniques for Anomaly Detection, by Fabio Gonzalez, Dipankar Dasgupta, and Robert Kozma, 2002. Generates negative detectors, but then experiments with other machine learning methods - a back-propagation neural network in one experiment, and a genetic algorithm generating fuzzy rules in the other - to construct a function from the detectors, using this to detect anomalies.
- A Formal Framework for Positive and Negative Detection Schemes, by Fernando Esponda, Stephanie Forrest, and Paul Helman. IEEE Transactions on Systems, Man and Cybernetics, Volume 34, Issue 1, 2004. Reviews work since the 1994 paper, and includes results on how well negative detectors cover the non-self space. The paper also surveys work on negative detection in non-security-related areas. I quote, omitting bibliographic tags: "Since its introduction, interest in negative detection has been growing, especially for applications in which noticing anomalous patterns is important, including computer virus detection, intrusion detection, and industrial milling operations. Recently, other categories of applications have been proposed, including color image classification, and collaborative filtering and evolutionary design. Underlying all this work, however, is the question of negative versus positive detection: When is it advantageous to use a negative-detection scheme instead of the more straightforward positive-detection scheme?".
- Is Negative Selection Appropriate for Anomaly Detection?, by Thomas Stibor, Philipp Mohr, and Jonathan Timmis, Genetic and Evolutionary Computation Conference 2005. "Negative selection algorithms for hamming and real-valued shape-spaces are reviewed. Problems are identified with the use of these shape-spaces, and the negative selection algorithm in general, when applied to anomaly detection. A straightforward self detector classification principle is proposed and its classification performance is compared to a real-valued negative selection algorithm and to a one-class support vector machine. Earlier work suggests that real-valued negative selection requires a single class to learn from. The investigations presented in this paper reveal, however, that when applied to anomaly detection, the real-valued negative selection and self detector classification techniques require positive and negative examples to achieve a high classification accuracy. Whereas, one-class SVMs only require examples from a single class".
- Novelty Detection in Time Series Data using Ideas from Immunology, by Dipankar Dasgupta and Stephanie Forrest, 1995. Early experiments on negative detection for detecting anomalous data indicating faults in a milling process, and for separating noise from a gradually-changing signal.
- Online Novelty Detection on Temporal Sequences, by Junshui Ma and Simon Perkins, 2003. The authors refer to the above paper, but say the method can fail: as the "self" set becomes increasingly diverse, the non-self set can become empty. They propose a (non-immunological) algorithm which uses online support vector regression.
- Immunotronics - Novel Finite-State-Machine Architectures With Built-In Self-Test Using Self-Nonself Differentiation, by D. Bradley and A. Tyrrell, IEEE Transactions on Evolutionary Computation, Volume 6, Number 3, June 2002. The authors create an immunised decade counter that can detect faults in real time.
- A Multilayered Immune System for Hardware Fault Tolerance within an Embryonic Array, by R. Canham and A. Tyrrell. The use of negative selection to detect faulty self in an array of logic cells. This is part of the POEtic project on biologically inspired fault-tolerant hardware.
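Several of the papers above use the "r-contiguous bits" matching rule, under which a detector matches a string if the two agree in at least r contiguous positions. That makes the basic negative-selection loop easy to sketch in Python. This is only the naive exhaustive-censoring version, with toy parameters of my own choosing; the papers give much more efficient generation algorithms:

```python
import random

def r_contiguous_match(a, b, r):
    """True iff strings a and b agree in at least r contiguous positions."""
    run = best = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best >= r

def generate_detectors(self_set, length, r, n, rng):
    """Exhaustive censoring: generate random candidate detectors and
    keep only those matching no self string."""
    detectors = []
    while len(detectors) < n:
        candidate = ''.join(rng.choice('01') for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors

def is_anomalous(string, detectors, r):
    """Non-self is anything some surviving detector matches."""
    return any(r_contiguous_match(d, string, r) for d in detectors)

self_set = {'00000000', '00001111'}
detectors = generate_detectors(self_set, length=8, r=6, n=5,
                               rng=random.Random(1))
print(is_anomalous('00000000', detectors, 6))  # False: self is never flagged
```

By construction the detectors can never flag self, so false positives are impossible; the "holes" discussed by D'haeseleer et al. are non-self strings that no feasible detector covers, giving false negatives instead.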

Instance-based learning and Learning Vector Quantisation

- Andrew Moore's tutorial slides on Instance-based learning (aka case-based, memory-based, or non-parametric learning): "Over a century old, this form of data mining is still being used very intensively by statisticians and machine learners alike. We explore nearest neighbor learning, k-nearest-neighbor, kernel methods and locally weighted polynomial regression". It's well worth looking at his other tutorials too.
- Another set of slides on instance-based learning, by Michael Littman and Yihua Wu. Towards the end, these take a brief look at case-based reasoning: instance-based learning with logical, not numeric, data.
- Slides on instance-based learning which relate it to Learning Vector Quantisation.
- The page for Weka 3: "Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes". And, Weka is open-source.
- Jason Brownlee's Learning Vector Quantisation, Self-Organizing Map, and feed-forward neural network algorithms for Weka.
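As a reminder of how simple the core of instance-based learning is, here is a minimal k-nearest-neighbour classifier in Python (the data points are invented for illustration): the "learning" consists of nothing more than storing the training instances.

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify a query point by majority vote among its k nearest
    training instances, using Euclidean distance."""
    neighbours = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), 'a'), ((0.1, 0.2), 'a'), ((0.2, 0.1), 'a'),
         ((0.9, 1.0), 'b'), ((1.0, 0.8), 'b')]
print(knn_classify(train, (0.15, 0.15)))  # 'a'
print(knn_classify(train, (0.95, 0.9)))   # 'b'
```

Learning Vector Quantisation then compresses the stored instances into a few trained prototype vectors, which is one answer to the method's obvious cost: keeping, and searching, every instance ever seen.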

Mainly about negative databases for privacy

- Enhancing Privacy through Negative Representations of Data, by Fernando Esponda, Stephanie Forrest and Paul Helman. Introduces negative databases and gives batch-mode algorithms for generating them: the prefix algorithm and another algorithm whose output the authors prove hard to invert.
- On-line Negative Databases, by Fernando Esponda, Elena Ackley, Stephanie Forrest, and Paul Helman. Extends the above work to incremental updating of negative databases.
- The Negative Databases page, by Stephanie Forrest, Paul Helman, Fernando Esponda, and Elena Ackley. Links to the above papers, Esponda's dissertation, the demo page below, and related work.
- Demonstration of negative databases, from the above site.

SAT

- Wikipedia's page on the Boolean satisfiability problem. Explains the meaning of SAT, used when proving negative databases hard to invert. See also, particularly for an explanation of "NP-hard", the section on P vs NP in the comp.theory FAQ, and Wikipedia's page on computational complexity theory.
- Changes of Mathematical State, by Ivars Peterson, Science News, Volume 157, Number 19, May 6, 2000. "Untangling a web of conflicting demands can be tough on computers": a popular-science explanation of SAT, and of the phase changes from unsatisfiable to satisfiable and their connection with spin glasses.

Other

- The Human Immune System and Network Intrusion Detection, by Jungwon Kim and Peter Bentley. An early look by the authors at the immune system in general - not just negative selection - and those features we might imitate for intrusion detection. They conclude that since the human immune system is distributed, self-organising and lightweight, it clearly fulfils the design goals for network-based intrusion detection systems.
- Immune System Approaches to Intrusion Detection - A Review, by Uwe Aickelin, Julie Greensmith, and Jamie Twycross, 2004. This survey covers emulations of many immunological principles I've not mentioned. Includes a table which classifies the systems surveyed in terms of the principles used.
- Improved Pattern Recognition with Artificial Clonal Selection, by Jennifer White and Simon Garrett. How CLONALG works and how it performed in pattern recognition.
- Peter Worth's applets for demonstrating negative selection, positive selection, and clonal selection. The Java source code can be downloaded.
- A Selection of Editing, by Jason Socrates Bardi, Scripps Research Institute, 2004. About immunologist David Nemazee's theory of "receptor editing": that self-matching immune cells have their receptors tweaked to match non-self.
- Jason Brownlee's Pensive Pondering blog. An excellent resource for, and explanation of, much AIS work.
- Brownlee's Negative Selection - Modelling in the complement space.
- Brownlee's Problems in the field of Artificial Immune Systems.
- The New Immunology, from The Evolving Mind, Ben Goertzel, 1993. A short account of how memory for antigens can be embodied in the dynamics of an immune network system.
- Brownlee's Idiotypic Network Theory (the immune network model). Useful links should you want to follow up the above.
- Brownlee's The Biological Immune System - Useful Resources. Start here for links to information about immunology and the workings of natural immune systems.
- For those who like to know the history of things, this essay by Matt Inlay is an interesting look at how the immune system evolved. It's actually Inlay's response to a chapter in the book Darwin's Black Box by Michael Behe, a believer in "intelligent design". See also EvoWiki's entry for "Immune system".
- The page on the Nobel Prize in Physiology or Medicine for 1960, with links to the banquet speeches and Nobel lectures by Macfarlane Burnet and Peter Medawar for their work on acquired immunological tolerance.
- An Interpretative Introduction to the Immune System, Steven Hofmeyr, 2000. An introduction to immunology for computer scientists interested in AIS. Hofmeyr warns that it is not a comprehensive overview, and certainly does not stand in for a good immunology textbook.
- Immunology for Physicists, Alan Perelson, 1995. I've not had time to read this, but as it's by one of the big names in AIS theory, it should be well worth reading. Note the date, however.
- MIT's OpenCourseWare course for Health Sciences and Technology on Cellular and Molecular Immunology, 2002. Includes detailed lecture notes, exercises and a bibliography.
- Immunology Part IB Home Page, from the Cambridge University Pathology Department's Immunology Division. Nicely illustrated notes on immunology by Nick Holmes, current up to 1998-2001.
- Images from the film Fantastic Voyage, at the Foresight Nanotechnology Institute's Nanomedicine Art Gallery. A Google image search for fantastic voyage raquel welch will turn up more pictures.

"Once every thousand years a little bird comes to this rock to sharpen its beak". Postings on the origin of this tale.


  1. I can't resist mentioning this one. Some immune systems are programmed to go wrong. During the mating season of the Australian marsupial mouse Antechinus, levels of cortisol rise in the blood of the male. These high levels persist after mating is over, and eventually suppress the male's immune system, causing it to die from one or other kind of infection. This, of course, leaves more resources for females and young. I owe this information to Why Geese Don't Get Obese (and We Do): How Evolution's Strategies for Survival Affect Our Everyday Lives by Eric Widmaier.

  2. My favourite negative image is the notorious put-down "This paper fills a much-needed gap in the literature". Apparently, this never actually occurred in a review. Its origins are explained by Allyn Jackson in Chinese Acrobatics, an Old-Time Brewery, and the "Much Needed Gap": The Life of Mathematical Reviews, in the Notices of the AMS, Volume 44, Number 3, March 1997.

Two Blogs

I've been collecting links to blogs: some day when I have time, I'll publish a list. Jason Brownlee's Pensive Pondering is one, obviously; here are two more.

    "This is an experiment in the application of a blog to academic research in machine learning and learning theory by John Langford. Exactly where this experiment takes us and how the blog will turn out to be useful (or not) is one of those prediction problems we so dearly love in machine learning. There are a number of other people who have contributed posts including Drew Bagnell, Sanjoy Dasgupta, Hal Daume, and Alex Gray". This blog is really a collection of mini-tutorials on machine learning, and very instructive.

"Welcome to the blog of the F-Secure Security Labs, maintained by the personnel in charge of analysing virus, phishing, spyware and spam attacks". This isn't AI, but it's a good read - and not unrelated to the main feature.

A Moral for Learning Machines

Life magazine once ran a two-page spread of about 40 photographs of different persons. Half of them were of college professors, scientists, and esteemed businessmen. The other half were criminals ranging from thieves to rapists to murderers. The magazine feature was a fun contest for the reader to see if he could tell the good citizens from the criminals. My wife and I tried it. My score was about 30 percent right; her score was 100 percent right. Did she have special insight? Yes, but not about faces. She observed that half the photographs had the same draped background and deduced correctly that the criminals were photographed at the same locale.

This story comes from How to Draw Caricatures by Lenn Redman. It reminds me of the urban legend about the neural net which was trained to spot camouflaged tanks. Trained on 100 battlefield scenes, each of which contained either a tree with a tank hiding behind it, or a tree with no tank behind it, the net did indeed learn to sort scenes with tanks from scenes with no tanks. Not, however, as a result of the tanks: the images with tanks had been taken on a cloudy day, while the images without tanks had been taken on a sunny day. This version of the story appears in Neil Fraser's Neural Network Follies.

Is the Primordial Soup Done Yet?

This is the title of an interesting essay by physicist Cosma Rohilla Shalizi on how to quantify self-organisation. He says that it is now rather out of date, but links to a more recent discussion. And his thoughts on how to approach the problem are well worth reading, such as this principle:

We need to be clever. Neat hacks can work wonders. Newton's law of gravity would be worthless if you couldn't pretend extended objects are concentrated at their centers of mass. Entropy would be useless in thermodynamics if it weren't for the connection to heat and temperature, and maybe similar relations hold in other cases.


From a neologisms compilation, submitted by C. Johnston, a definition for aibohphobia: 1) The fear of palindromes. 2) The fear of mechanical dogs. With the latter sense, the word is commonly spelt aibophobia.