AI Expert Newsletter
AI - The art and science of making computers do interesting
things that are not in their nature.
March 2006
First, apologies for the late appearance of this issue. I had
problems with my laptop's Internet connection, which meant that
I ended up having to remotely edit a backup copy of the Newsletter
held on my Web server machine.
Anyway, over the past month, I've been looking into holes. Actually,
I started by learning some immunology, because I'd intended to write
about Artificial Immune Systems.
Our immune systems learn. They sometimes learn extremely effectively,
as when they protect us from reinfection by childhood diseases such
as measles and mumps. Because of this, some researchers have tried
emulating the immune system to find out whether immunological principles
can help us solve computational problems. The most obvious application
is detecting computer-system intruders such as spam, viruses, and
malicious network connections. But this is not the only possibility;
indeed, one theory of immunology, so-called immune network theory,
ought to interest those working on dynamical models of cognition.
While reading about immunology and Artificial Immune Systems,
I came across the topics of negative selection and negative databases.
These looked intriguing, and I decided to investigate them in more
detail. So that is what I have ended up writing about.
There is a lot of other work on Artificial Immune Systems, some
of which you'll be able to find from my links. I hope, in a future
issue, to write a more general feature about the subject. Let me
go on then to negative representations of information.
I'm going to talk about two very different applications of negative
representations of data. The first is the detection of intruders
and anomalous behaviour generally. The second is about ensuring
privacy of databases. This unusual idea entails representing a database
by storing all the records it doesn't contain: a "negative database".
Suitably constructed, the result is provably hard to re-invert.
You can, therefore, give away a negative database on which a suitable
query program will be able to perform certain kinds of search, yet
ensure that the database as a whole cannot be decoded.
Despite these different objectives, both applications share intellectual
history, originating in the study of Artificial Immune Systems.
Negative databases were invented by Fernando Esponda only a year
or three ago. In his dissertation, Negative
Representations of Information, he states that the source
of his inspiration was negative selection in the immune system.
So let's look at that next.
Negative selection is closely connected with the "self versus
non-self" theory of immunology. Introduced by Macfarlane Burnet
in the 1950s, this is the theory that the immune system recognises
pathogens - disease-causing organisms such as viruses and bacteria
- by detecting molecules present in them ("non-self") but not in
our own selves. How does it do this? Here's Burnet's answer, from
his 1960 Nobel
Lecture:
How can an immunized animal recognize the difference between
an injected material like insulin or serum albumin from another
species and its own corresponding substance?
Clearly this is a problem of information. It is conceivable
that a substance could be recognized as foreign if it was built
up of chemical configurations insusceptible to enzymic breakdown
by the available mechanisms of the animal involved. This may have
some relevance to micro-organismal antigens but not to the substances
of vertebrate origin that are our present concern. Their recognition
in the sense in which we are using the word, requires that there
be available in the body a large volume of accessible "information"
with some superficial analogies to a dictionary. In other words,
there must be something against which a configuration can be compared
and a decision made whether it corresponds or not. We find somewhere
a combination of letters RAXE and we use an English dictionary
to find that there is no such word in English. If the body is
to differentiate between self and not-self configurations, the
only general form of a solution that has so far been thought of
requires the presence of a complete set of complementary steric
patterns in some accessible form which correspond to
either (a) all configurations not present in body components,
or (b) all configurations present in body components,
or (c) all configurations but in two categories corresponding
to (a) and (b) above.
Of these alternatives the first is obviously the most attractive
as providing a positive recognition of any configuration against
which reaction will be necessary. It is the only one which will
be elaborated here - none of the others have been seriously considered
by anyone. I should agree with Jerne that the information needed
may be compared to a "purged xenotypic dictionary".
I don't believe Burnet's resounding phrase is used any more, but
it's striking, and it expresses well what, at an abstract level,
is going on. The idea is that the immune system deploys a vast number
of detector cells which it has generated at random. Arrayed around
their surface, these cells have detector molecules: "receptors".
All receptors on any one cell have the same shape; but different
cells or groups of cell have receptors of different shape. The randomly-generated
repertoire of detector cells therefore contains an equally varied
repertoire of receptor molecules.
Shape is important, because it's how the immune system implements
pattern-matching. Immunologists talk about "antigens": substances
capable of stimulating the immune system to produce antibodies.
If a receptor has a shape complementary to part of an antigen, receptor
and antigen will bind, setting off a cascade of cellular events
which will, we hope, eventually eliminate whatever type of organism
was carrying the antigen. The analogy often used is of a key fitting
into a lock, or a hand into a glove.
Given that the immune system generates detectors at random, and
that therefore some might by chance match molecules belonging to
our own "self", how does it prevent these self-matchers from causing
trouble? The answer is that it eliminates them. The detectors go
through a training process in which they are exposed to a selection
of "self" molecules; those detectors that match them are promptly
destroyed. The immune system therefore ends up with a repertoire
of detectors that match randomly generated instances of non-self.
It's this we want to emulate on the computer.
What might we use computerised negative selection for? The most
compelling analogy is with intruders that, like pathogens, we wish
to destroy; it's therefore not surprising that there has been a
fair amount of Artificial Immune Systems research into computer
security. But, as I mentioned earlier, we might want to detect other
types of anomalous behaviour, and some of my links are about doing
just that.
In some cases, we shall want to keep up with a "self" that changes
over time. This is something that probably doesn't happen much in
our bodies, but can happen in data processing when chasing slowly
varying signals, or monitoring computer networks to which new hardware
is frequently added.
This connects up with the idea that we may have no explicit definition
of non-self, so that the only way to learn one is by accumulating
sample instances of it. In other words, negative selection is really
about the instance-based learning of non-self. For an animation
which well illustrates this, try Peter Worth's selection
applet. You can also download its Java source code.
Jason Brownlee, in the Negative
Selection - Modelling in the complement space page of his
extremely useful blog on Artificial Immune Systems, discusses this,
and notes that these ideas are typically applied to domains with
the following characteristics:
They are online, and the system must specialise over time to
the patterns in the data stream.
The volume of good patterns out weighs the volume of bad patterns
in terms of observed data.
The observed distinct patterns that are valid outnumber the
number of distinct patterns that are invalid.
The size of the distnct valid pattern space is small in comparison
to the size of its complement.
Much Artificial Immune Systems research combines negative selection
with "clonal selection", "clonal expansion", "somatic hypermutation"
and "affinity maturation". In biological immune systems, when a
non-self detector matches an antigen, it is stimulated to reproduce;
as it does so, the immune system mutates the receptors on the resulting
offspring. These new, mutated, detectors compete to bind to antigens;
those that bind most strongly survive and reproduce further. The
end result is a set of detectors that match the antigen very closely,
generated by an extremely rapid form of evolutionary reinforcement
learning. The start of this process, selection of the original detector
for reproduction, is clonal selection. Clonal expansion refers to
the entire process of generating these variants; affinity maturation
to the reinforcement learning; and somatic hypermutation to mutating
the receptors.
It's not surprising that AIS researchers have implemented learning
algorithms based on this process. One popular algorithm is is CLONALG:
in Improved Pattern
Recognition with Artificial Clonal Selection?, Jennifer
White and Simon Garrett explain its workings and explore how it
performs as a pattern recogniser. You can try it yourself by running
Peter Worth's Clonal
Selection applet.
Before giving an example of computerised negative selection, I
should say a bit about our understanding of immunology. In the first
place, it's interesting that in his lecture,
Burnet suggests negative selection may have originated with the
need to detect and destroy, not pathogens, but malfunctioning self
cells such as tumour cells. I presume that immunology has taken
the idea further since then; and there are certainly analogues in
AIS research. These include Bradley and Tyrell's Immunotronics
- Novel Finite-State-Machine Architectures With Built-In Self-Test
Using Self-Nonself Differentiation, which demonstrates this
approach by creating an immunised decade counter that can detect
faults in real time. Related to this is Canham and Tyrell's A
Multilayered Immune System for Hardware Fault Tolerance within an
Embryonic Array. This is part of the POEtic
project on fault-tolerant biologically-inspired hardware.
In the second place, I should add that by "immune system", I mean
the mammalian immune system. Other classes of animal and plant also
have immune systems, but these don't work in exactly the same way
as the mammalian, and as far as I know, are not so versatile.
Furthermore, immunologists distinguish between the "adaptive"
immune system, which learns responses against pathogens, and the
"innate", whose responses we are born with. An example of the latter
is the macrophage, a type of white blood cell. To some, macrophages
are best-known for trying to engulf Raquel Welch. Admittedly, she
was in a submarine at the time: a micro-miniaturised blood-vessel-cruising
sub in the film Fantastic Voyage. At this point, there may
be some readers contemplating reincarnation as a macrophage. However,
macrophages normally engulf and destroy certain types of bacteria.
They have evolved this behaviour over eons as a response to certain
molecules carried by the bacterium, and it can't be relearnt. The
innate and adaptive immune systems are very closely linked, and
though most AIS work has concentrated on the adaptive, some of the
more recent research emulates features of the innate too, as extra
protection.
My third point is that I have described only one aspect of the
immune system, and I have done so at a very abstract level to show
quickly its connection with computing. My links point at some Web-based
resources; but if you want to experiment with AIS, there's no substitute
for a good immunology textbook; recent, with clear diagrams. Copious
supplies of scrap paper are also recommended.
Allied to this is the fact that we understand much more about
the immune system and its cell-level mechanisms than when Burnet
received his Nobel Prize. With this have come a number of competing
theories and subtheories. For example, self versus non-self is often
held not to be the complete picture. The body doesn't try to eliminate
the bacterial flora living in our guts, for example, even though
these are non-self.
Of these theories, I find "immune
network theory" particularly inspiring, because it shows how
memories (about which antibodies to deploy against a particular
antigen) can be stored in the dynamics of a network of interactions
between cells. See for example The New Immunology,
Chapter 2 from Ben Goertzel's 1993 book The Evolving Mind.
Such work ought to interest those working on dynamical models of
cognition.
This variety of theories is not surprising. The immune system
is intricately and bizarrely complicated, involving a plethora of
chemical signalling pathways, innate and learnt responses, and threat-detection
mechanisms. There is even a kind of dual-key "co-stimulation" system
for threat confirmation. Some idea of this complexity can be gained
by counting the pages of, and the size of the colour-coded cell-type
keys in, any good immunology textbook. As the latter will demonstrate,
the number of career options open to an immature immune-system cell
would put those available to an Oxford graduate to shame.
It must be admitted that many of the immune system's cellular
career options terminate in death. This brings me to an example
of negative selection in AIS. I'll take this from A
Change-Detection Algorithm Inspired by the Immune System,
by Stephanie Forrest, Alan Perelson, Lawrence Allen, and Rajesh
Cherukuri. Including a simple mathematical analysis of the effectiveness
of negative detection, plus experiments to validate this, it's a
good starting point.
Here's the example. Let's suppose that we have a computer prone
to virus infection, and that there's a crucial file on it that must
be protected. We first split the file into a collection of equal-sized
substrings. Call this S, for "self".
Thus, if our file were the text of "Mary had a little lamb", and
we split it into substrings of length 5, S would become:
{ "had a", "Mary ", "erywh", "o go.", " litt", "le la", etc. }
The authors did not remove duplicate substrings, making S a
bag or multiset rather than a set. They don't record the order of
the substrings, but say that for sufficiently long substrings, this
loss of order information is not a problem.
We next generate another collection of strings, our detectors
or receptors. Call this R. It's created by generating random
strings from the same character set (the same alphabet, formally
speaking) as that of the file to be protected. But we match each
string against S, and delete it if it matches. That's the
negative selection.
For strings of any interesting size, it would be very rare for
two to be exactly equal. So the authors use a weak matching rule
which they call "r-contiguous symbols". Two strings match
if they agree in r contiguous character positions. Thus the
two strings below match (at dcb) when r ≤
3:
abadcbab
cagdcbba
Later research has experimented with other rules.
With R generated, we are ready. We continually monitor
the file under surveillance by splitting it into substrings as we
did when constructing S. Match each of these substrings against
all strings in R. We know R won't match the original
file, because when constructing R, we threw away all strings
that did. So if any match is detected, something or someone has
changed the file.
It is, the authors say with understatement, not immediately obvious
that this is a good idea. Surely, for R to be a comprehensive
representation of non-self, it would have to be unimaginably vaster
than S? Indeed, this is where I started becoming sceptical.
To show the method is practicable, the authors derive formulae
for how the probability of detecting a non-self string depends on
the number of detectors and on other parameters such as r.
Detection probability increases exponentially with the number of
detectors, because of the law of multiplication of probabilities.
But the probability that one detector will spot a non-self string
is surprisingly high, thus keeping the number of detectors low.
For example, in an experiment where the authors use negative selection
to monitor a file that they infect with the boot-sector virus "Kilroy",
as few as five detectors detected the virus every time.
A big advantage of detecting non-self rather than self, the authors
say, is that sets of non-self detectors can be split between several
detector sites. Stating it more generally, small sections of an
object can be checked for change independently. You couldn't usefully
split a set of self-detectors in this way.
For example, if we wanted to detect malicious TCP/IP connections
in a local area network, each computer on the network could be equipped
with a different set of non-self detectors. These would have been
created as above: generate TCP/IP strings at random, and screen
out those that are permitted.
This distributability property is crucial, say the authors, because
it allows each copy of the algorithm to use a unique set of detectors.
Having identical protection algorithms can be a major vulnerability
in large networks of computers because an intrusion at one site
implies that all sites are vulnerable.
However, there is a problem, which Brownlee explains in Negative
Selection - Modelling in the complement space. He remarks
that:
Approximation may be efficient, but when false negatives
can mean that the network or host has been infiltrated, I question
the use of modelling the complement alone. In fact, the efficacy
of the system is only going to be as good as the random or neighbourhood
guessing for new complement patterns, or as mature as the library
of confirmed - observed complement patterns. It seems negligent
not to use some kind of confidence measure that the pattern is not
normal by using some additional model of what is normal.
I agree, and am therefore sceptical about negative selection used
on its own for intrusion detection. But the authors of An
Immunological Approach to Change Detection: Algorithms, Analysis
and Implications, a follow-up to A Change-Detection Algorithm
Inspired by the Immune System do say that negative selection
should be used in intrusion detection to supplement other more specific,
and therefore more brittle, protection mechanisms. A nice survey
of such multi-component systems is given in the 2004 paper
Immune System Approaches to Intrusion Detection - A Review
by Uwe Aickelin, Julie Greensmith and Jamie Twycross.
And finally, I'll note that although I decided to explore the
"purest" applications of negative selection, it is not the only
way that biological immune systems generate their detectors. I have
already mentioned clonal selection and expansion: this is probably
much more important, with negative selection being employed mainly
to prune out self-matching detectors generated during clonal expansion
- to remove little tendrils of non-self that have accidentally extruded
into self, as it were.
An interesting notion here, incidentally, is immunologist David
Nemazee's theory of receptor editing, described in the Scripps Institute's
A
Selection of Editing. The idea is that when the immune system
accidentally generates a self-matching detector, it doesn't waste
it; instead, it edits the detector, tweaking it to match non-self.
I don't know whether this has been applied to AIS.
I have already suggested that the generation of non-self detectors
is a form of instance-based learning. In Problems
in the field of Artificial Immune Systems, Brownlee makes
this point more forcefully. AIS researchers, he says, rarely look
into work that is related to but outside the field. For negative
selection, he suggests that this related work would be instance-based
learning and Learning Vector Quantization. It isn't sensible to
ignore closely-related fields in which many years of research have
been invested and many results generated. So I thought I'd give
a few pointers.
Many machine learning tasks can be formulated as function discovery.
Given a training set, i.e. some examples to learn from, think of
each example as composed of an input and an output - or if you prefer,
an independent variable and a dependent variable. The inputs and
outputs may be real numbers, or vectors of numbers, or non-numeric
data.
Then learning entails finding a function. If we take any example
from the training set and call this function on its input, it should
return the corresponding output. Once we know this function, we
can call it on a new input to compute - predict - the corresponding
output. I am ignoring such difficulties as noise in the data and
what happens if our vocabulary isn't right to build the function
from, but function discovery is the basic idea.
For example, suppose the training set were the pairs (1,3), (2,5),
(-1,-1), with the first element of each pair being the input. Then
one possible function is just a straight line:
o = 2*i + 1
i.e. the straight line that fits the points.
Few AIS papers appear to use the notion of function discovery:
one that does is Combining negative
selection and classification techniques for Anomaly Detection,
by Fabio Gonzalez, Dipankar Dasgupta, and Robert Kozma. The authors
use negative selection to generate detectors, but then construct
a function from these by using a neural net in one experiment and
a genetic algorithm in the other.
At any rate, instance-based learning is where you don't construct
an explicit representation of the function being learnt, but use
a stored subset of the training set to infer it. For more information
on instance-based learning, and on that version of it known as Learning
Vector Quantisation, see my links.
I'll now turn from immunology to information-hiding. One of the
things that inspired me to write a feature about negative representations
was discovering the unsuspected topic of negative databases. Their
inventor, Fernando Esponda, was inspired by negative representations
of self; but that's where we depart from immunology. The immune
system has no need for privacy; humans do, and negative databases
may be one way to provide it. This is further explained at the Negative
Databases site.
The key idea is that we can, in effect, encrypt a database by
converting it to its complement: all the records that it doesn't
contain. This can be done in a way that is provably hard to undo;
and even should a malicious user see one of these complement records,
the information it conveys will be so marginal as to be completely
unhelpful.
An obvious objection is: but surely the complement will be unmanageably
vast? I'll illustrate with a database I once worked with, but before
I continue, I must be precise about the definition of complement.
This is easy if we regard a database as a set of bitstrings, all
of the same length. Call the original, or "positive", database DB.
Assume its records are all l bits long. Then the complement
of DB is the set of all length-l bitstrings not in
DB.
As an example, I once worked with a collection of databases known
as the British Family Expenditure Surve. Each year's data typically
contained about 300,000 records, each an average of 200 bytes long:
that's 300,000 length-1,600 bitstrings. Its complement becomes the
remaining 2480,000,000-300,000 bitstrings.
This is vast indeed. It puts one in mind of the following quotation:
High in the North in a land called Svithjod there is
a mountain.
It is a hundred miles long and a hundred miles high
and once every thousand years a little bird comes to this mountain
to sharpen its beak.
When the mountain has thus been worn away a single day of eternity
will have passed.
To reduce this immensity, Esponda notes that many strings in the
complement will overlap, having many bits in common. We can look
for sets of overlapping strings, and factor out the overlapping
regions, writing them once only. Then we insert wildcards to represent
the remaining bits, those the strings don't have in common.
It's important to note the difference between the original positive
database, its complement, and a compressed representation of the
complement. The original and the complement are both sets of bitstrings:
formally, they are sets of strings over the alphabet {0,1}. A wildcarded
representation of the complement is a set of strings over the alphabet
{0,1,*}, * being the wildcard. Esponda denotes the complement by
U-DB, the set difference of the set of all length-l
strings and DB; he denotes the wildcard-compressed complements
by NDB. A database will usually have more than one wildcard-compressed
complement.
To generate these compressed complements, Esponda presents the
"prefix algorithm". In essence, this is a loop which build successive
approximations to its final result, at each iteration adding successively
more specific sets of wildcarded strings. For example, from the
positive database
000
the prefix algorithm would add first the set { "1**" }, then the set
{ "01*" }, and finally the set { "001" }, generating the result
1**
01*
001
(In this case, the sets all have just one element.) In general, the
prefix algorithm produces a negative database whose size is order
ln, where n is the number of bitstrings in the
original.
I prefer to write this and the other algorithms in functional
style:
Let w(i,DB) be the set formed by
taking the length-i prefix (the first i characters)
of each record in DB.
Let W(i,DB) be the set of all length-i
strings s for which s is not in w(i,DB)
and s's length-i-1 prefix is in w(i-1,DB).
Let P(i,DB) be the set formed by extending
every string in W(i,DB) with l-i
wildcards.
Then the result of applying the prefix algorithm to DB is:
P(1,DB) ∪ P(2,DB) ∪ ... ∪
P(l,DB).
What about trying to re-invert the negative database? The prefix
algorithm generates a particularly simple class of negative database;
but Esponda and colleagues also present a more complicated algorithm,
and prove that the time taken to invert its result would rise exponentially
with the size of the original database.
The prefix algorithm converts a database to its compressed complement
all at once: it assumes we have the entire database ready for it.
But can we devise algorithms with which we can incrementally update
the complement: "on-line" algorithms for inserting and deleting
records?
One interesting point is that if we do this, our starting database
is quite likely to be empty, since we may decide all its contents
should be added on the fly. Nevertheless, if privacy is to be preserved,
even the empty database must become inscrutable when compress-complemented.
It must not have some particularly simple form by which a malicious
reader could work out that it is empty. I came across a similar
notion in cryptography, where if sending encrypted messages, you
may not wish a malicious interceptor to see that a message is empty
just from its form. (Hearing Tony Blair on the radio while writing
this feature, I had a flash of recognition. Now I understand the
principle behind his speeches.)
To construct such an inscrutable emptiness, Esponda observes that
if each record has length l, the complement of an empty database
DB will contain all possible length-l bitstrings.
We can rewrite the complement in terms of wildcards using the
following property: a set of 2n distinct strings
that are equal in all but n positions match exactly the same
set of strings as a single one with those n positions set
to the wildcard. For example, the set of four strings {00,01,10,11}
will match the same strings that are matched by the single wildcarded
string **. We can therefore rewrite the complement as a set of wildcarded
strings.
Inversion becomes even harder if we throw in superfluous strings.
These will add nothing to the meaning (the complement already matches
all strings, so inserting extra ones won't hurt it), but will make
the result harder to invert. Indeed, if we choose the position of
the wildcards appropriately, we can generate sets of wildcarded
strings that are equivalent to certain "SAT" problems. As I explain
in the following section, these can be proven to need time exponential
in the size of the database to invert. Finally, by using a random-number
generator to choose which positions are wildcarded, the create algorithm
can create a different compressed representation every time.
How do we update? Deleting a string from the positive database
is equivalent to inserting it into the complement. The core of the
insert operation is to insert one or more strings that, when added
to the complement, cause the inserted string to be matched, but
again, are more complicated than necessary.
To delete a string x from the complement NDB, we
first delete from it all strings that match x. (That is,
we find all the records that match x and delete them - we
don't update strings in place.) Since this might unintentionally
remove some information we wish to keep, we next have to calculate
strings to insert to undo this and we then insert them.
You can try negative databases at the Inside
Negative Databases page. This gives a brief explanation
of negative databases, and lets you create, view and query them
from positive databases that you input. You can convert your input
all at once in "batch" mode; or create an initial database and incrementally
update it "on-line".
Once you have created a negative database, the Inside Negative
Databases pages offer you the chance to "Run SAT Solver". What
is this about?
In essence, it's estimating how hard the database is to invert.
It does so by converting it into a well-known "computational hard
problem", and then measuring the time taken by an algorithm written
specially to solve that problem. The problem is called SAT.
"SAT" is short for "Boolean satisfiability problem": given a Boolean
expression that contains Boolean variables, find truth values for
the variables that make the expression true. Here's an example from
the Wikipedia entry
on the Boolean satisfiability problem: in the expression
E = (x1 or ~x2 or
~x3) and (x1 or x2
or x4)
find values for x1, x2, x3
and x4 that make E true. Here, there are several
such answers, for example any assignment of values where x1
is true.
In general, the time taken to find such a set of values increases
exponentially with the size of the Boolean expression, although
there are particular classes of simple expression for which it goes
up only polynomially - i.e. as the square, cube, or some other power
of the size of the expression.
Although programmers struggling to make hairy optimisation problems
run in tractable time may not thank me for saying so, there's a
sense in which SAT is nice. This is because SAT problems are relatively
easy to formulate and reason about mathematically. So if you want
to prove that some other algorithm runs in exponential time, a good
method is to prove it equivalent to a SAT problem. That is what
Helman, one of the negative database researchers, did to prove them
hard to invert.
This explains the "Run SAT Solver" links I started with. Presumably,
these convert the negative database you have generated into an equivalent
SAT problem and run a SAT solver over the result. The time it takes
will give an estimate of the difficulty of inverting the database.
Mainly about negative selection
www.cs.unm.edu/~forrest/dissertations-and-proposals/fernando-dissertation.pdf
- Negative Representations of Information, by Fernando Esponda.
This is Esponda's dissertation: it contains, as well as his research
on negative selection, that on negative databases. He also writes
about the motivations for the work, and its connection with AIS
research.
citeseer.ist.psu.edu/51699.html
- A Change-Detection Algorithm Inspired by the Immune System,
by Stephanie Forrest, Alan Perelson, Lawrence Allen, and Rajesh
Cherukuri. Submitted to IEEE Transactions on Software Engineering,
1995. Probably the original paper on this topic. Worth reading as
a starting point: it presents a mathematical analysis, with experimental
validation, of the effectiveness and cost of using negatively-selected
detectors.
www.cs.unm.edu/~forrest/publications/ieee-sp-96-neg-selec.pdf
- An Immunological Approach to Change Detection: Algorithms,
Analysis and Implications, by Patrik D'haeseleer, Stephanie
Forrest, and Paul Helman, 1996. There are numerous papers which
continue the work described in the one above, and I'll only list
a selection. This paper introduces more efficient algorithms for
generating detectors, in linear time rather than exponential. One,
the "greedy algorithm", tries to generate detectors that are maximally
far apart, so that each new detector matches as many not-yet-covered
non-self strings as possible. This might be of interest in other
areas of machine learning. The paper also looks at the relation
between "holes" (undetected regions of non-self) and the probability
of detection failure, and gives heuristics for tuning the algorithms
and for deciding when to use negative selection at all, depending
on the level of security needed.
www.cs.ucl.ac.uk/staff/J.Kim/pub/RW254.ps
- Evaluating Negative Selection in an Artificial Immune System
for Network Intrusion Detection, by Jungwon Kim and Peter Bentley,
Genetic and Evolutionary Computation Conference 2001. The authors
have published several papers on negative selection for detecting
intrusions. Here, they apply the Forrest et. al. algorithm to real
network traffic data, and conclude that it is completely infeasible
to generate a representation of non-self for this data. They do
say though, that negative selection may still be useful in other
tasks - in immunological terms, for filtering out harmful (self-matching)
antibodies rather than generating non-self detectors.
www.aber.ac.uk/~icawww/Proceedings/paper-24/singh02_anomdet1.pdf
- Anomaly detection using negative selection based on the r-contiguous
matching rule, by Shantanu Singh, 2002. Extends the greedy algorithm
to alphabets of more than two symbols, and examines its use for
detecting anomalies in files of assembler instructions and sequences
of system calls (both of which an intruder could affect) and in
time series.
issrl.cs.memphis.edu/papers/ais/2002/CEC-02-3.pdf
- Combining negative selection and classification techniques
for Anomaly Detection, by Fabio Gonzalez, Dipankar Dasgupta,
and Robert Kozma, 2002. Generates negative detectors, but then experiments
with other machine learning methods - a back-propagation neural
network in one experiment, and a genetic algorithm generating fuzzy
rules in the other - to construct a function from the detectors,
using this to detect anomalies.
www.cs.unm.edu/~immsec/publications/Positive%20and%20Negative%20Detection.pdf
- A Formal Framework for Positive and Negative Detection Schemes,
by Fernando Esponda, Stephanie Forrest, and Paul Helman. IEEE
Transactions on Systems, Man and Cybernetics, Volume 34, Issue
1, 2004. Reviews work since the 1994 paper, and includes results
on how well negative detectors cover the non-self space. The paper
also surveys work on negative detection in non-security-related
areas. I quote, omitting bibliographic tags: "Since its introduction,
interest in negative detection has been growing, especially for
applications in which noticing anomalous patterns is important,
including computer virus detection, intrusion detection, and industrial
milling operations. Recently, other categories of applications have
been proposed, including color image classification, and collaborative
filtering and evolutionary design. Underlying all this work, however,
is the question of negative versus positive detection: When is it
advantageous to use a negative-detection scheme instead of the more
straightforward positive-detection scheme?".
www.sec.informatik.tu-darmstadt.de/de/mitarbeiter/stibor/papers/gecco2005_c.pdf
- Is Negative Selection Appropriate for Anomaly Detection?,
by Thomas Stibor, Philipp Mohr, and Jonathan Timmis, Genetic and
Evolutionary Computation Conference 2005. "Negative selection algorithms
for hamming and real-valued shape-spaces are reviewed. Problems
are identied with the use of these shape-spaces, and the negative
selection algorithm in general, when applied to anomaly detection.
A straightforward self detector classication principle is proposed
and its classication performance is compared to a real-valued negative
selection algorithm and to a one-class support vector machine. Earlier
work suggests that realvalue negative selection requires a single
class to learn from. The investigations presented in this paper
reveal, however, that when applied to anomaly detection, the real-valued
negative selection and self detector classication techniques require
positive and negative examples to achieve a high classi cation accuracy.
Whereas, one-class SVMs only require examples from a single class".
citeseer.ist.psu.edu/dasgupta95novelty.html
- Novelty Detection in Time Series Data using Ideas from Immunology,
by Dipankar Dasgupta and Stephanie Forrest, 1995. Early experiments
on negative detection for detecting anomalous data indicating faults
in a milling process, and for separating noise from a gradually-changing
signal.
nis-www.lanl.gov/~simes/webdocs/ma.kdd03.pdf
- Online Novelty Detection on Temporal Sequences, by Junshui
Ma and Simon Perkins, 2003. The authors refer to the above paper,
but say the method can fail: as the "self" set becomes increasingly
diverse, the non-self set can become empty. They propose a (non-immunological)
algorithm which uses online support vector regression.
eprints.whiterose.ac.uk/archive/00000970/01/tyrrellam1.pdf
- Immunotronics - Novel Finite-State-Machine Architectures With
Built-In Self-Test Using Self-Nonself Differentiation, by D.
Bradley and A. Tyrell, IEEE Transactions on Evolutionary Computing,
Volume 6, Number 3, June 2002. The authors create an immunised decade
counter that can detect faults in real time.
www.aber.ac.uk/~icawww/Proceedings/paper-06/CanhamTyrrellFinal1.pdf
- A Multilayered Immune System for Hardware Fault Tolerance within
an Embryonic Array, by R. Canham and A. Tyrrell. The use of
negative selection to detect faulty self in an array of logic cells.
This is part of the POEtic project on biologically inspired fault-tolerant
hardware: see www.poetictissue.org/.
Instance-based learning and Learning Vector Quantisation
www.autonlab.org/tutorials/mbl.html
- Andrew Moore's tutorial slides on Instance-based learning (aka
Case-based or Memory-based or non-parametric): "Over a century
old, this form of data mining is still being used very intensively
by statisticians and machine learners alike. We explore nearest
neighbor learning, k-nearest-neighbor, kernel methods and locally
weighted polynomial regression". It's well worth looking at his
other tutorials
too.
www.cs.rutgers.edu/~mlittman/courses/ml03/ch8.pdf
- Another set of slides on instance-based learning, by Michael Littman
and Yihua Wu. Towards the end, these take a brief look at case-based
reasoning: instance-based learning with logical, not numeric, data.
dspc11.cs.ccu.edu.tw/ml93/lecture7-instance-based-knn.pdf
- Slides on instance-based learning which relate it to Learning
Vector Quantisation.
www.cs.waikato.ac.nz/~ml/weka/
- The page for Weka 3: "Weka is a collection of machine learning
algorithms for data mining tasks. The algorithms can either be applied
directly to a dataset or called from your own Java code. Weka contains
tools for data pre-processing, classification, regression, clustering,
association rules, and visualization. It is also well-suited for
developing new machine learning schemes". And, Weka is open-source.
www.it.swin.edu.au/personal/jbrownlee/wekalvq/lvq.html
- Jason Brownlee's Learning Vector Quantisation, Self-Organizing
Map, and feed-forward neural network algorithms for Weka.
Mainly about negative databases for privacy
esa.ackleyshack.com/ndb/papers/neg.pdf
- Enhancing Privacy through Negative Representations of Data,
by Fernando Esponda, Stephanie Forrest and Paul Helman. Introduces
negative databases and gives batch-mode algorithms for generating
them: the prefix algorithm and another algorithm whose output the
authors prove hard to invert.
www.cs.unm.edu/~forrest/publications/negdb-unconventional-computing-05.pdf
- On-line Negative Databases, by Fernando Esponda, Elena
Ackley, Stephanie Forrest, and Paul Helman. Extends the above work
to incremental updating of negative databases.
esa.ackleyshack.com/ndb/
- The Negative Databases page, by Stephanie Forrest, Paul Helman,
Fernando Esponda, and Elena Ackley. Links to the above papers, Esponda's
dissertation, the demo page below, and related work.
esa.ackleyshack.com/ndb/demo.html
- Demonstration of negative databases, from the above site.
SAT
en.wikipedia.org/wiki/Boolean_satisfiability_problem
- Wikipedia page on Boolean satisfiability problem. Explains
the meaning of SAT, used when proving negative databases hard to
invert. See also, particularly for an explanation of "NP-hard",
the section on P vs NP in the comp.theory FAQ at www.cs.uwaterloo.ca/~alopez-o/comp-faq/faq.html,
and Wikipedia on Computational complexity theory at en.wikipedia.org/wiki/Computational_complexity_theory.
www.sciencenews.org/articles/20000506/bob9.asp
- Changes of Mathematical State by Ivars Peterson, Science
News, Volume 157, Number 19, May 6, 2000. "Untangling a web
of conflicting demands can be tough on computers": a popular-science
explanation of SAT, and of phase changes from unsatisfiable to satisfiable
and their connection with spin glasses.
Other
www1.cs.columbia.edu/~locasto/projects/candidacy/papers/kim99human.pdf
- The Human Immune System and Network Intrusion Detection,
by Jungwon Kim and Peter Bentley. An early look by the authors at
the immune system in general - not just negative selection - and
those features we might imitate for intrusion detection. They conclude
that since the human immune system is distributed, self-organising
and lightweight, it clearly fulfils the design goals for network-based
intrusion detection systems.
www.cs.nott.ac.uk/~uxa/papers/04icaris_ids_review.pdf
- Immune System Approaches to Intrusion Detection - A Review,
by Uwe Aickelin, Julie Greensmith, and Jamie Twycross, 2004. This
survey covers emulations of many immunological principles I've not
mentioned. Includes a table which classifies the systems surveyed
in terms of the principles used.
users.aber.ac.uk/smg/Publications/030605Paper.pdf
- Improved Pattern Recognition with Artificial Clonal Selection,
by Jennifer White and Simon Garrett. How CLONALG works and how it
performed in pattern recognition.
www-course.cs.york.ac.uk/nsc/applets/AISApplet/index.html
- Peter Worth's applets for demonstrating negative selection, positive
selection, and clonal selection. The Java source code can be downloaded.
www.scripps.edu/newsandviews/e_20040503/nemazee.html
- A Selection of Editing, by Jason Socrates Bardi, Scripps
Research Institute, 2004. About immunologist David Nemazee's theory
of "receptor editing": that self-matching immune cells have their
receptors tweaked to match non-self.
pensive-pondering.blogspot.com/
- Jason Brownlee's Pensive Pondering blog. An excellent resource
for, and explanation of, much AIS work.
pensive-pondering.blogspot.com/2005/08/negative-selection-modelling-in.html
- Brownlee's Negative Selection - Modelling in the complement
space.
pensive-pondering.blogspot.com/2006/01/problems-in-field-of-artificial-immune.html
- Brownlee's Problems in the field of Artificial Immune Systems.
www.goertzel.org/books/mind/chapter_two.html
- The New Immunology, from The Evolving Mind, Ben
Goertzel, 1993. A short account of how memory for antigens can be
embodied in the dynamics of an immune network system.
pensive-pondering.blogspot.com/2006/01/idiotypic-network-theory-immune.html
- Brownlee's Idiotypic Network Theory (the immune network model).
Useful links should you want to follow up the above.
pensive-pondering.blogspot.com/2006/01/biological-immune-system-useful.html
- Brownlee's The Biological Immune System - Useful Resources.
Start here for links to information about immunology and the workings
of natural immune systems.
www.talkdesign.org/faqs/Evolving_Immunity.html
- For those who like to know the history of things, this essay by
Matt Inlay is an interesting look at how the immune system evolved.
It's actually Inlay's response to a chapter in the book Darwin's
Black Box by Michael Behe, a believer in "intelligent design".
See also EvoWiki's entry for "Immune system", at www.evowiki.org/index.php/Immune_system.
nobelprize.org/medicine/laureates/1960/index.html
- Nobelprize.org's page on the Nobel Prize in Physiology or Medicine
for 1960, with links to the banquet speeches and Nobel lectures
by Macfarlane Burnet and Peter Medawar for their work on acquired
immunological tolerance.
www.cs.unm.edu/~steveah/imm-overview-new.pdf
- An Interpretative Introduction to the Immune System, Steven
Hofmeyr, 2000. An introduction to immunology for computer scientists
interested in AIS. Hofmeyr warns that it is not a comprehensive
overview, and certainly does not stand in for a good immunology
textbook.
www.lps.ens.fr/~weisbuch/if.ps
- Immunology for Physicists, Alan Perelson, 1995. I've not
had time to read this, but as it's by one of the big names in AIS
theory, it should be well worth reading. Note the date, however.
ocw.mit.edu/OcwWeb/Health-Sciences-and-Technology/HST-176Cellular-and-Molecular-ImmunologyFall2002/CourseHome/
- MIT's OpenCourseWare course for Health Sciences and Technology
on Cellular and Molecular Immunology, 2002. Includes detailed lecture
notes, exercises and a bibliography.
www-immuno.path.cam.ac.uk/~immuno/part1.html
- Immunology PartIB Home Page, from the Cambridge University
Pathology Department's Immunology Division. Nicely illustrated notes
on immunology by Nick Holmes, current up to 1998-2001.
www.foresight.org/Nanomedicine/Gallery/FanVoy/
- Images from the film Fantastic Voyage, at the Foresight
Nanotechnology Institute's Nanomedicine Art Gallery. A Google image
search for fantastic
voyage raquel welch will turn up more pictures.
www.everything2.com/index.pl?node_id=800558
- "Once every thousand years a little bird comes to this rock to
sharpen its beak". Postings on the origin of this tale.
1. I can't resist mentioning this one. Some immune systems
are programmed to go wrong. During the mating season of the Australian
marsupial mouse Antechinus, levels of cortisol rise in the
blood of the male. These high levels persist after mating is over,
and eventually suppress the male's immune system, causing it to
die from one or other kind of infection. This, of course, leaves
more resources for females and young. I owe this information to
Why Geese Don't Get Obese (and We Do): How Evolution's Strategies
for Survival Affect Our Everday Lives by Eric Widmaier.
2. My favourite negative image is the notorious put-down
"This paper fills a much-needed gap in the literature". Apparently,
this never actually occurred in a review. Its origins are explained
by Allyn Jackson in Chinese Acrobatics, an Old-Time Brewery,
and the "Much Needed Gap": The Life of Mathematical Reviews.
You'll find this at www.ams.org/notices/199703/comm-mr.pdf,
Notices of the AMS, Volume 44, Number 3, March 1997.
I've been collecting links to blogs: some day when I have time,
I'll publish a list. Jason Brownlee's Pensive
Pondering is one, obviously; here are two more.
"This is an experiment in the application of a blog to academic research
in machine learning and learning theory by John Langford. Exactly where this experiment
takes us and how the blog will turn out to be useful (or not) is one
of those prediction problems we so dearly love in machine learning.
There are a number of other people who have contributed posts including
Drew Bagnell, Sanjoy Dasgupta, Hal Daume, and Alex Gray". This blog is really a collection
of mini-tutorials on machine learning, and very instructive.
"Welcome to the blog of the F-Secure Security Labs, maintained by
the personnel in charge of analysing virus, phishing, spyware and
spam attacks". This isn't AI, but it's a good read - and not unrelated
to the main feature.
Life magazine once ran a two-page spread of about
40 photographs of different persons. Half of them were of college
professors, scientists, and esteemed businessmen. The other half
were criminals ranging from thieves to rapists to murderers. The
magazine feature was a fun contest for the reader to see if he could
tell the good citizens from the criminals. My wife and I tried it.
My score was about 30 percent right; her score was 100 percent right.
Did she have special insight? Yes, but not about faces. She observed
that half the photographs had the same draped background and deduced
correctly that the criminals were photographed at the same locale.
This story comes from How to Draw Caricatures by Lenn Redman.
It reminds me of the urban legend about the neural net which was trained
to spot camouflaged tanks. Trained on 100 battlefield scenes, each
of which contained either a tree with a tank hiding behind it, or
a tree with no tank behind it, the net did indeed learn to sort scenes
with tanks from scenes with no tanks. Not as a result of the tanks;
but because the images with tanks had been taken on a cloudy day while
images without tanks had been taken on a sunny day. This version of
the story appears on Neil Fraser's Neural
Network Follies, at neil.fraser.name/writing/tank/.
This is the title of an interesting essay at www.cscs.umich.edu/~crshalizi/Self-organization/soup-done/
by physicist Cosma Rohilla Shalizi on how to quantify self-organisation.
He says that it is now rather out of date, but links to a more recent
discussion. And his thoughts on how to approach the problem are
well-worth reading, such as this principle:
We need to be clever. Neat hacks can work wonders.
Newton's law of gravity would be worthless if you couldn't pretend
extended objects are concentrated at their centers of mass. Entropy
would be useless in thermodynamics if it weren't for the connection
to heat and temperature, and maybe similar relations hold in other
cases.
From Langmaker.com's neologisms compilation, and submitted by
C. Johnston, a definition for aibohphobia: 1) The fear of
palindromes. 2) The fear of mechanical dogs. With the latter sense,
the word is commonly spelt aibophobia.
Past newsletters are available at either www.ddj.com
or www.ainewsletter.com.
As ever, interesting links and ideas for future issues are very
welcome.
Until next month,
Jocelyn <popx@j-paine.org>
For questions about the www.ainewsletter.com
site, contact Dennis
Merritt
Copyright ©2005 Amzi! inc., CMP, and Jocelyn Paine. All Rights
Reserved
|