AI Expert Newsletter
AI - The art and science of making computers do interesting
things that are not in their nature.
April 2003
As always, feedback is welcome. dennis@ddj.com
Reasoning about Uncertainty, Bayesian
Belief Networks
Certainty Factors
The first expert systems were research efforts in the medical diagnostic
domain. The researchers quickly found out that dealing with, and
propagating, uncertainty was a critical issue for medical systems.
Typically medical experts hedged their opinions with words like
"likely" and "may".
The researchers invented the concept of certainty factors (CFs)
to represent this uncertainty. These were numbers between 0 and
100. A given bit of data had a CF associated with it. For example
the symptom "patient has a cough" might have a CF of 70
if its not clear how bad the patient's cough is.
Rules would also have CFs, so the rule that "if a patient
has a cough then the patient has a cold" might also have a
CF, say 60. The CF of the symptom is then mathematically combined
with the CF of the rule to come up with a CF for the final diagnosis.
An diagnostic expert system could then, given a set of symptoms,
generate a number of possible diagnoses and the CF associated with
each.
(Note that uncertainty is only critical for certain types of applications.
The expert system in a doctor's office that helps fill out insurance
forms needs to say exactly what form to fill out, not provide a
menu of probable choices ranked by possibility.)
The formula for propagating CFs was developed by "feel",
tweaked until the medical experts were comfortable with the results.
Many expert system tools provide CFs as part of their knowledge
representation and reasoning engine.
Bayesian Belief Networks
It wasn't long before more rigorous types pointed out that probability
theory already had all the tools necessary for propagating belief,
in Bayesian theory. And it was mathematically precise, not like
the ad hoc CFs.
Thrown on the defensive, the CF creators argued that there is so
much uncertainty in the uncertainty estimates in an expert system
that it hardly matters if they are propagated with mathematical
precision. If a doctor estimates that a cough implies a cold with
certainty 60, that could really be anywhere from 40 to 80.
Moving to the offensive, the CF creators noted that it was much
more complex to use probability theory.
Complexity has never slowed the research community down, and a
wealth of work has been done on Bayesian belief networks (BBNs).
These use rigorous mathematics to propagate belief (probabilities)
between connected nodes in a network.
Using a BBN, the cough/cold example would be represented: P(cough,
0.7), P(cold | cough, 0.6). The first is just the probability the
symptom is a cough. The second is the conditional probability that
the disease is a cold given the symptom cough.
But, it would also be necessary to include P(not cough, 0.3) and
P(cold | not cough, 0.2) for the formulas to be used to accurately
determine P(cold). That is because the probability of the patient
having a cold is a combination of the probability given a cough
as a symptom and the probability of having a cold if not cough is
a symptom.
And if the node describing the conditional probability of a cold,
given a cough, were to be connected to other nodes, then we would
also need to specify P(not cold|cough) and P(not cold|not cough).
It is this requirement for a matrix of conditional probabilities
that adds overhead to the development of BBNs.
But, the potential of a BBN is fantastic. All the factors in a
particular problem domain can be connected together in dependency
networks, with the conditional probabilities at each node specified
as above. Then, as evidence about a situation is gathered, that
evidence, which itself is expressed as a probability, can be propagated
through the entire belief network, updating the beliefs of all the
other nodes.
This means propagating both backwards and forwards, which is where
the real power of Bayes' theorem comes to bear. It states that one
can compute the P(A|B) from P(B|A) by the formula:
P(A|B) = P(B|A) * P(A) / P(B).
This equation has another very important implication. It overcomes
the CF creator's objection that estimates of uncertainty are themselves
extremely uncertain. Consider the cough/cold example.
It is difficult to determine the probability that someone has a
cold given they have a cough. A doctor will go by feel, indicating
a cold is likely if the patient has a cough.
But, it is easy to measure exactly how many patients that visit
the doctor have colds, how many have the symptoms of a cough, and,
most important, how many who have colds have a cough.
That is, whereas determining P(cold|cough) is hard, we can use
statistics to exactly determine P(cough|cold).
Armed with Bayes theorem, and P(cold) and P(cough) in the doctor's
office, we can then precisely compute P(cold | cough) given P(cough
| cold).
P(cold | cough) = P(cough | cold) * P(cold) / P(cough).
It is no longer necessary to estimate what "likely" means.
One can imagine great networks of linked conditional probabilities
used to model different application domains, that react to evidence,
propagating beliefs, and that is exactly what people do with Bayesian
belief networks (BBNs).
There are many deployed applications using BBNs, and there are
a number of commercial products available as well.
But there are technical problems, the first being that propagation
is computationally too hard given a significant number of nodes
-- unless you make assumptions and limitations as to the type of
network you build.
The second is that it is difficult to get all of the probability
data for the nodes. As the CF creators pointed out, what good is
it if the data is wrong?
Researchers are working on both these areas, with learning algorithms
for developing BBNs from statistical data, and better algorithms
for propagating beliefs.
See the links below to learn more
about what can and has been done with BBNs, and where the current
research is going.
Fuzzy Logic
Fuzzy logic remains as a topic for a later date, but I'll briefly
mention it here for completeness.
Fuzzy logic is another approach to reasoning with uncertainty,
in which variables in a system, instead of having probabilities,
have a degree of truth associated with them. So rather than being
true (1.0) or false (0.0) a fuzzy variable might be "most likely
true" (0.8). Like with both probability theory and CFs, there
are rules for combining and propagating fuzzy truth values.
One can immediately see an advantage of fuzzy logic over probability
in the cough/cold situation. It's not really that the patient has
a probability of a cough, but rather a degree of cough, from mild
to severe. This is naturally represented as a fuzzy truth value.
But one can also immediately see the disadvantage, because the
result of a diagnostic system should be the probability that the
patient has a cold, not a degree of cold.
If it were simple, it wouldn't be AI.
Code Corner - Probability and Bayes
Prolog is a popular language for AI for a very simple reason. It
has built in pattern-matching (unification) and search (backtracking)
algorithms. And much of AI is exactly about pattern-matching and
search -- searching for patterns in if-then rules for an inference
engine; searching for patterns and moves in game playing; searching
for patterns in vision and natural language application; etc. etc.
Because the programmer doesn't have to code the pattern-matching
and search in a Prolog application, the code is often very concise
and easy to read. Here's an example from some simple experiments
with Bayes theorem and probability, where a few lines of code can
be used to answer queries about the probability of various events.
A simple example relating cancer and smoking, similar to the cold/cough
analysis above, came from Norman Fenton's
tutorial. We can write a small Prolog program that lets us play
with that data and Bayes theorem.
First we represent the basic probability facts as Prolog facts:
p(patient(cancer), 0.1).
p(patient(smoker), 0.5).
We then represent the conditional probabilities, using a ^ operator:
cp(patient(smoker) ^ patient(cancer), 0.8).
Next we write query rules that can be used to find various probabilities.
There are three rules, covering the case where the probability is
known, the conditional probability is known, or we can compute the
conditional probability using Bayes theorem. Words beginning with
upper case letters are variables in Prolog. Note that the third
rule recursively calls itself to get the individual probabilities.
getp(A,P) :-
p(A,P), !.
getp(A^B, P) :-
cp(A^B, P), !.
getp(A^B, P) :-
cp(B^A, Pba),
getp(A, Pa),
getp(B, Pb),
P is Pba * Pa / Pb.
We now have a simple system for representing and querying knowledge
expressed as probabilities. We can consult these facts and rules
into a Prolog listener and try it:
?- getp(patient(cancer), P).
P = 0.1
?- getp(patient(smoker)^patient(cancer), P).
P = 0.8
?- getp(patient(cancer)^patient(smoker), P).
P = 0.16
One of the fun things about reading probability papers is they
have Rube
Goldberg like examples. Here's a more complex set of a chain
of facts, probably relating to the author's vacation. This one comes
from an online lecture available from Temple
University.
p(bonus, 0.6).
cp(money ^ bonus, 0.8).
cp(money ^ not bonus, 0.3).
cp(hawaii ^ money, 0.7).
cp(Hawaii ^ not money, 0.1).
cp(san_francisco ^ money, 0.2).
cp(san_francisco ^ not money, 0.5).
cp(wierd_people ^ san_francisco, 0.95).
cp(wierd_people ^ not san_francisco, 0.6).
cp(surfing ^ Hawaii, 0.75).
cp(surfing ^ not Hawaii, 0.2).
Note that in this case, except for bonus, we don't always have
a direct probability that we can use to get P(A) when needed. There
is another formula that can be used to compute P(A) from P(A|B)
when needed.
P(A) = sum( P(A|B) * P(B) )
So in the example, if we wanted to compute P(surfing), it depends
on P(Hawaii) which depends on P(money) which depends on P(bonus).
We can expand our query rules to include linked probabilities like
this. Prolog's recursion handles this nicely. (Prolog's findall
finds all the instances and stores them in a list; we add a sum
rule to add the elements in the list.)
We also need a rule to calculate not P, which probability theory
says is 1 - P.
getp(not A, P) :-
getp(A, Pn),
P is 1 - Pn, !.
getp(A, P) :-
findall(P, (
cp(A ^ B, P1),
getp(B, P2),
P is P1*P2
), L),
sum(L, P).
sum(L, Sum) :-
sum(L, 0, Sum).
sum([], Sum, Sum).
sum([A|Z], Acc, Sum) :-
Acc2 is Acc + A,
!, sum(Z, Acc2, Sum).
We can try this in a listener:
?- getp(surfing, P).
P = 0.453
We can also change the probability of the bonus, to see how that
affects the probability of surfing.
?- retract(p(bonus,_)).
?- assert(p(bonus,0.3)).
?- getp(surfing, P).
P = 0.4035
That let's us see top down affects. What about rippling Bayes theorem
back up to see what the fact that someone went surfing means for
the probability they had a bonus? That's harder and requires propagating
beliefs through the network. But its not too hard for graphs without
loops, and there is a relatively straightforward algorithm for doing
it that was originally published by Judea Pearl in 1988 in a book
called Probabilistic Reasoning in Intelligent Systems, published
by Morgan Kaufman.
There is a concise explanation of the algorithm in Facundo
Bromberg's final project report, which describes a Java implementation
of the algorithm. The Temple University tutorial
points to a Prolog program which does the same. The Prolog code
is relatively obtuse unless you have the paper version of the algorithm
to study as well.
Work in AI is all about knowledge representation and reasoning
engines. Those two aspects are clear in this little example. The
p and cp facts used above are a knowledge representation
language for a probability network. The getp rules are a
reasoning engine that uses that knowledge representation.
Links
The Best of AI Expert
The June 1994 issue had three excellent articles on neural networks.
Object-Oriented Neural Networks [Jun 1994] - Ivo Vondrak
provides a clean object-oriented design for implementing your own
neural network application.
Tuning Neural Networks with Genetic Algorithms [Jun 1994]
- Kurzweil has suggested the way to build an artificial brain is
to have neural networks tuned with genetic algorithms, and some
of the leading edge game-playing software is using the same approach.
Dan Murray provides details of the approach in this article.
Activating Neural Networks: Part I [Jun 1994] - Anthony
Kempka looks into the details of activation functions, the little
equations that decide when a neuron fires. He discusses a variety
of functions and their impact on the behavior of a neural net, but
also points out that the use of backpropagation for learning severely
restricts the types of functions that can be used.
AI in Medicine
http://www.coiera.com/aimd.htm
- This is a chapter from Enrico Coiera's book, "Medical Informatics,
The Internet and Telemedicine". It provides a good introduction
to the topic of AI in medicine.
http://www.elsevier.nl/locate/artmed
- Elsevier, which published a number of learned journals, has one
called Artificial Intelligence in Medicine. This is its home page.
Quantum Computing
http://tph.tuwien.ac.at/~oemer/qcl.html
Bernhard Ömer is working on Quantum Computer Language (QCL)
which is aimed at overcoming some of the notational and usability
issues that have confused those, including this editor, attempting
to wrap their brains around programming with bits that are 0 and
1 at the same time. He has software that can be downloaded and used,
although I suspect it just simulates the quantum computer, so it
can be run on Newtonian machines.
Bayesian Belief Networks
http://www.dcs.qmul.ac.uk/~norman/SERENE_Help/start.htm
- Norman Fenton has included an excellent, not too technical, tutorial
on probability and Bayesian belief networks (BBNs) in the documentation
on SERENE, which is a safety and risk evaluation system that uses
a BBN. There is a fascinating section on the difficulties people
have in grasping realistic probabilities. The tutorial includes
a good section on other resources for BBN information.
http://yoda.cis.temple.edu:8080/UGAIWWW/lectures/bnets.html
- Temple University has a number of interesting lectures available
online. This is one on Bayesian belief networks and includes working
Prolog source code for belief propagation in networks where there
is only one path between nodes.
http://www.public.iastate.edu/~bromberg/cs572/finalproject/Report.pdf
- Facundo Bromberg's AI Final Project Report has a clear, concise
summary of the algorithm for belief propagation in networks without
loops, or directed acyclic graphs (DAGs), which was originally published
in Judea Pearl's 1988 book, Probabilistic Reasoning in Intelligent
Systems, published by Morgan Kauffman.
http://www.auai.org/ - Home
page for Association for Uncertainty in Artificial Intelligence,
lots of good links and pointers.
http://www.ai.mit.edu/~murphyk/Bayes/bayes.html
- Kevin Murphy has written an excellent in depth look at belief
networks that doesn't shy away from equations and the difficult
issues. It also has numerous links to other resources on BBNs.
|