June 2003

Feedback

One reader asked if Prolog was psychic and could tell that Price and PriceCents were the same in this example. Unfortunately psychic versions are only available at advanced wizardry research institutions. Normal programmers have to spell all their variables the same.

price(StartHour, DurationMinutes, PriceCents) :-
    StartHour < 7,
    StartHour > 16,
    Price is Duration * 10.

(The error has been fixed in the archive version, along with the additional error that 8pm is 20:00, not 16:00. Ed.)
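For the record, here is a sketch of what a corrected clause could look like. The variable names now match between the head and the body, and 8pm is written as 20. One assumption is added that is not in the original: the two hour tests are treated as alternatives (before 7:00, or from 20:00 on), because as printed above they can never both be true.

```prolog
% Sketch of a corrected rule, not the archive version.
% Assumption: the 10-cents-a-minute rate applies to calls that start
% before 7:00 or from 20:00 (8pm) onward.
price(StartHour, DurationMinutes, PriceCents) :-
    (   StartHour < 7
    ;   StartHour >= 20
    ),
    % Same spelling as in the head, so the result is actually returned.
    PriceCents is DurationMinutes * 10.
```

With this clause, the query `?- price(21, 5, PriceCents).` answers `PriceCents = 50`, and a call starting at noon simply fails because no rule covers it.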

Sigh. I shouldn't edit the code for clarity when it's in the newsletter. I think I introduced a bug into the previous month's Bayesian example the same way. From now on all code will be cut and pasted directly from working programs and NOT edited again. I promise.

Also there was interest in the Rubik's cube program. http://www.amzi.com/articles/rubik.htm is a link to a reprint of a PCAI article describing it.

As the article describes, the program is NOT a learning program. The strategy for solving the cube is hard-coded in the program, so it is an interesting example of knowledge engineering, not learning.

I believe there are learning versions of Rubik's Cube programs around, which an Internet search might turn up, although they might only work for 2x2 cubes.

Ontologies

Early on, AI researchers realized that a big problem with building "intelligent" systems was that computers lacked common sense knowledge of the world.

Consider, for example, trying to write a natural language understanding program with these two sentences:

It was a canary. The beak was injured.

It's one thing to be able to parse the words, but quite another to "understand" what they mean. A human makes use of the knowledge that 1) a canary is a type of bird, and that 2) a beak is a part of a bird to understand these sentences.

So, in order to write a computer program that can understand those sentences, it is not enough to know grammar rules and parts of speech. It is also necessary to somehow encode the knowledge that a canary is a bird and a beak is a part of a bird.

This, in computer science terms, is an ontology.

One could argue that this is a poor word choice. If you look up ontology in the dictionary, it talks about the philosophy of existence, which makes software ontologies sound like something very heavy indeed.

But they're not. A software ontology is simply encoded knowledge about concepts and relationships, like this:

canary is_a bird.
beak is_part_of bird.
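Those two lines are almost Prolog already. As a minimal sketch, assuming we declare is_a and is_part_of as infix operators ourselves, they can be loaded as facts and combined by a small rule. The has_part predicate is a made-up helper for illustration, not something from a standard library.

```prolog
% Declare the relationship names as infix operators so the facts can be
% written exactly as they appear in the text.
:- op(700, xfx, is_a).
:- op(700, xfx, is_part_of).

canary is_a bird.
beak is_part_of bird.

% has_part(Thing, Part): hypothetical helper. Thing has Part if Thing is
% a kind of something that Part is a part of.
has_part(Thing, Part) :-
    Thing is_a Kind,
    Part is_part_of Kind.
```

The query `?- has_part(canary, What).` answers `What = beak`, which is the piece of knowledge needed to connect the two sentences about the canary.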

Often an expert system has two parts. One is an ontology, describing the terminology of the domain; the other is a set of rules used to reason over that domain.

For example, a technical support system might have these rules to let a user know which way to tilt slashes in a directory path:

If error = bad_path and directory_separator \= os_directory_separator then 
    tilt_slashes.
If operating_system = windows then os_directory_separator = backslash.
If operating_system = unix then os_directory_separator = forwardslash.

And be supported by this ontology:

windows is_a operating_system.
unix is_a operating_system.
W2K is_a windows.
XP is_a windows.
W98 is_a windows.
Linux is_a unix.
Solaris is_a unix.

Using the rules and ontology together, a tech support system could then have a dialog like this:

What error? bad_path
What directory_separator? forwardslash
What operating_system? W98
Recommendation: tilt_slashes

It is the ontology that lets the rules act as if they understand that W98 is a windows operating system.
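Here is one way the rules and ontology could be sketched in plain Prolog. The names are written in lowercase (w98 instead of W98) because capitalized names are variables in Prolog. The kind_of and recommend predicates are invented for this sketch, and the interactive questions are replaced by arguments to keep it short.

```prolog
:- op(700, xfx, is_a).

% The ontology from the text, in lowercase.
windows is_a operating_system.
unix    is_a operating_system.
w2k     is_a windows.
xp      is_a windows.
w98     is_a windows.
linux   is_a unix.
solaris is_a unix.

% kind_of(X, Y): X is Y itself, or sits somewhere below Y in the
% is_a hierarchy. (Hypothetical helper.)
kind_of(X, X).
kind_of(X, Y) :-
    X is_a Parent,
    kind_of(Parent, Y).

% The two separator rules, phrased so they apply to any kind of
% windows or unix system.
os_directory_separator(OS, backslash)    :- kind_of(OS, windows).
os_directory_separator(OS, forwardslash) :- kind_of(OS, unix).

% recommend(Error, SeparatorUsed, OS, Advice): the tilt_slashes rule.
recommend(bad_path, Used, OS, tilt_slashes) :-
    os_directory_separator(OS, Expected),
    Used \= Expected.
```

The query `?- recommend(bad_path, forwardslash, w98, Advice).` answers `Advice = tilt_slashes`. Notice that no rule mentions w98; the kind_of walk through the ontology is what connects it to windows.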

Common Sense

We can see where an application might have use for a specific ontology, but some AI researchers are working in larger areas of discourse. They want to develop software that can understand any written discourse.

To do that, they need to develop an ontology of everything.

This is a huge undertaking, and one of the giants of early AI work, Doug Lenat, set out to do exactly that. He and his colleagues have been working on the Cyc project (the name comes from encyclopedia) since 1984.

Cyc is a huge repository of facts (about 1,000,000) about the world, such as trees usually grow outdoors and glasses should be held rightside-up. Interfaces are available for querying Cyc, so its common sense knowledge can be utilized in any application.

There are other efforts in similar veins. ThoughtTreasure is one with some interesting concepts of space connected by worm holes that can be used to understand discourse about, say, going from the street into a restaurant. Using it as a tool, one could create an ontology about spaces, such as stores on a street, or a village, or a city, and understand discourse about those spaces.

WordNet might be called a dictionary/thesaurus++. A dictionary contains definitions, and a thesaurus contains relationships of a certain kind. WordNet expands on these concepts and stores many more types of relationships between words, inspired by psycho-linguistic studies of how humans store and retrieve words and meanings.

The WordNet documentation is full of words like meronym (a part-of relationship), holonym (the reverse, the whole that has the parts), and hyponym (a kind-of relationship, which gives the hierarchical organization). Some of these are further subdivided. Meronyms come in different flavors, being components (branch/tree), members (tree/forest), and composites (aluminum/airplane). These examples are just some of the relationships for nouns. Verbs, adjectives and adverbs have their own similarly complex collections of relationships.
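To make the terminology a little more concrete, here is a small sketch using the three noun examples just mentioned. The predicate names and the flavor labels are made up for illustration and are not the format of WordNet's own files.

```prolog
% meronym(Flavor, Part, Whole): Part stands in a part-of relationship
% of the given Flavor to Whole. (Hypothetical representation.)
meronym(component, branch,   tree).
meronym(member,    tree,     forest).
meronym(composite, aluminum, airplane).

% A holonym is the same relationship read from the side of the whole.
holonym(Flavor, Whole, Part) :-
    meronym(Flavor, Part, Whole).
```

The query `?- holonym(Flavor, forest, Part).` answers `Flavor = member, Part = tree`. Storing the relationship once and deriving its inverse with a rule keeps the two views from drifting apart.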

These relationships are what let a program using WordNet make sense of the canary and its injured beak.

EDR is a similar project from Japan, with the added wrinkle that it has both Japanese and English lexical concepts that can be used for more accurate machine translation of documents.

LADL is one from France, with French and English, and support for other languages as well.

GeoReference Online Ltd. - Mining with Ontologies

GeoReference is a company that has applied ontology technology to mineral exploration. They provide one product designed for mapping and geologic ontologies, LegendBurster, and another, MineMatch, that uses those ontologies to match potential mining sites with the attributes of known successful sites.

The power of the ontology component is that it allows field geologists to enter information about a location in a manner that can then be analyzed and understood by computer software. Without those ontological definitions of geologic concepts, the data could not be effectively analyzed.

The figure shows the ontology editor component of GeoReference's software. You can see the various attributes and hierarchical relationships typical in many ontologies.

Links

Ontologies

http://www-ksl.stanford.edu/kst/what-is-an-ontology.html - Tom Gruber of Stanford provides a definition of ontology as used in AI, as opposed to philosophy.

http://ksl-web.stanford.edu - Stanford's Knowledge Systems Lab (KSL) is a leader in research on ontologies. This is the home page for that work. Check out the online tools, such as Ontolingua, a tool for working with ontologies on the Web.

http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html - An excellent tutorial on how to build an ontology for a particular domain.

http://www.w3.org/TR/webont-req/ - A paper on OWL (Web Ontology Language), the W3C's work on a common ontology language for the Web.

ThoughtTreasure is a common sense knowledge base and architecture for natural language processing that has a number of concepts, including grids for describing spaces, and worm holes for connecting spaces. So one grid might define the layout of a restaurant and a worm hole models the door to the grid representing the street. These concepts can be used to "understand" articles about people in restaurants.

This is an online version of Erik Mueller's book on ThoughtTreasure.

The home page for WordNet, an online lexical reference based on psycholinguistic theories of how humans store and retrieve word meanings.

http://www.cyc.com/ - The home page for Cycorp, Doug Lenat's company pursuing ongoing development and deployment of the Cyc common sense ontology.

The home page for OpenCyc, an open source version of Cyc that is developed cooperatively with Cycorp.

The home page for EDR, a Japanese lexical ontology in both English and Japanese designed for general understanding and machine translation of texts.

The home page for LADL, a French, English, and other language linguistic database from the University of Marne-la-Vallee.

The home page for GeoReference, which describes their general purpose ontology tools and the geologic application, MineMatch, built on top of them.

A description of the ontologies used in MineMatch for geologic exploration.