AI Expert Newsletter
AI - The art and science of making computers do interesting
things that are not in their nature. August 2003
Semantic Web
Clearly, one interesting thing to do with the Internet is to create
robots that can search out answers to questions. Suppose you wanted
to find out who was the editor of the Dr. Dobb's AI Expert Newsletter.
Any human could answer that question in a minute or less by finding
the DDJ Web site, clicking on newsletters and scanning down for
the AI Expert description.
How would we write a program that could answer the same question?
Well, it wouldn't be easy. Assuming the program could figure out
which page to look at in the first place, it would still need natural
language understanding software to scan that page for words suggesting
it had found the editor.
It is a difficult program to write because the Web is designed
for human use, not machine use.
As with many programming tasks, the problem can be made much
simpler with a better choice of data structures. If, in addition
to free-form text, a Web site had formal specifications of its content,
then writing a program to answer the question would become almost trivial.
For example, if there were some XML like this at the DDJ Web site:
<site name="ddj">
...
<publication type="newsletter">
<name>AI Expert</name>
<description>blah blah blah</description>
<editor>Dennis Merritt</editor>
</publication>
...
</site>
then a program would only need to find the publication named AI Expert
and read its <editor> tag.
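To make that concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the made-up tags above (site, publication, name, editor), with attribute values quoted so the XML is well formed. These are invented for illustration, not a real DDJ format, and the function name find_editor is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The made-up site description from above, with attribute values quoted
# so that it is well-formed XML. The "..." placeholders are omitted.
SITE_XML = """
<site name="ddj">
  <publication type="newsletter">
    <name>AI Expert</name>
    <description>blah blah blah</description>
    <editor>Dennis Merritt</editor>
  </publication>
</site>
"""


def find_editor(site_xml: str, publication_name: str) -> Optional[str]:
    """Return the <editor> of the named publication, or None if not found."""
    root = ET.fromstring(site_xml.strip())
    for pub in root.iter("publication"):
        # findtext returns the child's text, or None if the child is missing.
        if pub.findtext("name") == publication_name:
            return pub.findtext("editor")
    return None


print(find_editor(SITE_XML, "AI Expert"))  # prints: Dennis Merritt
```

No natural language understanding is involved. The program simply walks a known structure.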
Of course, it would help if all Web sites used similar XML so our
program could search more than just the DDJ site.
This is exactly what the Semantic Web initiative of the W3C is
working on.
Knowledge Representation and Reasoning Engines
An AI application typically has two components: a knowledge
representation and a reasoning engine. The knowledge representation
is the semantics used to describe the knowledge in the particular
application domain. The reasoning engine then uses that knowledge
to produce the desired result. The more expressive the knowledge
representation, the simpler the reasoning engine can be.
The Semantic Web is an attempt to standardize a flexible, extensible
knowledge representation for the Web. Once such a standard is in place,
a whole new world of Web applications will become possible.
The Semantic Web is built on layered technologies.
XML - eXtensible Markup Language
XML is the base technology. It is a more general-purpose HTML in
which you can define your own tags to describe the structure and
components of various types of documents. The example used to start
this discussion is some made-up XML, with tags of my own creation,
that might describe the content on a Web site.
RDF - Resource Description Framework
The earliest AI researchers found that object-attribute-value triples
were a very versatile way to represent knowledge. For example:
car:color:blue
car:doors:4
This, in a nutshell, is what RDF is, except that RDF calls them
subject-predicate-object triples rather than object-attribute-value
triples. So in the example, car is the subject, color is the predicate,
and blue is the object.
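The triple idea is simple enough to sketch directly. The following Python is a minimal illustration and is not part of any RDF library. Facts are plain tuples, and a query is a pattern in which None matches anything. The names FACTS and match are hypothetical.

```python
from typing import List, Optional, Tuple

# A fact is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]

FACTS: List[Triple] = [
    ("car", "color", "blue"),
    ("car", "doors", "4"),
]


def match(facts: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None matches any value."""
    return [
        (s, p, o) for (s, p, o) in facts
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


print(match(FACTS, subject="car", predicate="color"))  # [('car', 'color', 'blue')]
print(match(FACTS, obj="4"))                            # [('car', 'doors', '4')]
```

Because every fact has the same shape, one generic query function serves any domain. This is much of the appeal of triples as a knowledge representation.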
RDF is very powerful, but its syntax is not nearly as easy to read
as the simple example that started this section. The newsletter
description in RDF might look like:
<rdf:Description rdf:ID="AIXNewsletter">
<example:title>AI Expert</example:title>
<example:description>blah blah blah</example:description>
<example:editor>Dennis Merritt</example:editor>
</rdf:Description>
The rdf:ID names the subject. In this case it would be an anchor
on the DDJ Web site named "AIXNewsletter". There are three
subject-predicate-object triples associated with that subject.
The predicates are title, description, and editor, and each object
value is the text enclosed by the corresponding tag.
What does the "example:" part of the syntax refer to? It is a
namespace prefix that points to common definitions of predicates,
and those shared definitions are what add universality to RDF.
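An XML namespace prefix such as example: abbreviates a URI. The full predicate name is that URI followed by the tag's local name. The minimal Python sketch below wraps the newsletter description in rdf:RDF with the namespace declarations filled in and reads it back as triples. It makes three assumptions: the example namespace URI (http://example.org/terms#) is a made-up placeholder, the ID is written without a space because rdf:ID values must be valid XML names, and the helper description_triples is hypothetical and handles only this flat pattern.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The newsletter description wrapped in rdf:RDF with namespace declarations.
# The "example" namespace URI is a placeholder (an assumption).
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:example="http://example.org/terms#">
  <rdf:Description rdf:ID="AIXNewsletter">
    <example:title>AI Expert</example:title>
    <example:description>blah blah blah</example:description>
    <example:editor>Dennis Merritt</example:editor>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(rdf_xml: str) -> List[Tuple[str, str, str]]:
    """Read flat rdf:Description elements as subject-predicate-object triples.

    Only handles namespaced, literal-valued properties like those above;
    it is not a general RDF/XML parser.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # rdf:ID names a fragment relative to the document's base URI.
        subject = "#" + desc.get(f"{{{RDF_NS}}}ID", "")
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; joining the two
            # parts gives the full predicate URI.
            namespace, local = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples


for triple in description_triples(RDF_XML):
    print(triple)
```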
RDF Schemas
Predicates in RDF can be whimsical creations of your own design,
but that renders them relatively useless to anyone but yourself.
RDF provides a means for organizations to create libraries of
predicate definitions that anyone with information to catalog can
then put to good use. These are often called "metadata".
The Dublin Core is one such set of definitions that is similar
to the properties (predicates) used in library card catalogs. We
might have used them for the newsletter RDF, in which case we would
use "dc:" instead of "example:". We would also have added some
RDF syntax (a namespace declaration) indicating that we were using
the Dublin Core schema, with a link to it.
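Here is a minimal Python sketch of the Dublin Core version. The xmlns:dc declaration binds the dc: prefix to the Dublin Core element set URI, http://purl.org/dc/elements/1.1/. The Dublin Core element set has no "editor" element, so this sketch assumes dc:contributor as a stand-in. The helper dc_values is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URIs for RDF and the Dublin Core element set (version 1.1).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# The newsletter described with Dublin Core terms. The xmlns:dc declaration
# says which schema the dc: prefix refers to. Dublin Core has no "editor"
# element, so dc:contributor stands in for it here (an assumption).
NEWSLETTER = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:ID="AIXNewsletter">
    <dc:title>AI Expert</dc:title>
    <dc:description>blah blah blah</dc:description>
    <dc:contributor>Dennis Merritt</dc:contributor>
  </rdf:Description>
</rdf:RDF>"""


def dc_values(rdf_xml: str, element: str) -> List[str]:
    """Return the text of every dc:<element> property in the document."""
    root = ET.fromstring(rdf_xml)
    return [e.text or "" for e in root.iter(f"{{{DC}}}{element}")]


print(dc_values(NEWSLETTER, "title"))        # ['AI Expert']
print(dc_values(NEWSLETTER, "contributor"))  # ['Dennis Merritt']
```

The lookup depends only on the shared Dublin Core URI, not on anything specific to DDJ. The same function would work on any site that used the same schema.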
But schemas and RDF only go so far.
Web Ontology Language (OWL)
It is often necessary to describe the relationships between different
predicates, as well as the behavior of a given predicate. (See the
June issue for more on ontologies.) Documenting these relationships
further extends the power of reasoning software that will use the
Semantic Web.
For example, a manufacturing RDF Schema might include the predicate
isPartOf. We couldn't make full use of that predicate unless we
knew that if X isPartOf Y and Y isPartOf Z then X isPartOf Z. In
other words, isPartOf is transitive.
OWL provides the means for adding these higher-level semantic descriptions
of relationships. Armed with this knowledge, an application could
then answer bill-of-materials questions for our manufacturing
site.
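In OWL, the transitivity of isPartOf would be declared by making it an owl:TransitiveProperty. The minimal Python sketch below does not read OWL. It hard-codes that one rule and applies it to a few hypothetical manufacturing triples until no new facts appear, then answers a bill-of-materials question. The part names and function names are invented for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical manufacturing facts, each (part, "isPartOf", assembly).
FACTS: List[Tuple[str, str, str]] = [
    ("bolt", "isPartOf", "wheel"),
    ("wheel", "isPartOf", "axle"),
    ("axle", "isPartOf", "car"),
    ("seat", "isPartOf", "car"),
]


def transitive_closure(facts: List[Tuple[str, str, str]],
                       predicate: str) -> Set[Tuple[str, str]]:
    """Apply 'X p Y and Y p Z implies X p Z' until nothing new appears."""
    pairs = {(s, o) for (s, p, o) in facts if p == predicate}
    while True:
        derived = {(x, z) for (x, y) in pairs for (y2, z) in pairs if y == y2}
        if derived <= pairs:
            return pairs
        pairs |= derived


def bill_of_materials(facts: List[Tuple[str, str, str]], assembly: str) -> List[str]:
    """Every part that is directly or indirectly part of the given assembly."""
    return sorted(x for (x, z) in transitive_closure(facts, "isPartOf") if z == assembly)


print(bill_of_materials(FACTS, "car"))   # ['axle', 'bolt', 'seat', 'wheel']
print(bill_of_materials(FACTS, "axle"))  # ['bolt', 'wheel']
```

Without the transitivity rule, a query for the parts of the car would find only the axle and the seat. The rule is what lets the bolt show up.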
RDF Tools
Typing RDF/OWL by hand is tedious, so a number of tools are being
developed to make the creation and editing of RDF/OWL documents
easier. See the links for details.
Foundations in Logic
The concepts of RDF and OWL come directly from logic. One can see
in the relations/predicates the same roots that led to relational
databases and to logic programming languages.
This shared ancestry serves RDF well: it has the potential to be the
glue between data on Web sites, the relational databases stored at
those sites, and the logic programming languages used to create
intelligent Web robots.
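The mapping to relational databases is easy to see in a minimal sketch. Each triple becomes a row in a three-column table. A logical query with a shared variable becomes a self-join on that column. The example below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names and the editor_of helper are hypothetical.

```python
import sqlite3
from typing import List

# Triples map directly onto a three-column relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("#AIXNewsletter", "title", "AI Expert"),
        ("#AIXNewsletter", "description", "blah blah blah"),
        ("#AIXNewsletter", "editor", "Dennis Merritt"),
    ],
)

# In logic: editor(X, E) AND title(X, 'AI Expert').
# The shared variable X becomes a self-join on the subject column.
EDITOR_QUERY = """
SELECT e.object
FROM triples AS t
JOIN triples AS e ON e.subject = t.subject
WHERE t.predicate = 'title' AND t.object = ?
  AND e.predicate = 'editor'
"""


def editor_of(title: str) -> List[str]:
    """Return the editors of every subject with the given title."""
    return [row[0] for row in conn.execute(EDITOR_QUERY, (title,))]


print(editor_of("AI Expert"))  # ['Dennis Merritt']
```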
RDF in Action
These examples come from the RDF primer on the W3C site.
Dublin Core Initiative - Definitions of terms describing documents,
such as author and publisher. They replicate the categories used
in a library card catalog for deployment on the Web. Documents
using the Dublin Core metadata can be searched automatically, just
as a human would use a card catalog in a library.
PRISM: Publishing Requirements for Industry Standard Metadata
- Metadata that builds on the Dublin Core and is defined by the
publishing industry to serve its needs. For example, it has terms
describing the rights associated with a publication, which software
can then use to look up automatically the rights for a given
published item. Magazines are using PRISM to document an article
as soon as it is published.
RSS: RDF Site Summary - Metadata used to describe news for
a news feed. It allows the definition of a site as a "channel"
and the latest news items from that channel. Each item has properties
like title, description, link and date. A news service can then
go to various channels, pick up the latest news items and then redisplay
them or use them to answer search queries from their users. This
is probably the most widely used RDF application on the Web.
CIM/XML - The Common Information Model (CIM) specifies semantics
for power system resources. CIM/XML uses RDF Schema and RDF to describe
those semantics and has been adopted as the standard for communication
of technical information between power transmission system operators.
Gene Ontology Consortium - Created metadata for describing
gene products to aid in the distribution and exchange of medical
information.
Composite Capabilities/Preferences Profile (CC/PP) - Metadata
describing the components and attributes of a device or browser,
which can be used to restructure HTML content dynamically for
that particular device or browser.
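Of the applications above, RSS is the easiest to experiment with. Below is a minimal Python sketch that reads the item properties from an RSS 1.0 feed, the RDF-based version of RSS. In RSS 1.0 the channel and its items are RDF resources, and the item date comes from the reused Dublin Core dc:date property. The feed itself, its URLs and its headlines are invented placeholders, and the helper latest_items is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A tiny, made-up RSS 1.0 feed; every URL and headline is a placeholder.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/news">
    <title>Example Channel</title>
    <link>http://example.org/news</link>
    <description>Placeholder news channel</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/news/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/news/1">
    <title>First headline</title>
    <link>http://example.org/news/1</link>
    <description>Placeholder story text</description>
    <dc:date>2003-08-01</dc:date>
  </item>
</rdf:RDF>"""


def latest_items(feed_xml: str) -> List[Dict[str, Optional[str]]]:
    """Collect title, link, description and date for each item in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    # In RSS 1.0, items are siblings of the channel, directly under rdf:RDF.
    for item in root.findall("rss:item", NS):
        items.append({
            "title": item.findtext("rss:title", namespaces=NS),
            "link": item.findtext("rss:link", namespaces=NS),
            "description": item.findtext("rss:description", namespaces=NS),
            "date": item.findtext("dc:date", namespaces=NS),
        })
    return items


for entry in latest_items(FEED):
    print(entry["date"], entry["title"], entry["link"])
```

A news service could run the same function over many channels and merge the results, as described above.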
Conferences
The 5th IFAC/CIGR Workshop on Artificial Intelligence in Agriculture
will be held in Cairo on March 8-10, 2004. The deadline for submitting
extended abstracts is Sept. 30, 2003. More information can be found
at www.claes.sci.eg/aia04
Links
http://www.w3c.org/2001/sw/
- The W3C page describing work on the Semantic Web.
http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2
- An excellent Scientific American article by Tim Berners-Lee, James
Hendler and Ora Lassila, describing the Semantic Web
http://www.xml.com/pub/a/2001/01/24/rdf.html
- Tim Bray's overview of RDF.
http://www.w3.org/TR/rdf-primer/
- A more technical primer for RDF that provides a good introduction
to the syntax and meaning of RDF statements and their expression
in XML.
http://owl.mindswap.org/
- The first Semantic Web site?
http://www.cs.umd.edu/projects/plus/SHOE/index.html
- Simple HTML Ontology Extensions (SHOE) is a precursor to RDF and
OWL, and is easier to understand. The examples in the SHOE tutorial
on this Web site make it clear how the Semantic Web will work.
http://www.w3.org/TR/owl-features/
- An overview of OWL, an ontology language built on top of RDF.
http://www.w3c.org/RDF/#developers
- Resources for developers, listing a number of tools for working
with RDF.
http://www.swi-prolog.org/packages/semweb.html
- Prolog is a natural language for working with RDF and OWL, and
the developers of SWI-Prolog have created a toolkit for using RDF
and OWL, as well as tools for creating and editing RDF and OWL
documents. These are part of SWI's Semantic Web Library.