August 2003

Semantic Web

Clearly an interesting thing to do with the Internet is to create robots that can search out answers to questions. Suppose you wanted to find out who was the editor of the Dr. Dobb's AI Expert Newsletter. Any human could answer that question in a minute or less by finding the DDJ Web site, clicking on newsletters, and scanning down for the AI Expert description.

How would we write a program that could answer the same question? Well, it wouldn't be easy. It would require natural language understanding software to scan the document for words that might identify the editor, assuming the program was able to figure out which page to look at in the first place.

It is a difficult program to write because the Web is designed for human use, not machine use.

As with many programming tasks, the problem can be made much simpler with a better choice of data structures. If, in addition to free-form text, a Web site had formal specifications of its content, then writing a program to answer the question would become almost trivial.

For example, if there were some XML like this at the DDJ Web site:

<site name="ddj">
    ...
    <publication type="newsletter">
        <name>AI Expert</name>
        <description>blah blah blah</description>
        <editor>Dennis Merritt</editor>
    </publication>
    ...
</site>

then it would be easy to write a program to answer the question. Of course, it would help if all Web sites used similar XML so our program could search more than just the DDJ site.
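As a rough sketch of how short such a program could be, here is one way to do it in Python with the standard library's XML parser. It assumes the made-up XML above is already in hand as a string; the function name find_editor is my own invention, and fetching the page from the site is left out.

```python
import xml.etree.ElementTree as ET

# The made-up site description from the text, with attribute values quoted
# so that it is well-formed XML. The "..." placeholders are left out.
SITE_XML = """
<site name="ddj">
    <publication type="newsletter">
        <name>AI Expert</name>
        <description>blah blah blah</description>
        <editor>Dennis Merritt</editor>
    </publication>
</site>
"""

def find_editor(xml_text, publication_name):
    """Return the editor of the named publication, or None if it is not listed."""
    root = ET.fromstring(xml_text)
    for pub in root.iter("publication"):
        if pub.findtext("name") == publication_name:
            return pub.findtext("editor")
    return None

print(find_editor(SITE_XML, "AI Expert"))  # prints: Dennis Merritt
```

No natural language understanding is needed; the tags tell the program exactly where the answer is.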

This is exactly what the Semantic Web initiative of the W3C is working on.

Knowledge Representation and Reasoning Engines

An AI application typically has two components: a knowledge representation and a reasoning engine. The knowledge representation is the semantics used to describe the knowledge in the particular application domain. The reasoning engine then uses that knowledge to produce the desired result.

The more expressive the knowledge representation, the simpler the reasoning engine can be.

The Semantic Web is an attempt to standardize a flexible, extensible knowledge representation for the Web. Once this is started, a whole new world of applications for the Web will be possible.

The Semantic Web is built on layered technologies.

XML - eXtensible Markup Language

XML is the base technology. It is a more general-purpose relative of HTML in which tags can be defined to describe the structure and components of various types of documents. The example that opened this discussion is made-up XML, with tags of my own creation, that might describe the content on a Web site.

RDF - Resource Description Framework

The earliest AI researchers found that object-attribute-value triples were a very versatile way to represent knowledge. For example:

car:color:blue
car:doors:4

This, in a nutshell, is what RDF is. Except they don't call them object-attribute-value triples, but rather subject-predicate-object triples. So in the example, car is the subject, color is the predicate and blue is the object.
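To make the idea concrete, here is a minimal sketch of a triple store in Python. The tuple layout and the match function are my own illustration, not part of RDF itself, which identifies subjects and predicates with URIs rather than bare words.

```python
# Each fact is a (subject, predicate, object) tuple.
triples = [
    ("car", "color", "blue"),
    ("car", "doors", 4),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

print(match(triples, subject="car", predicate="color"))
# prints: [('car', 'color', 'blue')]
```

Leaving a field as None acts like a variable in a query, so match(triples, subject="car") returns everything known about the car.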

RDF is very powerful, which means it's not quite as easy to read as the simple example that started this section. The newsletter description in RDF might look like:

<rdf:Description rdf:ID="AIXNewsletter">
    <example:title>AI Expert</example:title>
    <example:description>blah blah blah</example:description>
    <example:editor>Dennis Merritt</example:editor>
</rdf:Description>

The rdf:ID refers to the subject. In this case it would be an anchor on the DDJ Web site named "AIXNewsletter" (an rdf:ID must be a valid XML name, so it cannot contain a space). There are three subject-predicate-object triples associated with that subject. The predicates are title, description, and editor, with the object value enclosed in each of their tags.

What does the "example:" part of the syntax refer to? It is an XML namespace prefix that points to a common set of predicate definitions, and shared definitions are what add universality to RDF.
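Here is a hedged sketch of how a program might pull the three triples out of that description, using only Python's standard XML parser. Two things are my own assumptions: the enclosing rdf:RDF element with its namespace declarations, which the fragment needs before it will parse, and the URI chosen for the "example:" vocabulary, which is hypothetical. The ID is written without a space because rdf:ID must be a valid XML name. A real application would more likely use a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://www.example.org/terms/"  # hypothetical vocabulary URI

RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:example="{EX_NS}">
    <rdf:Description rdf:ID="AIXNewsletter">
        <example:title>AI Expert</example:title>
        <example:description>blah blah blah</example:description>
        <example:editor>Dennis Merritt</example:editor>
    </rdf:Description>
</rdf:RDF>
"""

def description_triples(xml_text):
    """Return (subject, predicate, object) for each property of each Description."""
    triples = []
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # rdf:ID names a fragment of the enclosing document, hence the "#".
        subject = "#" + desc.get(f"{{{RDF_NS}}}ID")
        for prop in desc:
            # ElementTree expands "example:editor" to "{namespace-uri}editor".
            triples.append((subject, prop.tag, prop.text))
    return triples

for triple in description_triples(RDF_XML):
    print(triple)
```

Notice that each predicate comes out as the full namespace URI plus the local name. That is the point of the prefix: two sites that use the same namespace mean the same thing by "editor".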

RDF Schemas

While the predicates in RDF can be whimsical creations of your own design, that renders them relatively useless to anyone but yourself.

RDF provides a means for organizations to create libraries of predicate definitions that anyone with information to catalog can then put to use. These definitions are often called "metadata".

The Dublin Core is one such set of definitions that is similar to the properties (predicates) used in library card catalogs. We might have used them for the newsletter RDF, in which case we would use "dc:" instead of "example:". We would also have provided some additional RDF syntax that indicated we were using the Dublin Core schema and a link to it.
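As a sketch of what that might look like, the following Python builds the newsletter description with Dublin Core predicates and prints the resulting RDF/XML. The namespace URI is the one published for the Dublin Core element set. Recording the editor as dc:contributor is my own assumption, since the Dublin Core has no "editor" element, and the ID "AIXNewsletter" is simply the anchor name written as a valid XML name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask the serializer to use the conventional prefixes for these namespaces.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def newsletter_rdf():
    """Build the newsletter description using Dublin Core predicates."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}ID": "AIXNewsletter"})
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = "AI Expert"
    ET.SubElement(desc, f"{{{DC_NS}}}description").text = "blah blah blah"
    # Assumption: the editor is recorded as a contributor.
    ET.SubElement(desc, f"{{{DC_NS}}}contributor").text = "Dennis Merritt"
    return ET.tostring(root, encoding="unicode")

print(newsletter_rdf())
```

The xmlns:dc declaration on the root element is the "link to the schema": it tells any reader of the document which set of definitions the dc: predicates come from.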

But schemas and RDF only go so far.

Web Ontology Language (OWL)

It is often necessary to describe the relationships between different predicates, as well as the behavior of a given predicate. (See the June issue for more on ontologies.) Documenting these relationships further extends the power of reasoning software that will use the Semantic Web.

For example, a manufacturing RDF Schema might include the predicate isPartOf. We couldn't make full use of that predicate unless we knew that if X isPartOf Y and Y isPartOf Z then X isPartOf Z. In other words, isPartOf is transitive.

OWL provides the means for adding these higher level semantic descriptions of relationships. Armed with this knowledge, an application could then answer bill of material type questions for our manufacturing site.
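To see why the transitivity matters, here is a small Python sketch of the reasoning an application could do once it knows isPartOf is transitive. In OWL the property would be declared an owl:TransitiveProperty; the parts list and the all_parts function below are made up for illustration and only imitate what a reasoner would infer from that declaration.

```python
from collections import defaultdict

# Hypothetical bill-of-materials facts: (part, "isPartOf", assembly).
facts = [
    ("piston", "isPartOf", "engine"),
    ("engine", "isPartOf", "car"),
    ("wheel", "isPartOf", "car"),
]

def all_parts(facts, assembly):
    """Return every direct or indirect part of assembly, treating isPartOf as transitive."""
    children = defaultdict(set)
    for part, predicate, whole in facts:
        if predicate == "isPartOf":
            children[whole].add(part)
    found = set()
    pending = [assembly]
    while pending:
        current = pending.pop()
        for part in children.get(current, ()):
            if part not in found:
                found.add(part)
                pending.append(part)
    return found

print(sorted(all_parts(facts, "car")))  # prints: ['engine', 'piston', 'wheel']
```

Nothing in the facts says that a piston is part of a car. That answer only appears because the program was told how the predicate behaves.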

RDF Tools

Typing RDF/OWL is tedious business, so a number of tools are being developed to make the creation and editing of RDF/OWL documents easier. See the links for details.

Foundations in Logic

The concepts of RDF and OWL come directly from logic. One can see in the relations/predicates the same roots that led to relational databases and to logic programming languages.

The mappings serve it well, as RDF has the potential to be the glue between data on Web sites and in relational databases stored at those sites and logic programming languages used to create intelligent Web robots.
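The mapping from a relational table to triples can be sketched in a few lines of Python. The table, its columns, and the rows_to_triples function are all hypothetical; the idea is simply that the key of a row serves as the subject, each remaining column name as a predicate, and each cell as an object.

```python
import sqlite3

def rows_to_triples(conn, table, key_column):
    """Turn each non-key column of each row into a (subject, predicate, object) triple."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = record[key_column]
        for column, value in record.items():
            if column != key_column:
                triples.append((subject, column, value))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publication (id TEXT, title TEXT, editor TEXT)")
conn.execute(
    "INSERT INTO publication VALUES ('AIXNewsletter', 'AI Expert', 'Dennis Merritt')"
)
print(rows_to_triples(conn, "publication", "id"))
# prints: [('AIXNewsletter', 'title', 'AI Expert'), ('AIXNewsletter', 'editor', 'Dennis Merritt')]
```

The same triples read naturally as logic programming facts, such as editor('AIXNewsletter', 'Dennis Merritt') in Prolog, which is why the three technologies fit together so easily.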

RDF in Action

These examples come from the RDF primer on the W3C site.

Dublin Core Initiative - Definitions of terms about documents, such as author, publisher, etc. This is a replication of the categories used in a library card catalog for deployment on the Web. Documents using the Dublin Core metadata can be searched automatically just as a human would use a card catalog in a library.

PRISM: Publishing Requirements for Industry Standard Metadata - Metadata that builds on the Dublin Core and is defined by the publishing industry to serve its needs. For example, it has terms to define the rights associated with a publication, which can then be used to automatically search for the rights associated with a given published item. Magazines are using PRISM to document an article as soon as it is published.

RSS: RDF Site Summary - Metadata used to describe news for a news feed. It allows the definition of a site as a "channel" and the latest news items from that channel. Each item has properties like title, description, link and date. A news service can then go to various channels, pick up the latest news items, and then redisplay them or use them to answer search queries from its users. This is probably the most widely used RDF application on the Web.

CIM/XML - The Common Information Model (CIM) specifies semantics for power system resources. CIM/XML uses RDF Schema and RDF to describe those semantics and has been adopted as the standard for communication of technical information between power transmission system operators.

Gene Ontology Consortium - Created metadata for describing gene products to aid in the distribution and exchange of medical information.

Composite Capabilities/Preferences Profile (CC/PP) - Metadata for the description of components and attributes that can be used dynamically to allow the restructuring of HTML data for a particular device or browser.
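Of the applications above, RSS is the easiest to try. Here is a hedged Python sketch of how a news service might pick the items out of a channel. The feed is made up and trimmed to the channel and item properties mentioned above, so it is not a complete RSS 1.0 document, and the function name latest_items is my own.

```python
import xml.etree.ElementTree as ET

# Prefixes used in the search paths below; "rss" is the RSS 1.0 namespace.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# A made-up, simplified feed with one channel and one item.
FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
    <channel rdf:about="http://www.example.org/news">
        <title>Example News</title>
        <link>http://www.example.org/news</link>
        <description>A made-up channel</description>
    </channel>
    <item rdf:about="http://www.example.org/news/1">
        <title>First story</title>
        <link>http://www.example.org/news/1</link>
        <description>blah blah blah</description>
    </item>
</rdf:RDF>
"""

def latest_items(feed_text):
    """Return the title and link of every item in the feed."""
    root = ET.fromstring(feed_text)
    return [
        {
            "title": item.findtext("rss:title", namespaces=NS),
            "link": item.findtext("rss:link", namespaces=NS),
        }
        for item in root.findall("rss:item", NS)
    ]

print(latest_items(FEED))
```

Because every channel uses the same predicates, the same few lines work against any site that publishes a feed in this format.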

Conferences

The 5th IFAC/CIGR Workshop on Artificial Intelligence in Agriculture will be held in Cairo on March 8-10, 2004. The deadline for submitting extended abstracts is Sept. 30, 2003. More information can be found at www.claes.sci.eg/aia04

Links

http://www.w3c.org/2001/sw/ - The W3C page describing work on the Semantic Web.

http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2 - An excellent Scientific American article by Tim Berners-Lee, James Hendler and Ora Lassila, describing the Semantic Web.

http://www.xml.com/pub/a/2001/01/24/rdf.html - Tim Bray's overview of RDF.

http://www.w3.org/TR/rdf-primer/ - A more technical primer for RDF that provides a good introduction to the syntax and meaning of RDF statements and their expression in XML.

http://owl.mindswap.org/ - The first Semantic Web site?

http://www.cs.umd.edu/projects/plus/SHOE/index.html - Simple HTML Ontology Extensions (SHOE) is a precursor to RDF and OWL, and is easier to understand. The examples in the SHOE tutorial on this Web site make it clear how the Semantic Web will work.

http://www.w3.org/TR/owl-features/ - An overview of OWL, an ontology language built on top of RDF.

http://www.w3c.org/RDF/#developers - Resources for developers, listing a number of tools for working with RDF.

http://www.swi-prolog.org/packages/semweb.html - Prolog is a natural choice of language for working with RDF and OWL, and the developers of SWI-Prolog have created a tool kit for using RDF and OWL, as well as tools for creating and editing them. These are part of SWI's Semantic Web Library.