3       Ontologies

Ontologies are a popular research topic in Artificial Intelligence (AI), for example in knowledge engineering, natural language processing, intelligent information integration and multi-agent systems. They are also applied outside AI, notably in the World Wide Web community. OWL (see chapter 3.4.3), for instance, originates from the efforts of the W3C working group that developed the idea of the Semantic Web. By attaching information to data that describes its content and meaning, web resources become usable by machines. The data can then be used not just for display purposes but also for automation, integration and reuse across various applications.

 

3.1             Definition

The notion of Ontology has a long history in philosophy. There, it is a branch of metaphysics and refers to the study of being or existence. It has strong implications for the conceptions of reality.

 

In the area of computer science, many definitions exist and are discussed. The most widespread and widely accepted one in the context of AI is the following from [Gru93]:

 

An ontology is a formal, explicit specification of a shared conceptualisation.

 

Conceptualisation refers to an abstract model of the part of the world that is to be represented. It defines entities and the relationships between them in a specific domain of discourse. The names of both the entities and the relationships are accompanied by human-readable text describing their meaning. In other words, machine-processable content is linked with meaning for humans.

 

The conceptualisation is shared, which means that the knowledge represented by an ontology is not restricted to some individual but should be accepted by a group.

 

Formal refers to the fact that the ontology should be machine-readable, and explicit means that the concepts and relations between them have to be explicitly defined. Although data is machine-readable, it is represented in the form of semi-structured natural language text.

 

At least as important as the question of what an ontology is, is the question of what it is good for. Since an ontology is a shared conceptualisation, the main purpose of setting one up is to enable knowledge sharing and knowledge reuse. Ontologies capture general knowledge about a domain that rarely changes, and they specify the concepts and relations about which knowledge is to be accumulated and processed. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualisation, explicitly or implicitly.

For information systems or for the Internet, ontologies can be used to organise keywords and database concepts by capturing the semantic relationships among the keywords or among the tables and fields in a database. Yahoo!, for example, has a hierarchical structure that categorises Internet documents by using formal metadata. The semantic relationships give users an abstract view of the information available for their domain of interest [HS97].

 

3.2             Types of ontologies

A common way to classify ontologies is by their level of generality. [Fen04] identifies these types:

 

·        Domain ontologies contain knowledge that is valid for a particular type of domain (e.g. electronics, medicine, mechanics).

·        Metadata ontologies provide a vocabulary for describing the content of information sources in the World Wide Web. The Dublin Core Metadata Initiative, for example, develops online metadata standards supporting a broad range of purposes and business models.

·        Generic or top-level ontologies (also called common sense ontologies) capture general knowledge. The conceptualisations defined here are valid across several domains.

·        Representational ontologies provide representational entities without stating what should be represented. As a consequence, they do not commit themselves to any particular domain. A well-known representational ontology is the Frame-Ontology [Gru93]. It defines concepts such as frames, slots and slot constraints, allowing the expression of knowledge in an object-oriented or frame-based way. The ontology editor Protégé (see chapter 3.5) follows this model.

·        Method and task ontologies provide terms for particular tasks or problem-solving methods. They are used for reasoning on domain knowledge.

 

3.3             Ontologies and software agents

If two agents communicate about some domain, they have to agree on the terminology that they use to describe this domain. A set of definitions of formal vocabulary, i.e. a conceptualisation, can be used to set up this consensual terminology and therefore supports knowledge sharing among AI software, such as software agents.

 

Common ontologies are used to describe ontological commitments for a set of agents so that they can communicate about a domain of discourse without necessarily operating on a globally shared theory. An ontological commitment is an agreement to use a vocabulary. According to [Gru93], software agents commit to ontologies. Thus, their observable actions are consistent with the definitions in the ontology and they gain a shared understanding of the domain of discourse. As already mentioned, ontologies make messaging meaningful. Ontologies also serve as prerequisites for consensus, since agents can only communicate when they have already agreed on a consensual view of the world. Further, ontologies allow agents to ground their beliefs and actions.

 

3.4             Ontology languages

Ontologies are formal theories about a certain domain of discourse and therefore require a formal logical language to express them. Several languages for describing ontologies exist, such as CycL, KIF, Ontolingua and Frame Logic [Fen04]. Languages for defining ontologies are syntactically and semantically rich, e.g. richer than common approaches for databases.

In this work, domain ontologies are expressed in the Web Ontology Language (OWL). The main reason for choosing this format is that OWL has been a W3C Recommendation since February 2004. A W3C Recommendation is understood by industry and the research community as a web standard. It is a stable specification developed by a W3C working group and reviewed by the W3C Membership. A second, more practical reason is tool support: the ontology editor Protégé (see chapter 3.5) offers a plug-in to store ontologies in OWL, and the Jena Framework (see chapter 5.5) provides a parser for OWL. The existence of the OWL plug-in and the Jena Framework might, in turn, result from OWL’s popularity as a powerful Web standard.

 

Since OWL builds on the Resource Description Framework (RDF), the following chapter 3.4.1 introduces this language. Its extension RDF Schema is described in 3.4.2, before 3.4.3 focuses on OWL.

 

3.4.1      RDF

RDF is a standard form for describing machine-processable semantics of data. The syntax is defined in XML. It represents an attempt to attach additional semantics to XML by defining relationships.

Every document is represented in a tree structure. The different leaves of the tree have well-defined tags as well as contexts with which the information can be understood. These contexts represent semantics.

 

The underlying structure of any expression in RDF is a collection of triples. Each triple consists of a subject, a predicate and an object [KC04], as figure 3 shows. A set of such triples is called an RDF graph.

 

Figure 3: RDF data model

 

A triple represents a statement of a relationship between the things denoted by the nodes (i.e. subject and object) that it links. The direction of the arc is significant: it always points toward the object.

 

·        A subject (or resource) is an entity that can be referred to by a World Wide Web address, i.e. by a URL or URI. Resources are the elements that are described by RDF statements.

·        A predicate (also called property) defines a binary relation between resources and/or atomic values provided by primitive data type definitions in XML.

·        An object is a value for a subject’s predicate. Therefore, it provides the actual characterisation of Web documents.

 

A simple example is:

 

capital (http://www.countries.com/Portugal) = Lisbon

 

Expressed in natural language, this states that “Lisbon is the capital of Portugal”. The sentence has the following parts:

 

Subject (Resource)

http://www.countries.com/Portugal

Predicate (Property)

capital

Object (Literal)

Lisbon
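The triple structure above can be pictured with a minimal Python sketch. It is only an illustration of the data model, not part of any RDF library: the names Triple, graph and objects are hypothetical, and the predicate is written as the full URI that ex:capital would expand to under the namespace http://example.org/schema/.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str  # named obj to avoid shadowing the Python builtin "object"


# The statement "Lisbon is the capital of Portugal" as a single triple.
statement = Triple(
    subject="http://www.countries.com/Portugal",
    predicate="http://example.org/schema/capital",
    obj="Lisbon",
)

# An RDF graph is a set of triples.
graph = {statement}


def objects(graph, subject, predicate):
    """Return all objects asserted for the given subject and predicate."""
    return {t.obj for t in graph
            if t.subject == subject and t.predicate == predicate}
```

Calling objects(graph, "http://www.countries.com/Portugal", "http://example.org/schema/capital") returns a set containing only "Lisbon".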

 

The RDF/XML document representing this information looks as follows:

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

         xmlns:ex="http://example.org/schema/">

<rdf:Description rdf:about="http://www.countries.com/Portugal">

    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>

    <ex:capital>Lisbon</ex:capital>

</rdf:Description>

</rdf:RDF>
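How such a document maps back to triples can be sketched with Python's standard xml.etree.ElementTree module. This is a deliberately limited illustration under stated assumptions: it handles only rdf:Description elements whose subject is given as a full URI in rdf:about, with literal or rdf:resource property values. The helper names expand and read_triples are hypothetical, and a complete RDF/XML parser such as the one in the Jena Framework covers far more of the syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: the subject is identified with rdf:about and a full URI.
DOCUMENT = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/schema/">
  <rdf:Description rdf:about="http://www.countries.com/Portugal">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
    <ex:capital>Lisbon</ex:capital>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' notation into a plain URI."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag


def read_triples(xml_text):
    """Extract (subject, predicate, object) tuples from simple RDF/XML."""
    triples = []
    root = ET.fromstring(xml_text)
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # A property either points to a resource or carries a literal.
            resource = prop.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, expand(prop.tag), value))
    return triples
```

Applied to DOCUMENT, read_triples yields two triples for the subject http://www.countries.com/Portugal: one for rdf:type and one for the capital property with the literal Lisbon.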

 

As described in [Bec04], any RDF graph, and any OWL ontology graph as well, can be written in many different syntactic forms. One example can be found in chapter 3.4.3.5, where owl:FunctionalProperty is described.

 

3.4.2      RDFS

RDFS stands for RDF Schema and is a vocabulary for describing properties and classes of RDF resources. By providing a framework, it allows the definition of classes and properties through their types. But it does not provide the actual application-specific classes and properties.

 

The following example builds on the RDF example in 3.4.1. Here, rdfs:Class is used instead of rdf:Description and the rdf:type information is dropped. Additionally, the class country is added to demonstrate the use of rdfs:subClassOf.

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"

         xmlns:ex="http://example.org/schema/">

<rdfs:Class rdf:ID="country" />

<rdfs:Class rdf:ID="Portugal">

          <ex:capital>Lisbon</ex:capital>

<rdfs:subClassOf rdf:resource="#country" />

</rdfs:Class>

</rdf:RDF>

 

As this short example shows, classes in RDFS are reminiscent of classes in object-oriented programming languages. Resources can be defined as instances of classes and as subclasses of other classes. rdfs:subClassOf, which defines a relationship between two classes, is one of the core property types of RDFS. It is transitive.
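The transitivity of rdfs:subClassOf means that all superclasses of a class can be derived from the direct subclass statements alone. The following Python sketch illustrates this; the function name superclasses and the additional class GeopoliticalEntity are hypothetical and serve only to make the chain longer than one step.

```python
def superclasses(cls, sub_class_of):
    """Return every direct or inherited superclass of cls."""
    found = set()
    pending = list(sub_class_of.get(cls, ()))
    while pending:
        parent = pending.pop()
        if parent not in found:  # also protects against cyclic definitions
            found.add(parent)
            pending.extend(sub_class_of.get(parent, ()))
    return found


# Direct rdfs:subClassOf statements, keyed by subclass.
# "GeopoliticalEntity" is a hypothetical class added for illustration.
sub_class_of = {
    "Portugal": {"country"},
    "country": {"GeopoliticalEntity"},
}
```

Here superclasses("Portugal", sub_class_of) contains both country and GeopoliticalEntity, although only the first relation was stated directly.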

 

To recapitulate, the RDF infrastructure enables the encoding, exchange and reuse of structured data. OWL extends RDF: its model is derived from the RDF model but is more flexible. Additional vocabulary allows the description of properties and classes, and more complex kinds of relationships among the elements can be expressed. OWL was designed to extend the semantic reach of current XML and RDF metadata efforts.

 

3.4.3      OWL

OWL is a semantic markup language for defining, publishing and sharing ontologies in the World Wide Web. As part of the Semantic Web effort, it was developed as a vocabulary extension of RDF. Thus, it can be written in XML and therefore has all the advantages XML provides: information can easily be exchanged between different types of computers using different operating systems and application languages. OWL differs from RDF in that it enables greater machine interpretability of Web content by providing additional vocabulary along with a formal semantics.

 

OWL makes two important contributions:

·        A standardised syntax for writing ontologies, which specifies classes, subclasses, properties and subproperties. Predefined properties are available to model “instance of” and “subclass of” relationships as well as domain and range restrictions of attributes.

·        A standard set of modelling primitives such as the “instance of” and “subclass of” relationships. Although the relationships between entities might be more complex than in RDF, an OWL ontology uses the RDF graph data model.

 

An OWL ontology mainly contains classes, properties, instances of classes and relationships between these instances. Additionally, it contains namespace declarations and an ontology header. It can therefore be used directly to publish and share sets of terms, i.e. to describe an ontology. The applications it supports include advanced Web search, software agents and knowledge management.

 

The following subchapters show how OWL documents are structured and selectively pick out language constructs that are used in the ontologies of this work. A complete set of OWL language constructs can be found in [DS04].

 

3.4.3.1      Namespaces

Since ontologies are distinct resources, they must have identifiers making it possible to uniquely identify concepts. XML namespaces provide a method to avoid element name conflicts, using URI references.

In RDF, a URI identifies a namespace which belongs to a schema. The declarations are included in the rdf:RDF tag. The built-in vocabulary for OWL comes entirely from the OWL namespace http://www.w3.org/2002/07/owl#. An example is:

<rdf:RDF

    xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"

    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"

    xmlns:owl="http://www.w3.org/2002/07/owl#"

    xmlns="http://www.owl-ontologies.com/unnamed.owl#">

</rdf:RDF>

 

Besides the namespaces for RDF, RDFS and OWL, this example declares a namespace for Protégé and another one (http://www.owl-ontologies.com/unnamed.owl#) that is used for all user-defined entities. It can be edited in Protégé during the ontology definition process.

 

3.4.3.2      Ontology header

The header of the document contains information about the ontology it describes. An example:

<owl:Ontology rdf:about="">

<owl:imports

rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>

<rdfs:comment>Example of Ontology Header</rdfs:comment>

</owl:Ontology>

 

The first tag <owl:Ontology rdf:about=""> states that this block describes the current ontology.

 

<owl:imports> references another OWL ontology. A URI specifies from where the ontology is imported. The meanings of the definitions of the referenced ontology are considered to be part of the importing ontology. The imported ontology in this example contains a class definition for <owl:Class rdf:ID="PAL-Constraint"/>. Thus, it can be used in the importing ontology, e.g. by instantiating this class via <protege:PAL-Constraint rdf:ID="CarAssembling_00141"/>. This concrete import is required by the OWL plug-in used within Protégé.

 

<owl:imports> statements are transitive. I.e., if ontology A imports B, and B imports C, then A imports both B and C.

Importing an ontology into itself is regarded as a null action. If ontology A imports B and B imports A, they are considered to be equivalent.
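The import rules above amount to computing a transitive closure over the direct owl:imports statements. The following Python sketch shows one way to do this; the function name imports_closure and the dictionary representation of the import statements are assumptions made for illustration, while the ontology names A, B and C follow the text.

```python
def imports_closure(ontology, direct_imports):
    """Return all ontologies that `ontology` imports, directly or indirectly.

    A self-import contributes nothing (null action), and cyclic imports
    terminate because every ontology is visited at most once.
    """
    seen = set()
    stack = [ontology]
    while stack:
        current = stack.pop()
        for imported in direct_imports.get(current, ()):
            if imported != ontology and imported not in seen:
                seen.add(imported)
                stack.append(imported)
    return seen


# A imports B, B imports C, and C imports A again (a cycle).
direct_imports = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
```

With these statements, imports_closure("A", direct_imports) contains B and C, and an ontology that imports only itself has an empty closure.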

 

3.4.3.3      Classes

By defining classes, resources with similar characteristics are combined. Every class is associated with a set of individuals (see 3.4.3.4), called the class extension. The individuals in the class extension are also called instances of the class.

 

The simplest way of describing a class is through a class name, as this short example shows:

<owl:Class rdf:ID="Automobile_Part"/>

 

This will assert the triple "ex:Automobile_Part rdf:type owl:Class", where ex is the namespace of the relevant ontology.

 

The effective use of ontologies depends on the ability to reason about individuals. Hence, a mechanism is necessary to describe the classes the individuals belong to and the properties they inherit. Although it is possible to assert specific properties about individuals, most of the power of ontologies is a result of class-based reasoning. Therefore, OWL provides class axioms that state additional characteristics of a class.

 

To express taxonomy, the basic construct is rdfs:subClassOf. It states that the class extension of one class description is a subset of the class extension of another class description. An example:

<owl:Class rdf:ID="Light">

    <rdfs:subClassOf rdf:resource="#Automobile_Part"/>

    <protege:abstract>true</protege:abstract>

    <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">

any device serving as a source of car illumination

 </rdfs:comment>

  </owl:Class>

 

This class axiom declares a subclass relation between the two OWL classes Light and Automobile_Part. Subclass relations provide necessary conditions for belonging to a class. In this case, an individual belonging to Light also needs to be an Automobile_Part.

 

In addition to the user-defined taxonomy, every OWL class is implicitly a subclass of the predefined class owl:Thing. This means that the class extension of owl:Thing is the set of all individuals.

 

The owl:unionOf property links a class to a list of class descriptions. The statement describes an anonymous class whose class extension contains those individuals that occur in at least one of the class extensions of the class descriptions in the list. An example:

<owl:FunctionalProperty rdf:ID="holder_nr">

    <rdfs:domain>

      <owl:Class>

        <owl:unionOf rdf:parseType="Collection">

          <owl:Class rdf:about="#Stoplight"/>

          <owl:Class rdf:about="#Low_Beam_Light"/>

          <owl:Class rdf:about="#Long_Distance_Light"/>

          <owl:Class rdf:about="#Parking_Light"/>

          <owl:Class rdf:about="#Head_Light"/>

        </owl:unionOf>

      </owl:Class>

    </rdfs:domain>

    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>

    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>

  </owl:FunctionalProperty>

 

The owl:FunctionalProperty (see also 3.4.3.5) with rdf:ID="holder_nr" has the anonymous union class as its domain. It can therefore be applied to individuals of any class that is listed in owl:unionOf.
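The class extension of an owl:unionOf description can be pictured as an ordinary set union. The following Python sketch is a simplified illustration: the individuals light_01 to light_03 and the helper names union_of and in_domain are hypothetical, and only some of the classes from the example are populated.

```python
# Class extensions: each class name maps to the set of its individuals.
# The individuals are hypothetical and only serve as an illustration.
extensions = {
    "Stoplight": {"light_01"},
    "Low_Beam_Light": {"light_02", "light_03"},
    "Parking_Light": set(),
    "Wheel": {"wheel_01"},
}


def union_of(class_names, extensions):
    """Individuals that occur in at least one of the listed class extensions."""
    result = set()
    for name in class_names:
        result |= extensions.get(name, set())
    return result


def in_domain(individual, domain_classes, extensions):
    """True if the individual belongs to the union class used as a domain."""
    return individual in union_of(domain_classes, extensions)


holder_nr_domain = ["Stoplight", "Low_Beam_Light", "Parking_Light"]
```

Under these assumptions, in_domain("light_02", holder_nr_domain, extensions) is true, whereas wheel_01 lies outside the union.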

 

3.4.3.4      Individuals

Individuals describe members of classes. They are defined with individual axioms, which are also called facts. Facts are statements indicating class membership and property values of individuals. An example:

<Wheel rdf:ID="carAssembling2_00213">

    <price_value rdf:datatype="http://www.w3.org/2001/XMLSchema#float">

89.9

 </price_value>

    <price_currency rdf:datatype="http://www.w3.org/2001/XMLSchema#string">

EUR

 </price_currency>

    <manufacturer rdf:datatype="http://www.w3.org/2001/XMLSchema#string">

Michelin

 </manufacturer>

  </Wheel>

 

The individual named carAssembling2_00213 is an instance of the class Wheel. In this example, the inexpressive name was generated by Protégé. The facts are price_value, price_currency and manufacturer. Each one is defined with an individual value.

 

3.4.3.5      Properties

Properties are binary relations that allow the assertion of general facts about the members of classes and specific facts about individuals. OWL distinguishes between two main categories: datatype properties and object properties.

 

Datatype properties describe relations between instances of classes and data values, i.e. RDF literals and XML Schema datatypes. In the following example, horsepower describes the relation between the class Motor and the XML Schema datatype int. The rdfs:range axiom asserts that the values of this property must belong to the specified data range, while rdfs:domain determines the class to which the property applies. Additionally, the property can be commented using the appropriate RDFS tag.

<owl:DatatypeProperty rdf:ID="horsepower">

    <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">

SAE@rpm

 </rdfs:comment>

    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>

    <rdfs:domain rdf:resource="#Motor"/>

    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>

  </owl:DatatypeProperty>

 

 

The second category comprises object properties. They are relations between instances of two classes, i.e. they link individuals to individuals. An example is:

<owl:ObjectProperty rdf:ID="has_oil_sump">

    <rdfs:range rdf:resource="#Oil_Sump"/>

    <rdf:type

          rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>

    <rdfs:domain rdf:resource="#Motor"/>

</owl:ObjectProperty>

 

The first line declares the property. rdfs:range and rdfs:domain are built-in properties, and each of them syntactically links a property to a class description. The rdfs:range axiom, which here refers to Oil_Sump, asserts that the values of this property must belong to the class extension of the class description. The rdfs:domain axiom, which here refers to Motor, asserts that the subjects of such property statements must belong to the class extension of the indicated class description. Expressed in natural language, we would say “a motor has an oil sump”.
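One way to read the domain and range axioms operationally is as type inference: whenever a statement uses the property, the subject can be concluded to belong to the domain class and the value to the range class. The following Python sketch illustrates this reading. The individuals motor_1 and sump_1 and the function name infer_types are hypothetical, and a real reasoner performs far more than this single rule.

```python
# Domain and range of the object property from the example above.
DOMAIN = {"has_oil_sump": "Motor"}
RANGE = {"has_oil_sump": "Oil_Sump"}


def infer_types(statements, domain, range_):
    """Derive (individual, class) memberships from property statements."""
    inferred = set()
    for subject, prop, obj in statements:
        if prop in domain:
            inferred.add((subject, domain[prop]))  # subject is in the domain
        if prop in range_:
            inferred.add((obj, range_[prop]))      # value is in the range
    return inferred


# Hypothetical individuals used only for illustration.
statements = [("motor_1", "has_oil_sump", "sump_1")]
```

For the single statement above, infer_types(statements, DOMAIN, RANGE) yields the memberships (motor_1, Motor) and (sump_1, Oil_Sump).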

 

Another syntactic variation, semantically equivalent to the example above, is the following one. It uses the tag owl:FunctionalProperty instead of including this information in the rdf:type tag. The built-in class owl:FunctionalProperty is a special subclass of the RDF class rdf:Property.

<owl:ObjectProperty rdf:ID="has_oil_sump">

    <rdfs:range rdf:resource="#Oil_Sump"/>

    <rdfs:domain rdf:resource="#Motor"/>

</owl:ObjectProperty>

<owl:FunctionalProperty rdf:about="#has_oil_sump"/>

 

A functional property is a property that can have only one value y for each instance x, i.e. there cannot be two distinct values y1 and y2 such that the pairs (x, y1) and (x, y2) are both instances of this property. Applied to this example, which states that the has_oil_sump property is functional, it means that “a motor can have at most one oil sump”.

Both object properties and datatype properties can be declared as functional.
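The definition of a functional property can be checked mechanically over a set of statements. The following Python sketch does so under a simplifying assumption: different names are taken to denote different values, whereas an OWL reasoner could instead conclude that two values of a functional property denote the same individual. The function name functional_violations and the individuals are hypothetical.

```python
from collections import defaultdict


def functional_violations(statements, functional_properties):
    """Find (subject, property) pairs that have more than one distinct value.

    Assumes that different names denote different values, which is a
    simplification compared to OWL semantics.
    """
    values = defaultdict(set)
    for subject, prop, obj in statements:
        if prop in functional_properties:
            values[(subject, prop)].add(obj)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


# Hypothetical statements: motor_1 is given two oil sumps.
statements = [
    ("motor_1", "has_oil_sump", "sump_1"),
    ("motor_1", "has_oil_sump", "sump_2"),
    ("motor_2", "has_oil_sump", "sump_3"),
]
```

Here functional_violations(statements, {"has_oil_sump"}) reports only motor_1, since it is the only subject with two distinct values for the functional property.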

 

3.4.3.6      Datatypes

OWL allows three types of data range specifications. Firstly, OWL uses the RDF datatyping scheme, which in turn refers to XML Schema datatypes. Secondly, it allows the use of the RDFS class rdfs:Literal, which is the class of literal values such as strings and integers. Thirdly, an enumerated datatype can be used.

 

The enumerated datatype uses the owl:oneOf construct, which defines a range of data values. It is used not only in connection with properties, as in the following example, but also for describing enumerated classes.

<owl:FunctionalProperty rdf:ID="fuel">

    <rdfs:range>

      <owl:DataRange>

        <owl:oneOf rdf:parseType="Resource">

          <rdf:first rdf:datatype="http://www.w3.org/2001/XMLSchema#string">

diesel

 </rdf:first>

          <rdf:rest rdf:parseType="Resource">

            <rdf:first rdf:datatype="http://www.w3.org/2001/XMLSchema#string">

gasoline_unleaded

   </rdf:first>

            <rdf:rest rdf:parseType="Resource">

<rdf:first rdf:datatype="http://www.w3.org/2001/XMLSchema#string">

super_unleaded

</rdf:first>

<rdf:rest

rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>

            </rdf:rest>

          </rdf:rest>

        </owl:oneOf>

      </owl:DataRange>

    </rdfs:range>

    <rdfs:domain rdf:resource="#Engine"/>

    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>

  </owl:FunctionalProperty>

 

In the case of an enumerated datatype, the subject of owl:oneOf is a blank node of class owl:DataRange and the object is a list of literals. The shorthand rdf:parseType="Collection" cannot be used here because RDF requires such a collection to be a list of RDF node elements. Instead, the list of data values is a nested construction that has to be written out with the basic list constructs rdf:first, rdf:rest and rdf:nil, as the example shows.
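The nested list construction can be made explicit by generating the underlying triples. The following Python sketch encodes a list of values as a chain of rdf:first and rdf:rest statements that ends in rdf:nil, and decodes it again. The blank node labels of the form _:cellN and the function names encode_list and decode_list are hypothetical choices for this illustration.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def encode_list(values):
    """Return (head, triples) encoding values as an rdf:first/rdf:rest chain."""
    triples = []
    counter = itertools.count(1)
    head = RDF + "nil"  # the empty list is rdf:nil itself
    # Build back to front so each cell can point at the already built rest.
    for value in reversed(values):
        cell = f"_:cell{next(counter)}"  # hypothetical blank node label
        triples.append((cell, RDF + "first", value))
        triples.append((cell, RDF + "rest", head))
        head = cell
    return head, triples


def decode_list(head, triples):
    """Follow the rdf:rest links from head and collect the rdf:first values."""
    lookup = {(s, p): o for s, p, o in triples}
    values = []
    while head != RDF + "nil":
        values.append(lookup[(head, RDF + "first")])
        head = lookup[(head, RDF + "rest")]
    return values


fuels = ["diesel", "gasoline_unleaded", "super_unleaded"]
head, triples = encode_list(fuels)
```

Each value contributes one list cell with two triples, so the three fuel values result in six triples, and decode_list(head, triples) recovers the original order.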

 

3.5             Protégé

All ontologies in this thesis were developed with the aid of Protégé, version 2.1.2, the latest version at the time of writing.

 

The Protégé tool is an open-source, Java-based knowledge-modelling platform. It was developed by the Stanford Medical Informatics (SMI) group at Stanford University and is freely available. The core of the application is the ontology editor with its graphical user interface (GUI), which allows the user to construct a domain ontology, to customise the automatically generated data entry forms and to enter data. External standalone Java applications can be developed using Protégé’s API in order to build and access domain models. One of Protégé’s major advantages is its extensibility: the architecture allows the development and integration of plug-ins.

 

Plug-ins are additional modules that extend the Protégé system’s core. They usually perform functions not provided by the standard distribution. Most existing plug-ins, developed both by Stanford University and by external users, are available in the Protégé Contributions Library. In this work, the following modules are used:

 

·        The OWL plug-in is a backend plug-in (also called storage plug-in). Backend plug-ins enable the user to export and import ontologies in different formats. As the name suggests, the OWL plug-in allows the user to load and store an ontology in OWL format. It also allows the user to edit and visualise OWL classes and their properties, to define logical class characteristics as OWL expressions, to execute reasoners and to edit OWL individuals for Semantic Web markup.

 

·        The BeanGenerator plug-in is a so-called tab plug-in since it is embedded in the editor’s GUI. It maps objects in the Protégé model to the corresponding Java classes. These automatically generated Java classes comply with the JADE specifications in [Cai02]. As in this work, intelligent software agents can profit from this mechanism since the resulting Java source files can be accessed easily from any Java program.

 

·        OntoViz is another tab plug-in, providing a convenient graphical visualisation of ontology models. It was used during the development process. Figures such as 11 and 12 show examples of these automatically generated visualisations.

 

 

The Protégé knowledge model is based on frames and first-order logic. It is compliant with Open Knowledge Base Connectivity (OKBC). OKBC is an API that provides a standard way to access knowledge bases and therefore allows platform- and language-independent knowledge-level communication. It was developed to address the problem of the reusability of knowledge base tools [CFFKR98]. OKBC determines the main components that make up an ontology: classes, slots, facets and instances.

 

Classes are named concepts of a domain. They define attributes and relations and are organised in a class hierarchy. The class taxonomy is represented in a tree structure. Since multiple inheritance is permitted, one class may have several super-classes. Classes can be concrete or abstract. In contrast to abstract classes, concrete classes may have direct instances.

Slots make up the attributes of classes and their relationships to other classes. They are global to the ontology, i.e. each slot name is unique within the ontology. Each slot has a name and a value type. Protégé supports the basic data types integer, float, string and boolean. The user may also choose the types “Symbol”, “Any”, “Instance” or “Class”. “Symbol” is an enumerated datatype. It might be used, for example, by the attribute fuel, allowing diesel, gasoline and super unleaded as possible values. “Any” means that the value can be of any of these value types. If “Instance” or “Class” is selected, another instance or class from the ontology has to be selected as the one being referenced. Slots can be constrained in the classes to which they are attached, for example by restricting a slot’s cardinality.

The standard facets for template slots in Protégé are: documentation, allowed values, minimum and maximum cardinality, default values, inverse slot and template-slot values.

As in object-oriented programming languages, instances are specific entities of a given class. New instances can be created and values can be assigned to the attributes and relations. A form to enter data is generated automatically when an instance is created.

 

The functionalities described are provided by the tabs Classes, Slots, Frames, Instances, Ontology BeanGenerator and OntoViz. Additionally, the screen layout used to create instances can be designed via the tab Forms and, as the name suggests, the tab Queries allows the user to enter queries against the ontology.

 

Figures 4 and 5 provide screenshots of Protégé’s GUI, showing the class editor and the slot editor.

 

Figure 4: Editing the class Motor in Protégé

 

This figure shows the editor for the class Motor. On the main frame, all defined Template Slots are listed, such as fuel, horsepower and has_v_belt. The taxonomy of entities is represented in a tree structure on the left side. Motor has only one superclass, the class called Automobile_Part, which again is a subclass of Concept. Concept is a subclass of the class Thing, which is the root class for all ontologies in Protégé.

 

Figure 5: Editing the slot has_oil_sump in Protégé

 

As this screenshot shows, the slot has_oil_sump is defined in the domain Motor and links to an instance of class Oil_Sump. Further interaction fields such as cardinality or documentation can be used to specify and constrain the slot. Referring to the example in section 3.4.3.5, this slot in Protégé is represented as an object property in OWL.