Ontologies are a
popular research topic in Artificial Intelligence (AI), e.g. in knowledge
engineering, natural language processing, intelligent information integration
and multi-agent systems. Ontologies are also applied outside AI, in the World
Wide Web community. OWL (see chapter 3.4.3), for example, originates from the efforts of the W3C working group that developed the idea of the Semantic
Web. By attaching information that describes the content and meaning of data, web resources can
be used by machines. The data can then be used not only for display purposes but
also for automation, integration and reuse across various applications.
The notion of Ontology has a long history in philosophy. There, it is a branch of metaphysics and refers to the study of being or existence. It has strong implications for the conceptions of reality.
In the area of computer science, many definitions exist and are discussed. The most widespread and widely accepted one in the context of AI is the following from [Gru93]:
An ontology is a formal, explicit
specification of a shared conceptualisation.
Conceptualisation refers to an abstract model of the world that should be represented. It defines entities and relationships between them in a specific domain of discourse. The names of both the entities and the relationships are accompanied by human-readable text describing their meaning. In other words, machine-processable content is linked with meaning for humans.
The conceptualisation is shared, which means that the knowledge represented by an ontology is not restricted to some individual but should be accepted by a group.
Formal refers to the fact that the ontology should be machine-readable, and explicit means that the concepts and relations between them have to be explicitly defined. Data can be machine-readable without being formal in this sense, for instance when it is represented as semi-structured natural language text.
At least as important as the question of what an ontology is, is the question of what it is good for. Because an ontology is a shared conceptualisation, the main purpose of setting one up is to enable knowledge sharing and knowledge reuse. Ontologies capture general knowledge about a domain that rarely changes, and they specify the concepts and relations about which knowledge is to be accumulated and processed. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualisation, explicitly or implicitly.
For information systems or for the Internet, ontologies can be used to organise keywords and database concepts by capturing the semantic relationships among the keywords or among the tables and fields in a database. Yahoo!, for example, has a hierarchical structure that categorises Internet documents by using formal metadata. The semantic relationships give users an abstract view of the information available for their domain of interest [HS97].
A common way to classify ontologies is by their level of generality. [Fen04] identifies the following types:
· Domain ontologies contain knowledge that is valid for a particular type of domain (e.g. electronic, medical, mechanic).
· Meta data ontologies provide a vocabulary for describing the content of information sources in the World Wide Web. The Dublin Core Metadata Initiative e.g. develops online metadata standards supporting a broad range of purposes and business models.
· Generic or top-level ontologies (also called common sense ontologies) capture general knowledge. The conceptualisations defined here are valid across several domains.
· Representational ontologies provide representational entities without stating what should be represented. As a consequence, they do not commit themselves to any particular domain. A well-known representational ontology is the Frame-Ontology [Gru93]. It defines concepts such as frames, slots and slot constraints, allowing the expression of knowledge in an object-oriented or frame-based way. The ontology editor Protégé (see chapter 3.5) follows this model.
· Method and task ontologies can provide terms for particular tasks or problem-solving methods. They are used for reasoning on domain knowledge.
If two agents communicate about some domain, they need to agree on the terminology that they use to describe this domain. A set of definitions of formal vocabulary, i.e. a conceptualisation, can be used to set up this consensual terminology and therefore supports knowledge sharing among AI software, such as software agents.
Common ontologies are used to describe ontological commitments for a set of agents so that they can communicate about a domain of discourse without necessarily operating on a globally shared theory. An ontological commitment is an agreement to use a vocabulary. According to [Gru93], software agents commit to ontologies. Thus, their observable actions are consistent with the definitions in the ontology, and they gain a shared understanding of the domain of discourse. In this way, ontologies make messaging meaningful. They also serve as a prerequisite for consensus, since agents can only communicate once they have agreed on a common view of the world. Further, ontologies allow agents to ground their beliefs and actions.
Ontologies are formal theories about a certain domain of discourse and therefore require a formal logical language to express them. Several languages that describe ontologies exist, such as CycL and KIF, Ontolingua or Frame Logic [Fen04]. Languages for defining ontologies are syntactically and semantically rich languages, e.g. richer than common approaches for databases.
In this work, domain ontologies are expressed in the Web Ontology Language (OWL). The main reason for choosing this format is that OWL has been a W3C Recommendation since February 2004. A W3C Recommendation is understood by industry and the research community as a web standard. It is a stable specification developed by a W3C working group and reviewed by the W3C Membership. A second, more practical reason is tool support: the ontology editor Protégé (see chapter 3.5) offers a plug-in to store ontologies in OWL, and the Jena Framework (see chapter 5.5) provides a parser for OWL. The existence of the OWL plug-in and the Jena Framework might, in turn, result from OWL’s popularity as a powerful Web standard.
Since OWL extends the Resource Description Framework (RDF), chapter 3.4.1 introduces this language. Its extension RDF Schema is described in 3.4.2, before 3.4.3 focuses on OWL.
RDF is a standard
form for describing machine-processable semantics of data. The syntax is
defined in XML. It represents an attempt to attach additional semantics to XML
by defining relationships.
Every document is represented in a tree structure.
The different leaves of the tree have well-defined tags as well as contexts
with which the information can be understood. These contexts represent
semantics.
The underlying structure of any expression in RDF
is a collection of triples. Each triple consists of a subject, a predicate and
an object [KC04], as figure 3 shows. A set of such triples is
called an RDF graph.
Figure 3: RDF data model
A triple represents a statement of a relationship between the things denoted by the nodes (i.e. subject and object) that it links. The direction of the arc is significant: it always points toward the object.
· A subject (or resource) is an entity that can be referred to by a World Wide Web address, i.e. by a URL or URI. Resources are the elements that are described by RDF statements.
· A predicate (also called property) defines a binary relation between resources and/or atomic values provided by primitive data type definitions in XML.
· An object is a value for a subject's predicate. Therefore, it provides the actual characterisation of Web documents.
A simple example is:
capital (http://www.countries.com/Portugal) = Lisbon
Expressed in natural language, this states that “Lisbon
is the capital of Portugal”. The sentence has the following parts:
Subject (Resource) | http://www.countries.com/Portugal
Predicate (Property) | capital
Object (Literal) | Lisbon
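The triple model can also be illustrated outside RDF/XML. The following is a minimal Python sketch, not part of the original work: the names Triple, graph and objects are hypothetical, and the predicate is kept as the bare name "capital", although in a real RDF graph it would be a URI as well.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# An RDF graph is simply a set of triples. This graph holds the single
# example statement from the text.
graph = {
    Triple("http://www.countries.com/Portugal", "capital", "Lisbon"),
}


def objects(graph, subject, predicate):
    """Return all objects that `predicate` links to `subject`."""
    return {t.obj for t in graph
            if t.subject == subject and t.predicate == predicate}


capitals = objects(graph, "http://www.countries.com/Portugal", "capital")
print(capitals)
```

Storing the graph as a set mirrors the definition above: the order of statements carries no meaning, and a repeated statement adds nothing.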
The RDF/XML document representing this information
looks as follows:
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ex="http://example.org/schema/">
  <rdf:Description
    rdf:ID="Portugal">
    <rdf:type
      rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
    <ex:capital>Lisbon</ex:capital>
  </rdf:Description>
</rdf:RDF>
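To show how such a document maps back to triples, the following Python sketch reads a flat RDF/XML document with the standard library's xml.etree.ElementTree. It is an illustration under assumptions, not a full RDF parser: it only handles rdf:Description elements with direct property children, the function name read_triples is hypothetical, and the sample document uses rdf:about with the full URI (instead of rdf:ID) so that the subject matches the table above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed variant of the example document, using rdf:about with a full URI.
DOCUMENT = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/schema/">
  <rdf:Description rdf:about="http://www.countries.com/Portugal">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
    <ex:capital>Lisbon</ex:capital>
  </rdf:Description>
</rdf:RDF>
"""


def read_triples(xml_text):
    """Return (subject, predicate, object) tuples for a flat RDF/XML document."""
    root = ET.fromstring(xml_text)
    triples = []
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # The object is either a referenced resource or a literal.
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, namespace + local, value))
    return triples


triples = read_triples(DOCUMENT)
for triple in triples:
    print(triple)
```

In practice a dedicated RDF toolkit such as the Jena Framework used later in this work would take over this step; the sketch only makes the correspondence between XML elements and triples visible.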
As described in [Bec04], any RDF graph, and any OWL ontology graph as well, can be written in many different syntactic forms.
One example can be found in chapter 3.4.3.5, where owl:FunctionalProperty
is described.
RDFS stands for
RDF Schema and is a vocabulary for describing properties and classes of RDF resources.
By providing a framework, it allows the definition of classes and properties
through their types. But it does not provide the
actual application-specific classes and properties.
The following example refers to
the former RDF example in 3.4.1. Here, rdfs:Class is used instead of rdf:Description and the rdf:type information is dropped. Additionally, the class country is added to demonstrate the use of rdfs:subClassOf.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:ex="http://example.org/schema/">
<rdfs:Class rdf:ID="country" />
<rdfs:Class rdf:ID="Portugal">
<ex:capital>Lisbon</ex:capital>
<rdfs:subClassOf rdf:resource="#country" />
</rdfs:Class>
</rdf:RDF>
As this short example shows, classes in RDFS resemble classes in object-oriented programming languages. Resources can be defined as instances of classes and as subclasses of other classes. rdfs:subClassOf, which defines a relationship between two classes, is one of the core property types of RDFS. It is assumed to be transitive.
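The transitivity of rdfs:subClassOf can be sketched in a few lines of Python. This is only an illustration: the dictionary SUBCLASS_OF and the function superclasses are hypothetical names, and the class "place" is an invented superclass added to make the chain longer than one step.

```python
# Direct rdfs:subClassOf statements: subclass -> set of direct superclasses.
# "Portugal" and "country" come from the example; "place" is hypothetical.
SUBCLASS_OF = {
    "Portugal": {"country"},
    "country": {"place"},
}


def superclasses(cls, direct=SUBCLASS_OF):
    """Return all direct and inherited superclasses of `cls`."""
    found = set()
    pending = list(direct.get(cls, ()))
    while pending:
        parent = pending.pop()
        if parent not in found:  # guards against cycles in the input
            found.add(parent)
            pending.extend(direct.get(parent, ()))
    return found


print(superclasses("Portugal"))
```

Because the relation is transitive, Portugal is a subclass of place although no statement says so directly.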
To recapitulate, the RDF infrastructure enables encoding, exchange and reuse of structured data. OWL extends RDF. Its model is derived from the RDF model but is more flexible. Additional vocabulary allows the description of properties and classes, and more complex kinds of relationships among the elements can be expressed. OWL was designed to extend the semantic reach of current XML and RDF metadata efforts.
OWL is a semantic markup language for defining, publishing and sharing ontologies in the World Wide Web. As part of the Semantic Web effort, it was developed as a vocabulary extension of RDF. It is therefore written in XML and has all the advantages XML provides: information can easily be exchanged between different types of computers using different operating systems and application languages. OWL differs from RDF in that it enables greater machine interpretability of Web content by providing additional vocabulary (i.e. syntax) along with a formal semantics.
OWL makes two important contributions:
· Its standardised syntax for writing ontologies specifies classes, subclasses, properties and subproperties. Predefined properties are available to model instance-of and subclass-of relationships as well as domain restrictions and range restrictions of attributes.
· It provides a standard set of modelling primitives such as “instance of” and “subclass of” relationships. Although the relationships between entities might be more complex than in RDF, an OWL ontology uses the RDF graph data model.
An OWL ontology mainly contains classes,
properties, instances of classes and relationships between these instances.
Additionally, it defines a namespace declaration and an ontology header. Therefore,
it can be used directly to publish and share sets of terms, i.e. to describe an
ontology. Support includes advanced Web search, software agents and knowledge
management.
The following subchapters show how OWL documents are structured and selectively pick out language constructs that are used in the ontologies of this work. A complete set of OWL language constructs can be found in [DS04].
Since ontologies
are distinct resources, they must have identifiers making it possible to
uniquely identify concepts. XML namespaces provide a method to avoid
element name conflicts, using URI references.
In RDF, a URI identifies a namespace which belongs to a schema. The declarations are included in the rdf:RDF tag. The built-in vocabulary for OWL comes entirely from the OWL namespace http://www.w3.org/2002/07/owl#. An example is:
<rdf:RDF
xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns="http://www.owl-ontologies.com/unnamed.owl#">
</rdf:RDF>
Besides the namespaces for RDF, RDFS and OWL, this example declares a namespace for Protégé and another one (http://www.owl-ontologies.com/unnamed.owl#) that is used for all user-defined entities. It can be edited in Protégé during the ontology definition process.
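How a prefixed name is resolved against these declarations can be sketched as follows. The Python code is an illustration only: the function name expand and the dictionary NAMESPACES are hypothetical, while the URIs are those of the declaration above.

```python
# Prefix -> namespace URI, as declared on the rdf:RDF element above.
NAMESPACES = {
    "protege": "http://protege.stanford.edu/plugins/owl/protege#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "": "http://www.owl-ontologies.com/unnamed.owl#",  # default namespace
}


def expand(qname, namespaces=NAMESPACES):
    """Turn 'prefix:local' (or plain 'local') into a full URI reference."""
    # rpartition yields an empty prefix when the name contains no colon,
    # which selects the default namespace.
    prefix, _, local = qname.rpartition(":")
    return namespaces[prefix] + local


print(expand("owl:Class"))
print(expand("Motor"))
```

Two elements with the same local name but different prefixes therefore expand to different URIs, which is how namespaces avoid name conflicts.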
The header of the document contains information about the ontology it describes. An example:
<owl:Ontology rdf:about="">
<owl:imports
rdf:resource="http://protege.stanford.edu/plugins/owl/protege"/>
<rdfs:comment>Example of Ontology
Header</rdfs:comment>
</owl:Ontology>
The first tag <owl:Ontology rdf:about=""> states that this block describes the current ontology.
<owl:imports> references another OWL ontology. A URI specifies from where the ontology is imported. The meanings of the definitions of the referenced ontology are considered to be part of the importing ontology. The imported ontology in this example contains a class definition for <owl:Class rdf:ID="PAL-Constraint"/>. Thus, it can be used in the importing ontology, e.g. by instantiating this class via <protege:PAL-Constraint rdf:ID="CarAssembling_00141"/>. This particular import is required by the OWL plug-in used within Protégé.
<owl:imports> statements are transitive, i.e. if ontology A imports B, and B imports C, then A imports both B and C. Importing an ontology into itself is regarded as a null action. If ontology A imports B and B imports A, they are considered to be equivalent.
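The import rules can be illustrated with a small Python sketch that computes everything an ontology imports, directly or indirectly. The names IMPORTS and imports_closure are hypothetical, and A, B and C are the placeholder ontologies from the text; the self-import of C is added only to show that it has no effect.

```python
# Direct owl:imports statements; A, B and C are placeholder ontologies.
IMPORTS = {
    "A": {"B"},
    "B": {"C"},
    "C": {"C"},  # a self-import, regarded as a null action
}


def imports_closure(ontology, direct=IMPORTS):
    """Return every ontology imported directly or indirectly."""
    closure = set()
    pending = [ontology]
    while pending:
        current = pending.pop()
        for imported in direct.get(current, ()):
            # Skip the starting ontology itself and anything already seen,
            # so self-imports and import cycles terminate.
            if imported != ontology and imported not in closure:
                closure.add(imported)
                pending.append(imported)
    return closure


print(imports_closure("A"))
```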
By defining classes, resources with similar characteristics are combined. Every class is associated with a set of individuals (see 3.4.3.4), called the class extension. The individuals in the class extension are also called instances of the class.
The simplest way of describing a class is through a class
name, as this short example shows:
<owl:Class rdf:ID="Automobile_Part"/>
This asserts the triple "ex:Automobile_Part rdf:type owl:Class", where ex is the namespace of the relevant ontology.
The effective use of ontologies depends on the ability to reason about individuals. Hence, a mechanism is necessary to describe the classes the individuals belong to and the properties they inherit. Although it is possible to assert specific properties about individuals, most of the power of ontologies is a result of class-based reasoning. Therefore, OWL provides class axioms that state additional characteristics of a class.
To express taxonomy, the basic construct is rdfs:subClassOf.
It states that the class extension of one class
description is a subset of the class extension of another class description. An
example:
<owl:Class rdf:ID="Light">
<rdfs:subClassOf rdf:resource="#Automobile_Part"/>
<protege:abstract>true</protege:abstract>
<rdfs:comment
rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
any device serving as a source of car illumination
</rdfs:comment>
</owl:Class>
This class axiom declares a subclass relation
between the two OWL classes Light and Automobile_Part. Subclass relations provide necessary
conditions for belonging to a class. In this case, an individual belonging to Light
also needs to be an Automobile_Part.
In addition to the user-defined taxonomy, every OWL class
is implicitly a subclass of the predefined class owl:Thing.
This means that the class extension of owl:Thing is the set of all
individuals.
The owl:unionOf
property links a class to a list of class descriptions. The statement describes
an anonymous class whose class extension contains those individuals that occur
in at least one of the class extensions of the class descriptions in the list.
An example:
<owl:FunctionalProperty rdf:ID="holder_nr">
<rdfs:domain>
<owl:Class>
<owl:unionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Stoplight"/>
<owl:Class rdf:about="#Low_Beam_Light"/>
<owl:Class rdf:about="#Long_Distance_Light"/>
<owl:Class rdf:about="#Parking_Light"/>
<owl:Class rdf:about="#Head_Light"/>
</owl:unionOf>
</owl:Class>
</rdfs:domain>
<rdfs:range
rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
</owl:FunctionalProperty>
The owl:FunctionalProperty (see also 3.4.3.5) with rdf:ID="holder_nr" has the union of the listed classes as its domain,
i.e. it applies to the individuals of every class listed in owl:unionOf.
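In terms of class extensions, owl:unionOf corresponds to a plain set union. The following Python sketch illustrates this; the individuals and the names EXTENSIONS and union_of are hypothetical and only three of the five classes are listed.

```python
# Hypothetical class extensions: class name -> set of individual names.
EXTENSIONS = {
    "Stoplight": {"stoplight_01"},
    "Low_Beam_Light": {"low_beam_01", "low_beam_02"},
    "Parking_Light": set(),
}


def union_of(class_names, extensions=EXTENSIONS):
    """Class extension of the anonymous class owl:unionOf(class_names)."""
    members = set()
    for name in class_names:
        members |= extensions.get(name, set())
    return members


# Every individual in this set may carry a holder_nr value.
holder_nr_domain = union_of(["Stoplight", "Low_Beam_Light", "Parking_Light"])
print(holder_nr_domain)
```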
Individuals describe members of classes. They are defined with
individual axioms, which are also called facts. Facts are statements indicating
class membership and property values of individuals. An example:
<Wheel
rdf:ID="carAssembling2_00213">
<price_value rdf:datatype="http://www.w3.org/2001/XMLSchema#float">
89.9
</price_value>
<price_currency
rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
EUR
</price_currency>
<manufacturer
rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
Michelin
</manufacturer>
</Wheel>
The individual named carAssembling2_00213 is an instance of the class Wheel. In this example, the inexpressive name is generated by Protégé. The facts are price_value, price_currency and manufacturer. Each one is defined with an individual value.
Properties are binary relations that allow the
assertion of general facts about the members of classes and specific facts
about individuals. OWL distinguishes between two main categories: datatype
properties and object properties.
Datatype properties describe relations between instances of classes and data values, i.e. RDF literals and XML Schema datatypes. In the following example, horsepower describes the relation between the class Motor and the XML datatype int. The axiom rdfs:range asserts that the values of this property must belong to the specified data range, while rdfs:domain determines the class to which the property applies. Additionally, the property can be commented using the rdfs:comment tag.
<owl:DatatypeProperty
rdf:ID="horsepower">
<rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
SAE@rpm
</rdfs:comment>
<rdfs:range
rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
<rdfs:domain rdf:resource="#Motor"/>
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
</owl:DatatypeProperty>
The second category is object properties. They are relations
between instances of two classes, i.e. they link individuals to individuals. An
example is:
<owl:ObjectProperty
rdf:ID="has_oil_sump">
<rdfs:range rdf:resource="#Oil_Sump"/>
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
<rdfs:domain rdf:resource="#Motor"/>
</owl:ObjectProperty>
The first line declares the property. rdfs:range and rdfs:domain are built-in properties. In the case of an object property, rdfs:range syntactically links the property to a class description, in this case Oil_Sump; the axiom asserts that the values of this property must belong to the class extension of that class description. rdfs:domain likewise links the property to a class description, in this case Motor; the axiom asserts that the subjects of such property statements must belong to the class extension of the indicated class description. Expressed in natural language, we would say “a motor has an oil sump”.
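One way to read these axioms is as inference rules: whenever a has_oil_sump statement is asserted, the subject can be concluded to be a Motor and the value an Oil_Sump. The following Python sketch illustrates this reading under assumptions; the names PROPERTIES and infer_types and the individuals motor_01 and sump_07 are hypothetical.

```python
# rdfs:domain and rdfs:range axioms of the example property.
PROPERTIES = {
    "has_oil_sump": {"domain": "Motor", "range": "Oil_Sump"},
}


def infer_types(statements, properties=PROPERTIES):
    """Derive (individual, class) facts implied by domain and range axioms."""
    types = set()
    for subject, predicate, obj in statements:
        axioms = properties.get(predicate)
        if axioms is None:
            continue  # no axioms known for this property
        types.add((subject, axioms["domain"]))  # subjects lie in the domain
        types.add((obj, axioms["range"]))       # values lie in the range
    return types


inferred = infer_types([("motor_01", "has_oil_sump", "sump_07")])
print(inferred)
```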
Another syntactic variation, semantically
equivalent to the has_oil_sump declaration above, is the following one. It uses the tag owl:FunctionalProperty
instead of including this information in the rdf:type tag. The built-in class owl:FunctionalProperty is a special subclass of the RDF class rdf:Property.
<owl:ObjectProperty
rdf:ID="has_oil_sump">
<rdfs:range rdf:resource="#Oil_Sump"/>
<rdfs:domain rdf:resource="#Motor"/>
</owl:ObjectProperty>
<owl:FunctionalProperty
rdf:about="#has_oil_sump"/>
A functional property is a property that can have only one value y for each instance x, i.e. there cannot be two distinct values y1 and y2 such that the pairs (x, y1) and (x, y2) are both instances of this property. Applied to this example, which states that the has_oil_sump property is functional, it means that “one motor can have at most one oil sump”.
Both object properties and datatype properties can
be declared as functional.
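The definition can be turned into a simple consistency check. The Python sketch below is an illustration only and rests on one loud assumption: different names denote different individuals. OWL itself does not make this assumption, so a reasoner could instead conclude that two values of a functional property denote the same individual. The function name and the sample individuals are hypothetical.

```python
from collections import defaultdict


def functional_violations(statements, functional_properties):
    """Return (subject, property) pairs with more than one distinct value.

    Assumes that distinct names denote distinct individuals.
    """
    values = defaultdict(set)
    for subject, predicate, obj in statements:
        if predicate in functional_properties:
            values[(subject, predicate)].add(obj)
    return {key for key, objs in values.items() if len(objs) > 1}


statements = [
    ("motor_01", "has_oil_sump", "sump_07"),
    ("motor_01", "has_oil_sump", "sump_08"),  # second value for motor_01
    ("motor_02", "has_oil_sump", "sump_07"),  # allowed: different subject
]
violations = functional_violations(statements, {"has_oil_sump"})
print(violations)
```

Note that sump_07 appearing for two motors is not flagged: a functional property restricts the number of values per subject, not the number of subjects per value.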
OWL allows three types of data range specifications. Firstly, OWL uses the RDF datatyping scheme, which in turn refers to XML Schema datatypes. Secondly, it allows the use of the RDFS class rdfs:Literal, which is the class of literal values such as strings and integers. Thirdly, the enumerated datatype can be used.
The enumerated datatype uses the owl:oneOf construct, which defines a range of data values. It is used not only in connection with properties, as in the following example, but also for describing enumerated classes.
<owl:FunctionalProperty
rdf:ID="fuel">
<rdfs:range>
<owl:DataRange>
<owl:oneOf
rdf:parseType="Resource">
<rdf:first
rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
diesel
</rdf:first>
<rdf:rest
rdf:parseType="Resource">
<rdf:first
rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
gasoline_unleaded
</rdf:first>
<rdf:rest
rdf:parseType="Resource">
<rdf:first
rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
super_unleaded
</rdf:first>
<rdf:rest
rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</rdf:rest>
</rdf:rest>
</owl:oneOf>
</owl:DataRange>
</rdfs:range>
<rdfs:domain
rdf:resource="#Engine"/>
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
</owl:FunctionalProperty>
In the case of an enumerated datatype, the subject of owl:oneOf is a blank node of class owl:DataRange and the object is a list of literals. RDF requires this collection to be a list of RDF node elements. In other words, the list of data values is a nested construction and has to be filled with the basic list constructs rdf:first, rdf:rest and rdf:nil, as the example clarifies.
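The nested rdf:first / rdf:rest structure is a classic linked list. The following Python sketch builds and reads such a list, representing each list cell as a dictionary; the function names and this dictionary representation are assumptions made for illustration.

```python
RDF_NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"


def to_rdf_list(values):
    """Build the nested first/rest structure, innermost cell first."""
    node = RDF_NIL
    for value in reversed(values):
        node = {"first": value, "rest": node}
    return node


def from_rdf_list(node):
    """Walk the first/rest chain until rdf:nil and collect the values."""
    values = []
    while node != RDF_NIL:
        values.append(node["first"])
        node = node["rest"]
    return values


fuels = ["diesel", "gasoline_unleaded", "super_unleaded"]
cells = to_rdf_list(fuels)
print(from_rdf_list(cells))
```

Each cell holds one value in rdf:first and the remainder of the list in rdf:rest, with rdf:nil marking the end, exactly as in the fuel example.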
All ontologies in this thesis have been developed with the aid of Protégé, latest version 2.1.2.
The Protégé tool is an open-source, Java-based knowledge-modelling platform. It was developed by the Stanford Medical Informatics (SMI) group at Stanford University and is freely available. The core of the application is the ontology editor with its graphical user interface (GUI), which allows the user to construct a domain ontology, to customise the automatically generated data entry forms and to enter data. External standalone Java applications can be developed using Protégé’s API in order to build and access domain models. One of Protégé’s major advantages is its extendibility: the architecture allows the development and integration of plug-ins.
Plug-ins are additional modules that extend the Protégé system’s core. They usually perform functions not provided by the standard distribution. Most existing plug-ins developed by both Stanford University directly and external users are available in the Protégé Contributions Library. In this work, the following modules are used:
· The OWL plug-in is a backend plug-in (also called storage plug-in). Backend plug-ins enable the user to export and import ontologies in different formats. As the name suggests, the OWL plug-in allows the user to load and store an ontology in OWL format. It also allows the user to edit and visualise OWL classes and their properties, define logical class characteristics as OWL expressions, execute reasoners and edit OWL individuals for Semantic Web markup.
· The BeanGenerator plug-in is a so-called tab plug-in since it is embedded in the editor’s GUI. It maps objects in the Protégé model to the corresponding Java classes. These automatically generated Java classes comply with the JADE specifications in [Cai02]. As in this work, intelligent software agents can profit from this mechanism since the resulting Java source files can be accessed easily from any Java program.
· OntoViz is another tab plug-in, providing a convenient graphical visualisation of ontology models. It was used during the development process. Figures such as 11 and 12 show examples of these automatically generated visualisations.
The Protégé knowledge model is based on frames and first-order logic. It is Open Knowledge Base Connectivity (OKBC) compliant. OKBC is an API that represents a standard for accessing knowledge bases and therefore allows platform- and language-independent knowledge-level communication. It has been developed to address the problem of reusing knowledge base tools [CFFKR98]. OKBC determines the main components that make up an ontology: classes, slots, facets and instances.
Classes are named concepts of a domain. They define attributes and relations and are organised in a class hierarchy. The class taxonomy is represented in a tree structure. Since multiple inheritance is permitted, one class may have several super-classes. Classes can be concrete or abstract. In contrast to abstract classes, concrete classes may have direct instances.
Slots make up the attributes of classes and their relationships to other classes. They are global to the ontology, i.e. the name is unique in the ontology. Each slot has a name and a value type. Protégé supports the basic data types integer, float, string and boolean. The user may also choose the types “Symbol”, “Any”, “Instance” or “Class”. “Symbol” is an enumerated datatype. It might be used e.g. by the attribute fuel, allowing diesel, gasoline and super unleaded as possible values. “Any” means that the type can be any of these value types. If “Instance” or “Class” is selected, another instance or class from the ontology has to be selected as the reference target. Slots can be constrained in the classes to which they are attached, for example by restricting a slot’s cardinality.
The standard facets for template slots in Protégé are: documentation, allowed values, minimum and maximum cardinality, default values, inverse slot and template-slot values.
As in object-oriented programming languages, instances are specific entities of a given class. New instances can be created and values can be assigned to the attributes and relations. A form to enter data is generated automatically when an instance is created.
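The interplay of classes, slots and instances described above can be sketched in Python. This is a simplified illustration and not Protégé’s API: the names Slot, FrameClass, all_slots and create_instance are hypothetical, only the “Symbol” value type is checked, and treating Automobile_Part as abstract as well as the manufacturer value are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    value_type: str             # e.g. "Integer", "String", "Symbol"
    allowed_values: tuple = ()  # only meaningful for value_type "Symbol"


@dataclass
class FrameClass:
    name: str
    superclasses: list = field(default_factory=list)
    slots: list = field(default_factory=list)
    abstract: bool = False

    def all_slots(self):
        """Own slots plus those inherited from every superclass."""
        collected = {s.name: s
                     for parent in self.superclasses
                     for s in parent.all_slots().values()}
        collected.update({s.name: s for s in self.slots})
        return collected

    def create_instance(self, **values):
        """Create an instance as a plain dict after basic checks."""
        if self.abstract:
            raise TypeError(f"{self.name} is abstract and has no direct instances")
        slots = self.all_slots()
        for slot_name, value in values.items():
            slot = slots[slot_name]  # KeyError for undeclared slots
            if slot.value_type == "Symbol" and value not in slot.allowed_values:
                raise ValueError(f"{value!r} is not allowed for {slot_name}")
        return {"class": self.name, **values}


# Assumption: Automobile_Part is modelled as abstract here.
part = FrameClass("Automobile_Part", abstract=True,
                  slots=[Slot("manufacturer", "String")])
motor = FrameClass(
    "Motor",
    superclasses=[part],
    slots=[
        Slot("horsepower", "Integer"),
        Slot("fuel", "Symbol",
             allowed_values=("diesel", "gasoline", "super unleaded")),
    ],
)
engine = motor.create_instance(fuel="diesel", horsepower=110,
                               manufacturer="Example Motors")
print(engine)
```

The sketch shows the two points made in the text: a concrete class inherits the slots of its superclasses, and only concrete classes may have direct instances.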
The functionalities described are provided by the tabs Classes, Slots, Frames, Instances, Ontology BeanGenerator and OntoViz. Additionally, the screen layout for creating instances can be designed via the tab Forms and, as the name suggests, the tab Queries allows the user to enter queries against the ontology.
Figures 4 and 5 provide screenshots of Protégé’s GUI, showing the class editor and the slot editor.
Figure 4: Editing the class Motor in Protégé
This figure shows
the editor for the class Motor. On the main frame, all defined Template
Slots are listed, such as fuel, horsepower
and has_v_belt. The taxonomy of entities is represented in a tree
structure on the left side. Motor has only one superclass, the class
called Automobile_Part, which again is a subclass of Concept. Concept
is a subclass of the class Thing, which is the root class for all
ontologies in Protégé.
Figure 5: Editing the slot has_oil_sump in Protégé
As this screenshot shows, the slot has_oil_sump is
defined in the domain Motor and links to an instance of class Oil_Sump.
Further interaction fields such as cardinality or documentation can be used to
specify and constrain the slot. Referring to the example in section 3.4.3.5, this slot in
Protégé is represented as an object property in OWL.