The Layers of the Semantic Web
A Layered Architecture
The development of the Semantic Web proceeds in layers, one above another, which allows for a more standardised way of developing. Because it is built on existing technology, developers can roll out and implement parts of the technology without first realising the full capabilities of the Semantic Web. Each layer that is added to the semantic architecture should follow two principles:
• Downward compatibility – each layer should be able to interpret information at lower layers
• Upward partial understanding – each layer should be able to partially understand new layers above it
Adding layers in this manner gives the technology interoperability. To achieve compatibility, the Semantic Web uses vocabularies, taxonomies and ontologies. Each contains a set of defined terms, and each is critical to the ability to express the meaning of data. A combination of a schema language and an ontology language provides these capabilities. The following sections discuss these layers in detail, describing how the layers combine to provide an architecture for the Semantic Web.
Uniform Resource Identifiers (URIs)
The development of the Semantic Web is heavily influenced by the fact that anyone can name or describe anything. To be able to describe things, there needs to be a way to reference or identify them. Both the current web and the Semantic Web use URIs for this task:
http://www.co-ode.org/ontologies/pest.owl#Animal
The purpose of a URI is to unambiguously identify a resource in a uniform way; URIs identify information representation constructs, including classes, properties and individuals. As there is no ambiguity, it becomes possible to aggregate all data that refers to a given resource. It is the use of URIs that gives the Semantic Web a fundamental benefit over other technologies. URIs let users and software know exactly what is being referred to: they are globally unique, and each occurrence of the same identifier means the same thing. Labelling resources in this way makes it easier to integrate data sources that have been created independently.
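As a minimal sketch of why unambiguous identifiers help, the Python below splits the example URI into its namespace and local name using the standard library, then merges two independently created descriptions that use the same URI as their key. The two data sources and their property names (label, comment) are hypothetical and exist only for illustration.

```python
from urllib.parse import urldefrag

ANIMAL = "http://www.co-ode.org/ontologies/pest.owl#Animal"

# Split the URI into the ontology document and the fragment naming the term.
namespace, local_name = urldefrag(ANIMAL)
print(namespace)   # http://www.co-ode.org/ontologies/pest.owl
print(local_name)  # Animal

# Two hypothetical sources, created independently, that use the same URI.
source_a = {ANIMAL: {"label": "Animal"}}
source_b = {ANIMAL: {"comment": "A living organism"}}


def merge_sources(*sources):
    """Aggregate every description that refers to the same URI."""
    merged = {}
    for source in sources:
        for uri, description in source.items():
            merged.setdefault(uri, {}).update(description)
    return merged


merged = merge_sources(source_a, source_b)
print(merged[ANIMAL])
```

Because both sources agree on the identifier, no matching or guessing is needed to combine them; the URI itself is the join key.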
XML (Extensible Mark-up Language)
HTML is not extensible. That is, it has a fixed set of tags, and changing them requires universal agreement. Web site developers had no way of adding their own tags; the solution was XML, which offers developers a way to identify and manipulate their own structured data. The basic building block of XML is the element, which can have a number of associated attributes; elements can be nested within each other to form a tree-like structure. XML is a data model and uses a schema language (XML Schema) to constrain the format, not the meaning, of the data. The schema expresses shared vocabularies, defines structure, content and semantics, and allows machines to carry out rules made by developers, using metadata to describe what the data type is and the format it is in.
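To illustrate elements, attributes and nesting, here is a small sketch that parses an XML document with Python's standard library and walks the resulting tree. The tag names, attribute names and values in the document are invented for the example.

```python
import xml.etree.ElementTree as ET

# A hypothetical document using tags the author chose freely.
DOCUMENT = """
<library>
  <book isbn="000-0" language="en">
    <title>Example Title</title>
    <author>Example Author</author>
  </book>
</library>
"""

root = ET.fromstring(DOCUMENT)


def describe(element, depth=0):
    """Return one line per element, indented to show the nesting."""
    lines = ["  " * depth + f"{element.tag} {element.attrib}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(root)
print("\n".join(tree_lines))
```

The parser recovers the structure of the document, but nothing in it tells a machine what a book or an author is. That gap is what the higher layers address.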
The term metadata means data about data.
The concept is to provide structured information that describes, locates and explains information resources, making it easier for resources to be retrieved. The concept of metadata is not new: a library catalogue contains metadata about the books held in the library. It is important to remember that data and metadata are different. Data consists of values, the individual parts of information, whereas metadata describes the relationship between those parts and other data. Metadata does not change as frequently as data. It is commonly used in relational databases to represent information about fields, for example that ‘age’ is an integer or ‘name’ is a string. Together, data and metadata make information portable, because the relationships among the data values remain separate from their storage.
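The distinction between data and metadata can be sketched in a few lines of Python. The field names follow the ‘age’ and ‘name’ example above; the records themselves are made up for illustration.

```python
# Metadata: describes the fields, and changes rarely.
METADATA = {"name": str, "age": int}

# Data: the individual values, which change often.
records = [
    {"name": "Ada", "age": 36},
    {"name": "Alan", "age": "forty-one"},  # wrong type for 'age'
]


def conforms(record, metadata):
    """Check that a record has exactly the described fields with the described types."""
    return record.keys() == metadata.keys() and all(
        isinstance(record[field], kind) for field, kind in metadata.items()
    )


results = [conforms(record, METADATA) for record in records]
print(results)  # [True, False]
```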
Metadata is a key concept in developing the Semantic Web. To allow computers to share information automatically, data and metadata must be grouped together.
For metadata to work consistently, it must be described in a standard way. One of the most common standards comes from the Dublin Core Metadata Initiative (DCMI), which provides an agreed set of core metadata elements for use in resource discovery, some of which are:
• Title
• Subject
• Description
• Contributor
• Date
• Type
• Identifier
• Source
• Language
The use of these elements will become more apparent within this research, as Dublin Core contributed towards the direction of the Resource Description Framework. RDF is a metadata model used in the development of the Semantic Web, and each such model must be interoperable.
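As a rough sketch of what a Dublin Core description can look like in practice, the Python below builds a small XML record from a few of the elements listed above. The namespace URI is the one published for the Dublin Core element set; the wrapper element name (record) and the field values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)


def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (element, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC}}}{name}")
        child.text = value
    return record


record = dublin_core_record({
    "title": "The Layers of the Semantic Web",
    "type": "Text",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```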
Resource Description Framework (RDF)
The current web is made up of documents that are linked to one another, and it is down to the user to interpret the meaning of what they describe. In the world of the Semantic Web everything is described as a resource, which is part of the name of the technology that forms the base of the Semantic Web: the Resource Description Framework (RDF). RDF is an approved W3C standard and, as outlined on the W3C website:
“RDF is a standard model for data interchange on the Web. It has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.”
XML and RDF form the basic relational language layer of the Semantic Web architecture. RDF is a way of describing resources; a resource is anything that can be identified with a URI. The purpose of RDF is to provide a standard framework for making statements about resources and their attributes. Making assertions about resources is the basis of information representation. RDF has several features:
• Statements are generic and can describe any domain
• RDF can be distributed (like HTML), allowing for growth of a knowledge base
• Statements can be exchanged by heterogeneous applications and interpreted without loss of meaning
• RDF enables the use of inference allowing queries to be answered
The information is presented and stored as triples that follow the pattern subject-predicate-object or s-p-o, much like the subject, verb and object of a sentence.
RDF data is often displayed as a graph: the subjects and objects of triples are shown as nodes, and predicates are shown as directed arcs between nodes.
This type of graph is also known as a semantic net in the Artificial Intelligence community and, as outlined in earlier sections, graph theory is commonly used for modelling the Semantic Web.
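The triple and graph views can be sketched with plain Python tuples. This is only an illustration of the data model, not an RDF library. The namespace comes from the URI example earlier; the individuals cat1 and mouse1 and the property eats are hypothetical names.

```python
from collections import defaultdict

PEST = "http://www.co-ode.org/ontologies/pest.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (PEST + "mouse1", RDF_TYPE, PEST + "Animal"),
    (PEST + "cat1", RDF_TYPE, PEST + "Animal"),
    (PEST + "cat1", PEST + "eats", PEST + "mouse1"),
}


def to_graph(triples):
    """Nodes are subjects and objects; each predicate is a directed, labelled arc."""
    arcs = defaultdict(list)
    for s, p, o in sorted(triples):
        arcs[s].append((p, o))
    return dict(arcs)


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    )


graph = to_graph(triples)
animals = [s for s, _, _ in match(triples, p=RDF_TYPE, o=PEST + "Animal")]
print(animals)
```

Because the statements are held in a set, merging two collections of triples is a set union, which is one reason RDF data from different sources combines so readily.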
All parts of the triple are resources, with the exception of the last part, the object, which can also be a literal. Literals are text strings with optional language identifiers and optional data type identifiers. A literal with a data type identifier is a typed literal. A literal without a data type is called a plain literal. Literals are one type of property value used in RDF. The data type describes how to interpret the literal, for example as a string or an integer, or whether it is in hexadecimal or decimal format. A property (or predicate) is a specific type of resource that describes relations between resources, for example “age”, “author” and “type”. Properties in RDF are also identified by URIs.
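A plain versus typed literal can be modelled as a small value type. The sketch below is an assumption about how one might represent literals in Python, not the API of any RDF library; the data type URIs are from the XML Schema namespace, and only two of them are handled.

```python
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """A literal value: text plus an optional language tag or data type URI."""
    text: str
    language: Optional[str] = None
    datatype: Optional[str] = None

    @property
    def is_typed(self) -> bool:
        return self.datatype is not None

    def to_python(self):
        """Convert two common XML Schema types; anything else stays a string."""
        if self.datatype == XSD + "integer":
            return int(self.text)
        if self.datatype == XSD + "hexBinary":
            return bytes.fromhex(self.text)
        return self.text


age = Literal("42", datatype=XSD + "integer")   # typed literal
name = Literal("Mouse", language="en")          # plain literal with a language tag

print(age.is_typed, age.to_python())
print(name.is_typed, name.to_python())
```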
RDF provides a way to model information. It does not provide a way of specifying semantics, or what the information means. To express a common meaning, a vocabulary needs to be developed; the Resource Description Framework Schema provides the platform for such a vocabulary.
Resource Description Framework Schema (RDFS)
RDFS is a schema language that provides basic structure by using classes and properties; these structures are formally defined and build on the RDF foundation. The schema provides additional descriptive features and a language for describing the expanded vocabulary. It is a universal language that lets developers describe resources using their own vocabulary. By using RDFS, classes and properties can be arranged in generalisation/specialisation hierarchies.
A generalisation/specialisation hierarchy serves at least four major purposes.
• It provides a form of knowledge representation.
• The names of the intermediate levels in the hierarchy provide a vocabulary that can be used among developers.
• The hierarchy can be extended by adding new specialisations at any level.
• New attributes and behaviour can be added easily to the proper subset of specialisations.
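A generalisation/specialisation hierarchy of this kind can be sketched with a dictionary that maps each class to its direct generalisation. The class and attribute names are hypothetical, and the sketch assumes each class has a single parent and that the hierarchy contains no cycles.

```python
# Hypothetical hierarchy: each class maps to its direct generalisation.
SUBCLASS_OF = {
    "Mouse": "Rodent",
    "Rodent": "Mammal",
    "Mammal": "Animal",
}

# Hypothetical attributes attached at different levels of the hierarchy.
ATTRIBUTES = {"Animal": {"hasLimbs"}, "Rodent": {"hasIncisors"}}


def ancestors(cls, hierarchy):
    """Walk up the hierarchy, collecting every generalisation of cls."""
    found = []
    while cls in hierarchy:
        cls = hierarchy[cls]
        found.append(cls)
    return found


def attributes_of(cls, hierarchy, attributes):
    """A class has its own attributes plus those of all its generalisations."""
    result = set(attributes.get(cls, ()))
    for ancestor in ancestors(cls, hierarchy):
        result |= attributes.get(ancestor, set())
    return result


# Extending the hierarchy with a new specialisation needs only one new entry.
SUBCLASS_OF["Rat"] = "Rodent"

print(ancestors("Rat", SUBCLASS_OF))
print(sorted(attributes_of("Rat", SUBCLASS_OF, ATTRIBUTES)))
```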
RDFS allows developers to define domain and range expectations for properties, assert class membership, and specify and interpret data types. It provides semantics to resources within RDF. The language provides features for encoding domain-specific vocabularies by adding constructs to RDF, while using the RDF/XML syntax. Although RDFS is reasonably powerful for defining semantics, it also has its drawbacks:
“RDFs is not very expressive compared with many other ontology languages, as it allows only representation of concepts, concept taxonomies, binary relations and simple domain and range restriction on properties”
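The domain and range expectations mentioned above are used for inference rather than validation: if a property has a declared domain, anything that appears as its subject is inferred to belong to that class, and likewise for the range and the object. The following is a simplified sketch of those two rules over tuples. The ex: names are hypothetical, prefixed names are used as plain strings for brevity, and literal objects are not handled.

```python
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

# Hypothetical schema statements and one data statement.
triples = {
    ("ex:eats", DOMAIN, "ex:Animal"),
    ("ex:eats", RANGE, "ex:Food"),
    ("ex:tom", "ex:eats", "ex:cheese"),
}


def infer_types(triples):
    """Apply the domain and range rules once and return only the new type triples."""
    domains = {(s, o) for s, p, o in triples if p == DOMAIN}
    ranges = {(s, o) for s, p, o in triples if p == RANGE}
    inferred = set()
    for s, p, o in triples:
        inferred |= {(s, TYPE, cls) for prop, cls in domains if prop == p}
        inferred |= {(o, TYPE, cls) for prop, cls in ranges if prop == p}
    return inferred - triples


new_triples = infer_types(triples)
print(sorted(new_triples))
```

Nothing in the data said that ex:tom is an animal; the schema alone licenses that conclusion.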
There are several features missing from RDF and RDFs that would make it more effective for modelling ontologies:
• Local scope of properties – RDFS cannot specify a range that applies only to some classes
• Disjointness of classes – cannot be specified in RDFS (for example, male and female)
• Boolean combinations of classes – union, intersection and complement are not available
• Cardinality restrictions – for example, that a person has exactly one gender cannot be stated in RDFS
• Special characteristics of properties – transitive, unique and inverse properties such as ‘bigger than’ or ‘is father of’ cannot be modelled
The weakness in the expressive power of RDFS is what led to the development of more expressive languages for the Semantic Web. However, one point that needs to be made is that the more expressive the language, the less reasoning support it can offer.
Web Ontology Language (OWL)
Influenced by reasoning systems, Description Logics and web languages, the Web Ontology Language (OWL) was developed for defining ontologies. OWL is built upon RDF and RDFS and has the same XML-based syntax. It satisfies the Semantic Web’s requirements of needing minimal input from humans and of giving software a language with explicit meaning. It adds vocabulary to ontologies, extending RDFS with ontological constructs for describing object-oriented classes, properties and individuals. The ontology language uses RDF and RDFS, XML Schema data types and OWL namespaces. OWL became a World Wide Web Consortium (W3C) recommendation in 2004 and is being developed even further. The full language is called OWL Full, and it has two sub-languages, OWL DL and OWL Lite, which are restricted versions that sacrifice expressiveness for performance and simplicity.
Each ontology document consists of an optional header, annotations, class and property definitions (axioms), facts about individuals and data type definitions. The classes, individuals and properties form the main building blocks of an ontology:
• A class is a set of resources
• An individual is any resource that is a member of at least one class
• A property is used to describe a resource
An OWL class is a resource that represents a set of similar resources, and an individual is an instance or member of that class; for example, vehicle is a class and a particular car is a member of it. The advantage of OWL is that it has more constructs available than the lower levels and, depending on the ontology, none, some or all of them can be used. OWL is able to express cardinality and disjoint classes, again two important aspects involved in object-oriented design. Declaring two classes disjoint specifies that no individual can be a member of both.
An example of this would be the classes male and female. These are disjoint, as they cannot both apply to the same individual.
A cardinality restriction limits how many times a property can be used to describe an instance of a class. Take a class ‘person’: the restriction could state that a person must have exactly one name and one birth place. Using minimum and maximum cardinality, this restriction can be specified in OWL. These modelling techniques allow developers to express data in ontologies in greater detail than with other languages.
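As a rough illustration of disjointness and cardinality, the sketch below checks a handful of made-up facts against both kinds of constraint. This is a deliberately simplified, closed-world check written for the example: an OWL reasoner works under the open world assumption, so a missing name would not count as an error there. All individual, class and property names are hypothetical.

```python
# Hypothetical class memberships and property values for two individuals.
memberships = {
    "alex": {"Person", "Male"},
    "sam": {"Person", "Male", "Female"},
}
values = {
    "alex": {"hasName": ["Alex"]},
    "sam": {"hasName": ["Sam", "Samuel"]},
}

DISJOINT = [("Male", "Female")]
CARDINALITY = {("Person", "hasName"): (1, 1)}  # (min, max) per class and property


def disjointness_violations(memberships, disjoint_pairs):
    """Individuals that belong to both classes of a disjoint pair."""
    return sorted(
        individual
        for individual, classes in memberships.items()
        for a, b in disjoint_pairs
        if a in classes and b in classes
    )


def cardinality_violations(memberships, values, restrictions):
    """(individual, property, count) for every count outside the allowed range."""
    bad = []
    for individual, classes in memberships.items():
        for (cls, prop), (low, high) in restrictions.items():
            count = len(values.get(individual, {}).get(prop, []))
            if cls in classes and not low <= count <= high:
                bad.append((individual, prop, count))
    return sorted(bad)


print(disjointness_violations(memberships, DISJOINT))
print(cardinality_violations(memberships, values, CARDINALITY))
```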
The whole point of the Semantic Web is to allow models to be extended, which means that an ontology could be added to at any time. Reasoning over ontologies is monotonic, in that adding new statements never invalidates conclusions already drawn, so something can be added but cannot be taken away.
This supports the principle that anyone can say anything about any topic (AAA), and the related open world assumption, under which a statement that has not been made is treated as unknown rather than false. The open world assumption has played a big part in the growth of the web and needs to be taken into consideration when modelling for the Semantic Web. The risk is that if the data is being modelled by anyone, is it accurate? The answer is yes until it is proven false; however, when modelling ontologies, the idea is to constrain the data by using rules or axioms. So for example:
Mouse -> Animal and(hasLimbs only legs)
This constraint prevents anyone from adding other kinds of limbs to the mouse, such as arms or wings. Introducing constraints reduces the risk of bad modelling and provides more accurate ontologies.
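The ‘only’ restriction in the mouse example can be sketched as a check that every value of the property belongs to the allowed class. This is a simplified closed-world check for illustration; a real OWL reasoner would instead infer that each limb is a leg, and would report a problem only if that contradicted something else, such as legs and wings being declared disjoint. The individuals and their limbs are hypothetical.

```python
# Hypothetical limb individuals and the class each belongs to.
TYPES = {"leg1": "Legs", "leg2": "Legs", "wing1": "Wings"}

# Hypothetical hasLimbs values for two mice.
HAS_LIMBS = {
    "mickey": ["leg1", "leg2"],
    "oddMouse": ["leg1", "wing1"],
}


def satisfies_only(individual, prop_values, types, allowed_class):
    """True when every value of the property belongs to allowed_class.

    An individual with no values satisfies the restriction trivially,
    which matches how a universal ('only') restriction behaves.
    """
    return all(
        types.get(value) == allowed_class
        for value in prop_values.get(individual, [])
    )


print(satisfies_only("mickey", HAS_LIMBS, TYPES, "Legs"))
print(satisfies_only("oddMouse", HAS_LIMBS, TYPES, "Legs"))
```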
This post has discussed how languages build on top of one another to form a layered semantic architecture.
To summarise:
• XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.
• XML Schema is a language for restricting the structure of XML documents and also extends XML with data types.
• RDF is a data model for objects (“resources”) and relations between them, provides a simple semantics for this data model, and uses XML syntax.
• RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalisation-hierarchies of the properties and classes.
• OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness) and cardinality (e.g. “exactly one”).
It is apparent that the future of the Semantic Web and its applications will depend heavily on how the semantic architecture is built and used. Each new structure must take into consideration both new and previous layers.