Notes on this page are based on the “Introduction to Semantic Web” slides of the “Logics for Data and Knowledge Representation” series from Fausto Giunchiglia and Feroz Farazi at the University of Trento.


 

Definitions

  • The Semantic Web (SW) is an extension of the World Wide Web (WWW). In the Semantic Web, information is given well-defined meaning to enable computers (and people) to work in co-operation.
  • The SW is a new, alternative form of WWW content that is machine processable, allowing intelligent software agents to make use of these representations.
  • The SW can be thought of as an extra abstraction layer (a semantic layer) built on top of the existing WWW.

 

The World Wide Web

The WWW consists of an enormous collection of data and documents. This data often comes in mixed formats and scopes, and it is continually growing and changing. This creates some well-known limitations when attempting to search, extract, and maintain data sets.

The SW aims to address these limitations and provide a better experience (for users and machines) through better integration and consistency of data.

For example, imagine that you’re planning a conference trip to the Greek island of Crete. You search for local hotels and find that Aldemar Hotels (a favourite chain of yours) has multiple locations on the island. You wonder which branch is nearest to your conference location. To determine the distances, you have to copy each branch’s location into a mapping service (e.g. Google Maps). This takes time, as you have to copy data manually between multiple services. This process could probably be made easier by using consistent Semantic Data to integrate multiple services.

 

Smart Web Applications

The WWW is overwhelmed with an ever increasing number of ‘Smart’ applications.

  • Search engine matching is non-trivial, and can be computationally intensive.
  • Commerce systems regularly use user purchase patterns to recommend new products.
  • Mapping services can determine distances, plot routes, and display detailed geographic data.

Each of these smart applications is only as smart as the data provided to them. Incorrect or inconsistent data will only lead to an incorrect/inaccurate result. These applications could be improved drastically by increasing the consistency and connectivity of the data provided to them.

 

Smarter Web Applications

By providing consistent and connected data to these applications, we can better integrate them to provide the user with a seamless, more informed experience.

In the hotel example above, tagged data about each branch’s location (as well as the location of the conference site) could be fed automatically to a smart mapping service, which would then provide distance information to the user without any manual copying.
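
As a rough illustration of that idea, the sketch below assumes each hotel branch and the conference site carry machine-readable latitude/longitude tags, and uses the standard haversine formula to pick the nearest branch. The record layout, the names, and the coordinates are all hypothetical; they are not taken from any real service.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical tagged records (made-up coordinates, for illustration only).
conference = {"name": "Conference venue", "lat": 35.34, "lon": 25.13}
branches = [
    {"name": "Branch A", "lat": 35.33, "lon": 25.39},
    {"name": "Branch B", "lat": 35.51, "lon": 24.02},
]


def nearest_branch(site, candidates):
    """Return the candidate record closest to the given site."""
    return min(
        candidates,
        key=lambda b: haversine_km(site["lat"], site["lon"], b["lat"], b["lon"]),
    )


print(nearest_branch(conference, branches)["name"])
```

Because both services read the same tagged fields, no manual copying between them is needed.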

 

Semantic Data

To that end, we can use Semantic Data (SD). SD is computer-understandable data that…

  • … can represent real world entities (like hotels) and their attributes (like location) in Semantic Web languages using standard vocabularies.
  • … can link one data element to another (through URIs) to form a web of data.

We organise this Semantic Data into sets of entities:

  • Entities are objects that are important enough to be referred to with a single name.
  • Each entity can have its own meta-data and attributes (e.g. location, age, height).
  • Each entity can be linked to other entities via relationships.
  • There is a clear separation between the knowledge (concrete entities) and the language (classes/concepts) used to express it.
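
A minimal sketch of these points, assuming a plain list of (subject, predicate, object) tuples stands in for a real Semantic Web language. Every URI under `http://example.org/` is a hypothetical placeholder, as are the helper names `describe` and `follow`.

```python
# Hypothetical namespace; real data would use a standard vocabulary.
EX = "http://example.org/"

# Each statement links one URI to another (or to a class), forming a web of data.
triples = [
    (EX + "AldemarBranch1", EX + "type", EX + "Hotel"),       # entity -> class
    (EX + "AldemarBranch1", EX + "locatedIn", EX + "Crete"),  # relationship
    (EX + "Crete", EX + "type", EX + "Island"),
    (EX + "Crete", EX + "partOf", EX + "Greece"),
]


def describe(subject, graph):
    """Return every (predicate, object) pair stated about a subject."""
    return [(p, o) for s, p, o in graph if s == subject]


def follow(subject, predicate, graph):
    """Follow one kind of link from a subject to the entities it points at."""
    return [o for s, p, o in graph if s == subject and p == predicate]


print(follow(EX + "AldemarBranch1", EX + "locatedIn", triples))
print(describe(EX + "Crete", triples))
```

The concrete entities (the branch, Crete) are kept apart from the classes used to describe them (Hotel, Island), which mirrors the knowledge/language separation in the last bullet.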

 

Formal Language

The same concept can be expressed in different ways within a language (e.g. “Car” or “Automobile” in English) and across languages (e.g. “Car” in English, “Coche” in Spanish).
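
One way to picture this is a lookup from (language, word) pairs to a single language-independent concept identifier. The sketch below is only an illustration; the identifier `concept:car` and the function name are made up.

```python
# Hypothetical lexicon: several words, in one or more languages, share a concept.
LEXICON = {
    ("en", "car"): "concept:car",
    ("en", "automobile"): "concept:car",
    ("es", "coche"): "concept:car",
}


def concept_for(lang, word):
    """Return the concept identifier for a word, or None if it is unknown."""
    return LEXICON.get((lang, word.lower()))


print(concept_for("en", "Automobile") == concept_for("es", "Coche"))
```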

A DERA domain consists of three elementary components (entity, relation, and attribute) and organises the formal concept language into any number of sub-domains. Entities are organised into classes representing similar objects, and these classes build up to form a lattice of overlapping domains (with the top domain represented as an upper-level ontology).

 

A small fragment of the ‘Entity’ facet of the Space Domain.

 

Each DERA domain contains a number of facets. A facet is a hierarchy of terms, each denoting an atomic concept, and every facet is one of three types (Entity, Relation, Attribute).

  • Entity: A thing with distinct and independent existence (including classes of entities, see above image)
  • Relation: A connection between entities
  • Attribute: A characteristic/quality of an entity
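
A facet can be sketched as a typed tree of terms. The class below is a hypothetical illustration (the `Facet` name, its methods, and the sample terms are assumptions, not part of DERA itself); it only checks the facet type and records each term's parent.

```python
from dataclasses import dataclass, field

FACET_KINDS = {"Entity", "Relation", "Attribute"}


@dataclass
class Facet:
    kind: str                                      # one of FACET_KINDS
    parent_of: dict = field(default_factory=dict)  # term -> parent term (None for a root)

    def __post_init__(self):
        if self.kind not in FACET_KINDS:
            raise ValueError(f"unknown facet kind: {self.kind}")

    def add(self, term, parent=None):
        """Add a term under an existing parent (or as a root if parent is None)."""
        if parent is not None and parent not in self.parent_of:
            raise KeyError(f"unknown parent term: {parent}")
        self.parent_of[term] = parent

    def path_to_root(self, term):
        """List the term followed by each of its ancestors, most specific first."""
        path = [term]
        while self.parent_of[path[-1]] is not None:
            path.append(self.parent_of[path[-1]])
        return path


# Illustrative terms only; the real Space Domain facet is much larger.
space_entities = Facet("Entity")
space_entities.add("location")
space_entities.add("body of water", parent="location")
space_entities.add("river", parent="body of water")
space_entities.add("lake", parent="body of water")

print(space_entities.path_to_root("river"))
```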
 

Knowledge

Knowledge is organised as a set of entity types (eTypes), with each eType defined in terms of:
  • Attributes (e.g. height, location)
  • Relations (e.g. locatedIn, friendOf)
  • Services (e.g. computeAge, computeFriendsOfFriends)
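
As a hedged sketch, a "Person" eType could be written as a class whose fields are the attributes and relations and whose methods are the services. The class layout and field names below are assumptions made for illustration; only the service names echo the examples above.

```python
from dataclasses import dataclass, field
from datetime import date


# eq=False keeps identity-based comparison, so cyclic friendships are safe to compare.
@dataclass(eq=False)
class Person:
    name: str                                                  # attribute
    birth_date: date                                           # attribute
    friend_of: list = field(default_factory=list, repr=False)  # relation to other Persons

    def compute_age(self, today):
        """Service: age in whole years on the given date."""
        years = today.year - self.birth_date.year
        if (today.month, today.day) < (self.birth_date.month, self.birth_date.day):
            years -= 1  # birthday has not happened yet this year
        return years

    def compute_friends_of_friends(self):
        """Service: friends of friends, excluding self and direct friends."""
        result = []
        for friend in self.friend_of:
            for fof in friend.friend_of:
                if fof is not self and fof not in self.friend_of and fof not in result:
                    result.append(fof)
        return result


alice = Person("Alice", date(1990, 6, 2))
bob = Person("Bob", date(1988, 1, 15))
carol = Person("Carol", date(1992, 11, 30))
alice.friend_of = [bob]
bob.friend_of = [alice, carol]

print(alice.compute_age(date(2024, 6, 1)))
print([p.name for p in alice.compute_friends_of_friends()])
```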

 

Entity type lattice.

 

Some examples of eTypes.

 

A critical issue with defining eTypes, however, is the fact that some entities have an inherent polysemy (multiple meanings depending on the situation). As each of these meanings is a perfectly valid interpretation, it would be incorrect to permanently assign only one of them to an entity. As such, we need some systematic way to represent these polysemic entities.

 

Encoding into RDF

TBOX: A TBox T is a finite collection of concept inclusion axioms of the form C ⊑ D and concept equivalence axioms of the form C ≡ D (where C and D are concepts).
 
ABOX: An ABox A is a finite collection of axioms in the form C(a), R(a, b), where a and b are individual names, C is a concept, and R is a role/relation.
 
  • Entity facets translate into TBOX concept axioms (e.g. “River” is a subset of “Body of Water”).
  • Relation facets translate into TBOX role axioms (e.g. the “fatherOf” relation is a subset of the “parentOf” relation).
  • Attribute facets translate into TBOX axioms (e.g. “latitude” is a subset of “angularDistance”).
  • Entity properties translate into ABOX statements (e.g. livesIn(“Cambridge”, “UK”)).
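
The translation above can be sketched in plain Python, with inclusion axioms stored as pairs and then written out as RDF-style triples. The predicates `rdfs:subClassOf`, `rdfs:subPropertyOf`, and `rdf:type` are standard RDF/RDFS terms. The individuals ("Adige", "John", "Mary"), the extra "Location" concept, and the function names are assumptions made for illustration. This is a sketch, not a full reasoner or RDF serialiser.

```python
# TBox: concept inclusion axioms, stored as (sub-concept, super-concept) pairs.
tbox_concepts = {("River", "BodyOfWater"), ("BodyOfWater", "Location")}
# TBox: role inclusion axioms, stored as (sub-role, super-role) pairs.
tbox_roles = {("fatherOf", "parentOf")}
# ABox: concept assertions C(a) and role assertions R(a, b).
abox_concepts = {("River", "Adige")}
abox_roles = {("fatherOf", "John", "Mary")}


def superconcepts(concept, inclusions):
    """All concepts that subsume the given one, following inclusions transitively."""
    found, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in inclusions:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def to_rdf_triples():
    """Encode the TBox and ABox as (subject, predicate, object) triples."""
    triples = []
    for c, d in sorted(tbox_concepts):
        triples.append((c, "rdfs:subClassOf", d))
    for r, s in sorted(tbox_roles):
        triples.append((r, "rdfs:subPropertyOf", s))
    for c, a in sorted(abox_concepts):
        triples.append((a, "rdf:type", c))
    for r, a, b in sorted(abox_roles):
        triples.append((a, r, b))
    return triples


print(superconcepts("River", tbox_concepts))
for triple in to_rdf_triples():
    print(triple)
```

Keeping the TBox and ABox in separate collections preserves the split between the language (concepts and roles) and the knowledge (assertions about individuals).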