Introduction to RDF


Posted in Semantic Web, Software Development on December 14, 2011

The Resource Description Framework (RDF) is the most fundamental of the semantic web technologies. It is a way of structuring data that is expressive and precise, yet very flexible. Those familiar with relational database design can think of RDF as the ultimate data normalization scheme: it represents data with a structure derived from human language, but more rigid so that machines can interpret it.

Like many powerful concepts, RDF is fundamentally simple but can get quite complex in implementation. Many available articles focus on these intricacies and complexities. My intention in this article is to clearly and simply describe RDF in its most basic and abstract form.

First, consider a simple statement in English like, “This article’s author is Rob Dixon.” If you remember high-school English class, you might recall that sentences have a subject and a predicate. The subject is what the sentence is about; in my example that is “this article”. The predicate is the information about the subject, so in this case “author is Rob Dixon.” In RDF, the word “author” on its own is called the “predicate”, so don’t get confused by the terminology. The last part of the sentence, my name, is the sentence’s object: that which receives the action of a verb. In my sentence the implied verb is “authored by,” so the object is me.

So we’ve taken the basic sentence, “This article’s author is Rob Dixon,” and have broken it down into three parts:

{subject, predicate, object}

This is an RDF triple.  The example sentence as an RDF triple is:

{this article, author, Rob Dixon}

To expand on the information in the sentence, all you need to do is string together multiple triples. So, “This article’s author is Rob Dixon, who lives in Denver” can become two triples: {this article, author, Rob Dixon} and {Rob Dixon, lives in, Denver}.
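
As a rough illustration, the two triples above can be written as plain Python tuples. This is only a sketch: the `Triple` type and the `facts_about` helper are names I made up for the example, not part of RDF or of any library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object, in that order."""
    subject: str
    predicate: str
    obj: str


# The two triples from the sentence above, still using human-readable labels.
triples = [
    Triple("this article", "author", "Rob Dixon"),
    Triple("Rob Dixon", "lives in", "Denver"),
]


def facts_about(subject, triples):
    """Return the (predicate, object) pairs stated about one subject."""
    return [(t.predicate, t.obj) for t in triples if t.subject == subject]


print(facts_about("Rob Dixon", triples))
```

Note that “Rob Dixon” is the object of the first triple and the subject of the second. That overlap is what lets separate triples link together.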

Pretty simple, right? It is, but you can imagine that once you start stringing many of these triples together they can form complex structures. That is why RDF is most often visualized as a graph, where subjects and objects are drawn as shapes and each predicate is drawn as a line connecting the two.
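
To make the graph idea concrete, here is a small Python sketch that treats each triple as a labeled edge from its subject to its object and then follows the edges from a starting point. The function names `build_graph` and `walk` are my own inventions for this example, and a real RDF store would do this very differently.

```python
from collections import defaultdict

triples = [
    ("this article", "author", "Rob Dixon"),
    ("Rob Dixon", "lives in", "Denver"),
]


def build_graph(triples):
    """Map each subject to the list of (predicate, object) edges leaving it."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def walk(graph, start):
    """Yield every triple reachable from `start` by following edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; avoids looping forever on cycles
        seen.add(node)
        for predicate, obj in graph.get(node, []):
            yield (node, predicate, obj)
            stack.append(obj)


graph = build_graph(triples)
for triple in walk(graph, "this article"):
    print(triple)
```

Starting from “this article”, the walk reaches the fact about Denver even though no single triple mentions both the article and the city.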

Another complexity is identifying, without ambiguity, what the subject, predicate, and object actually mean. In my example, I used the words “this article” to make it human readable, but that is in fact a poor way to identify the article. It is much better to use the article’s URL: http://www.robdixoniii.com/introduction-to-rdf. A more useful RDF triple is then:

{http://www.robdixoniii.com/introduction-to-rdf, author, Rob Dixon}

But when you think about it, you’ll realize that the other components of the triple have the same problem. What exactly do I mean by author? Which Rob Dixon? After all, there are many in this world. For this reason, every component of the triple can be a resource identified by a URI, like the URL of this article. In fact, the predicate must be a URI according to the specification. This is important, because we all know that English words have different meanings when used in different contexts. The set of resources used by an RDF graph is called a vocabulary.

Since it would be a hassle to always have to define your vocabulary, there are predefined vocabularies published by a variety of sources. One popular vocabulary is the Dublin Core. It defines many common concepts, like authorship, so you don’t have to. Technically, you don’t really have to define your vocabulary; you just have to make each term a URI. So I could say my predicate “author” is “http://www.robdixoniii.com#author”. The fact that this URI doesn’t resolve to a resource doesn’t affect the validity of the RDF.
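
Here is a short Python sketch of the same statement made twice, once with a published predicate and once with a home-made one. It assumes the Dublin Core term for authorship is `creator` in the `http://purl.org/dc/elements/1.1/` namespace. The constant names and the `predicates` helper are mine, chosen only for illustration.

```python
# Namespace prefixes; a predicate URI is just a namespace plus a term name.
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
ROB = "http://www.robdixoniii.com#"       # home-made, need not resolve

article = "http://www.robdixoniii.com/introduction-to-rdf"

triples = [
    (article, DC + "creator", "Rob Dixon"),   # published vocabulary
    (article, ROB + "author", "Rob Dixon"),   # home-made vocabulary
]


def predicates(triples):
    """Collect the distinct predicate URIs used by a set of triples."""
    return {predicate for _, predicate, _ in triples}


for uri in sorted(predicates(triples)):
    print(uri)
```

Both triples are equally valid. The advantage of the published term is that other people’s software is more likely to already know what it means.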

When completely converted into resources, the sentence “This article’s author is Rob Dixon,” turns into the RDF triple:

{http://www.robdixoniii.com/introduction-to-rdf, http://www.robdixoniii.com#author, http://www.robdixoniii.com}
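
The conversion from readable labels to URIs can be sketched in a few lines of Python. The `identifiers` table and the `to_uris` function are hypothetical names for this example. The URIs themselves are the ones used in this article.

```python
# Lookup table from the human-readable labels to the URIs used in this article.
identifiers = {
    "this article": "http://www.robdixoniii.com/introduction-to-rdf",
    "author": "http://www.robdixoniii.com#author",
    "Rob Dixon": "http://www.robdixoniii.com",
}


def to_uris(triple, identifiers):
    """Replace each label in a triple with its URI.

    Raises KeyError if a label has no URI, which is the point: every
    term has to be identified before the triple is unambiguous.
    """
    return tuple(identifiers[term] for term in triple)


readable = ("this article", "author", "Rob Dixon")
print(to_uris(readable, identifiers))
```

The result has the same three-part shape as before. Only the identifiers have changed, from words a person can read to names a machine can tell apart.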

Now we are back to a technical-looking construct that doesn’t look so “semantic” anymore. It still is, just in a way a computer can interpret. There are ways to express RDF in more human-readable formats, and there are shorthand notations so you don’t see the full URI of each item in a triple, but I don’t want to go into written notations of RDF here. Just know there are many.

At its core, there are four simple rules that define RDF:

  1. An RDF triple is a subject, predicate, and object, in that order.
  2. An RDF triple represents a complete and unique fact (in the formal-logic sense; in practice, this is why URIs are part of the specification).
  3. An RDF triple can be combined with other RDF triples, but that operation doesn’t change the meaning of any individual triple.
  4. The subject of the triple must be either a blank node (an anonymous resource with no URI of its own) or a URI. The predicate must be a URI. The object can be a blank node, a URI, or a literal value (a plain value such as a string, number, or date).
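
Rule 4 is the easiest one to check mechanically. The Python sketch below models the three kinds of terms as tiny classes and tests whether a triple puts them in allowed positions. The class names and `is_valid_triple` are assumptions of mine for the example, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # local label only; a blank node has no global identifier


@dataclass(frozen=True)
class Literal:
    value: str  # a plain value such as a name or a date


def is_valid_triple(subject, predicate, obj):
    """Check rule 4: which kind of term is allowed in which position."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )


article = URI("http://www.robdixoniii.com/introduction-to-rdf")
author = URI("http://www.robdixoniii.com#author")

print(is_valid_triple(article, author, Literal("Rob Dixon")))  # literal object is allowed
print(is_valid_triple(Literal("Rob Dixon"), author, article))  # literal subject is not
```

The check only looks at the kind of each term. It says nothing about whether the statement is true, or whether the URIs resolve to anything.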
