Introduction to the RDF Model
This section will describe the Resource Description Framework model.
RDF (Resource Description Framework) is a model for storing graphs of information. Given a set of resources, where each resource is a thing such as a person, a song, a web page or a bookmark, RDF is used to describe relationships between these resources. Some people think of RDF as an XML language for describing data. However, the XML format is just one way of storing RDF in a file. If you are trying to learn RDF, the XML syntax can be confusing, so the description below explains the RDF model in enough detail to understand it without discussing the XML syntax at all.
Think of a web or graph of interconnected nodes. The nodes are connected via various relationships. For example, let's say each node represents a person. Each person might be related to another person because they are siblings, parents, spouses, employees or enemies. Each interconnection is labeled with the relationship name.
Another type of relationship describes a property of the node itself, for instance the name or age of a person. These relationships would be labeled 'name' and 'age' instead of 'sibling' or 'parent'.
RDF is used to describe these relationships. It doesn't include the nodes directly, but it does so indirectly, since the relationships point to the nodes. At any time, we could introduce a new node, such as a newborn child, and all we would need to do is add two parent relationships, one for each parent.
In RDF, nodes can be of two very general kinds: resources and literals. A literal is an actual value, such as the name 'Sandra' or the number '7'; you can think of a literal as similar to a string in a programming language. A resource represents something, more like an object in a programming language. For example, a person would be a resource, but the person's name would be a literal.
In RDF, resources are given URIs so that we can identify them. Since each URI identifies exactly one resource, we can use it to refer to a specific resource. The value of the URI doesn't really matter to RDF, since it's only used as an identifier. You might still want to follow some convention, though, such as using a URI that includes a serial number when identifying a physical object.
We can add relationships between two resources or between a resource and a literal. These relationships are often called triples or arcs. Here are some examples:
<http://www.xulplanet.com/rdf/people/Sandra> -> name -> Sandra
<http://www.xulplanet.com/rdf/people/Sandra> -> gender -> female
<http://www.xulplanet.com/rdf/people/Sandra> -> sibling -> <http://www.xulplanet.com/rdf/people/Kevin>
<http://www.xulplanet.com/rdf/people/Kevin> -> gender -> male
In this documentation, we'll use the convention that items written inside angle brackets are resources, and those that are not are literals. Above, we define four triples. The first indicates that there is a resource <http://www.xulplanet.com/rdf/people/Sandra> with a name of 'Sandra'. The second indicates that the same resource has a gender of 'female'. The third indicates that this resource has a sibling, the resource <http://www.xulplanet.com/rdf/people/Kevin>. The final triple specifies a gender for <http://www.xulplanet.com/rdf/people/Kevin>.
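To make the idea concrete, the sketch below models a graph as a Python set of (subject, predicate, object) tuples. It is not the API of any RDF library; the `Resource` class, the `res` helper and the use of `http://www.xulplanet.com/rdf/people/` as a namespace for the predicates are assumptions for illustration. A resource wraps a URI, and a literal is simply a Python string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A node identified by a URI (written inside angle brackets in the text)."""
    uri: str


# Hypothetical namespace, used here for both people and predicates.
PEOPLE = "http://www.xulplanet.com/rdf/people/"


def res(local_name: str) -> Resource:
    return Resource(PEOPLE + local_name)


def is_literal(node) -> bool:
    # In this sketch, any plain string is a literal.
    return isinstance(node, str)


# The four triples from the example; the graph is nothing more than a set of them.
graph = {
    (res("Sandra"), res("name"), "Sandra"),
    (res("Sandra"), res("gender"), "female"),
    (res("Sandra"), res("sibling"), res("Kevin")),
    (res("Kevin"), res("gender"), "male"),
}

literal_targets = sorted(o for _, _, o in graph if is_literal(o))
print(len(graph), literal_targets)  # 4 ['Sandra', 'female', 'male']
```

Nothing in the set says what Kevin's name is, which matches the example: the graph simply has no triple for it.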
No name was supplied for the <http://www.xulplanet.com/rdf/people/Kevin> resource. A human reader would probably assume that the name is the literal 'Kevin', but the computer system has no way of knowing this. We could use any form of URI for a resource; for example, the following would be just as acceptable:
<urn:x-person:S1> -> name -> Sandra
The resource <urn:x-person:S1> doesn't mean anything to a human, and a computer system treats it exactly like any other URI: as an opaque identifier. In fact, a less human-meaningful URI might be better, since the naming scheme used for <http://www.xulplanet.com/rdf/people/Sandra> wouldn't work if someone else were also named Sandra. Note that using an http URI doesn't mean that anything is available at that address over HTTP, although you might put something related at that URL for people to download. Instead, you might just use your web site's domain name when creating URIs to ensure uniqueness, since other people shouldn't be using it.
The URI is just an identifier; RDF doesn't care what it is. But if the same URI is used in several places, every use refers to the same resource. In the earlier example, <http://www.xulplanet.com/rdf/people/Sandra> was referred to several times, but each occurrence represents a single thing.
Even though we didn't specify a name for the <http://www.xulplanet.com/rdf/people/Kevin> resource, this doesn't mean that he doesn't have one. A triple that is missing from the graph shouldn't be taken to mean that the value doesn't exist; it might instead mean that the value is not known, or that it will be provided later.
One thing to note in the example is that we didn't specify whether Kevin is Sandra's brother or sister; we just used the more generic term 'sibling'. However, the system could work this out using only the information provided and a bit of logic. For instance, if the system were told that a brother is a sibling who is male, it could determine that Kevin is Sandra's brother using only two of the triples:
<http://www.xulplanet.com/rdf/people/Sandra> -> sibling -> <http://www.xulplanet.com/rdf/people/Kevin>
<http://www.xulplanet.com/rdf/people/Kevin> -> gender -> male
However, with just those two triples, the system can't determine whether Sandra is Kevin's brother or sister, since they don't indicate Sandra's gender. For this, we would need the triple giving Sandra's gender from the earlier example, and the system would also need to know that the sibling relationship works in both directions.
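The brother rule can be sketched as a tiny forward-chaining step over a set of triples. The predicates `brother` and `sister`, the rule definitions and the namespace are all hypothetical; a real system would express such rules in its own rule or ontology language. Note that this sketch only follows sibling triples in the direction they are written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace


def res(local_name: str) -> Resource:
    return Resource(PEOPLE + local_name)


SIBLING, GENDER = res("sibling"), res("gender")
BROTHER, SISTER = res("brother"), res("sister")  # hypothetical derived predicates


def infer(graph):
    """Rule: X -> sibling -> Y and Y -> gender -> male   gives X -> brother -> Y.
    Rule: X -> sibling -> Y and Y -> gender -> female gives X -> sister -> Y."""
    gender = {s: o for s, p, o in graph if p == GENDER}
    derived = set()
    for s, p, o in graph:
        if p != SIBLING:
            continue
        if gender.get(o) == "male":
            derived.add((s, BROTHER, o))
        elif gender.get(o) == "female":
            derived.add((s, SISTER, o))
    return derived


# Only the two triples used in the text.
graph = {
    (res("Sandra"), SIBLING, res("Kevin")),
    (res("Kevin"), GENDER, "male"),
}
print(infer(graph) == {(res("Sandra"), BROTHER, res("Kevin"))})  # True

# Sandra's gender alone is not enough: nothing says Kevin -> sibling -> Sandra.
graph.add((res("Sandra"), GENDER, "female"))
print((res("Kevin"), SISTER, res("Sandra")) in infer(graph))  # False

# Adding the reverse sibling triple (or a symmetry rule) completes the picture.
graph.add((res("Kevin"), SIBLING, res("Sandra")))
print((res("Kevin"), SISTER, res("Sandra")) in infer(graph))  # True
```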
RDF lets us make up resources, literals, and even the labels of the arcs (such as 'sibling'). Thus, we could make up a new label at any time:
<http://www.xulplanet.com/rdf/people/Sandra> -> bestFriend -> <http://www.xulplanet.com/rdf/people/Christine>
The 'bestFriend' label, like the other labels used above, is called a predicate. The left-hand side of each triple is called the subject and the right-hand side is called the object or target. The subject is always a resource, while the target may be a resource or a literal. You will never have a relationship from one literal to another.
To ensure uniqueness, predicates are also identified with URIs, so each predicate has its own URI. For example, the URI for 'sibling' might be <http://www.xulplanet.com/rdf/people/sibling>. Usually, related predicates share a common URI prefix so that XML-style namespaces can be used to refer to them. So really, the last example would be the following:
<http://www.xulplanet.com/rdf/people/Sandra> -> <http://www.xulplanet.com/rdf/people/bestFriend> -> <http://www.xulplanet.com/rdf/people/Christine>
For the purposes of this documentation, we will sometimes leave out the predicate namespace to simplify the examples. Predicates are also resources, so a predicate can itself be the subject of other triples. This is commonly done to describe what a predicate means, forming a formal vocabulary for it.
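The rules about which positions may hold literals can be checked mechanically. The sketch below assumes a set of (subject, predicate, object) tuples as a stand-in for a real RDF store, and rejects a triple whose subject or predicate is a literal (a plain string). Because a predicate is itself a `Resource`, it can also appear as the subject of another triple. The `add_triple` helper, the `label` predicate and the namespace are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace


def add_triple(graph: set, subject, predicate, obj) -> None:
    """Add a triple, enforcing that subject and predicate are resources."""
    if not isinstance(subject, Resource):
        raise TypeError(f"subject must be a resource, got literal {subject!r}")
    if not isinstance(predicate, Resource):
        raise TypeError(f"predicate must be a resource, got literal {predicate!r}")
    if not isinstance(obj, (Resource, str)):
        raise TypeError("object must be a resource or a literal string")
    graph.add((subject, predicate, obj))


graph: set = set()
best_friend = Resource(PEOPLE + "bestFriend")
add_triple(graph, Resource(PEOPLE + "Sandra"), best_friend, Resource(PEOPLE + "Christine"))

# A predicate is a resource too, so it can be described by other triples.
add_triple(graph, best_friend, Resource(PEOPLE + "label"), "best friend")

try:
    add_triple(graph, "Sandra", best_friend, "Sandy")  # literal subject: rejected
except TypeError as err:
    print(err)

print(len(graph))  # 2
```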
You can have multiple triples with the same subject and predicate if you wish:
<http://www.xulplanet.com/rdf/people/Sandra> -> name -> Sandra
<http://www.xulplanet.com/rdf/people/Sandra> -> name -> Sandy
Here, the same resource has two triples with the same predicate but different targets. In this example, we might assume that the extra name is a nickname. RDF doesn't order the names, so both have equal prominence. Don't assume that Sandra has any more significance than Sandy just because its triple was written first.
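Because a graph is a set of triples, looking up all targets for a given subject and predicate naturally returns a set, which carries no order. The `objects` helper below is a hypothetical name used for this sketch, not a particular library's function, and the namespace is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace
SANDRA, NAME = Resource(PEOPLE + "Sandra"), Resource(PEOPLE + "name")

graph = {
    (SANDRA, NAME, "Sandra"),
    (SANDRA, NAME, "Sandy"),
}


def objects(graph, subject, predicate) -> set:
    """All targets of triples with this subject and predicate, in no particular order."""
    return {o for s, p, o in graph if s == subject and p == predicate}


names = objects(graph, SANDRA, NAME)
print(sorted(names))  # ['Sandra', 'Sandy']

# Writing the same triples in a different order produces an identical graph.
print(graph == {(SANDRA, NAME, "Sandy"), (SANDRA, NAME, "Sandra")})  # True
```

The call to `sorted` is only for display; any ordering comes from the application, not from RDF.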
Let's say that we decide to add another person also named Sandra:
<http://www.xulplanet.com/rdf/people/SandraJr> -> name -> Sandra
This resource is different from the one in the earlier examples because it represents a different person, even though both people have the same name. Most RDF APIs let you query in either direction: from a subject to its targets, or from a target back to the subjects that refer to it. So if all you have is the name 'Sandra', you can find the resources with that name by querying in reverse.
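Querying in reverse means matching on the predicate and target rather than on the subject. One common way to make that efficient, sketched below under the same set-of-tuples assumption, is to keep an index keyed by (predicate, object). The `build_reverse_index` and `subjects` names and the namespace are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace
NAME = Resource(PEOPLE + "name")

graph = {
    (Resource(PEOPLE + "Sandra"), NAME, "Sandra"),
    (Resource(PEOPLE + "Sandra"), NAME, "Sandy"),
    (Resource(PEOPLE + "SandraJr"), NAME, "Sandra"),
}


def build_reverse_index(graph):
    """Map (predicate, object) to the set of subjects that have that triple."""
    index = defaultdict(set)
    for s, p, o in graph:
        index[(p, o)].add(s)
    return index


def subjects(index, predicate, obj) -> set:
    # .get avoids inserting empty entries for keys that were never seen.
    return index.get((predicate, obj), set())


index = build_reverse_index(graph)
for r in sorted(subjects(index, NAME, "Sandra"), key=lambda r: r.uri):
    print(r.uri)
```

The index trades memory for speed: it must be updated whenever triples are added or removed.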
Sometimes it is useful to identify what kind of thing a resource is, much as object-oriented systems use classes. RDF uses a type for this purpose. Apart from the two very general kinds, resource and literal, every resource may be given a more precise type. For example, the Sandra resource might be given a type of 'Person'. The type should itself be a resource, so that more information can be associated with the type.
As with other properties, types are also specified with a triple:
<http://www.xulplanet.com/rdf/people/Sandra> -> rdf:type -> <http://xmlns.com/wordnet/1.6/Person>
The resource <http://xmlns.com/wordnet/1.6/Person> represents a person. The URI comes from WordNet, which provides resource URIs for words. The predicate is rdf:type, which is in the RDF namespace since the 'type' predicate is built into RDF. Its full name is 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'. Here, we use 'rdf:' as an abbreviation for the RDF namespace, as would be done in XML. This is only done to simplify the example; the predicate is always the full name including the namespace.
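Abbreviations such as `rdf:type` are only a notational convenience, so software typically expands them to full URIs before comparing predicates. The sketch below assumes a simple prefix table; only the `rdf` namespace URI comes from the text, and the `expand` and `types_of` helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


# Prefix table: 'rdf' is the real RDF namespace; others could be added.
PREFIXES = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}


def expand(qname: str) -> Resource:
    """Turn an abbreviated name like 'rdf:type' into a resource with the full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {qname!r}")
    return Resource(PREFIXES[prefix] + local)


SANDRA = Resource("http://www.xulplanet.com/rdf/people/Sandra")
PERSON = Resource("http://xmlns.com/wordnet/1.6/Person")
graph = {(SANDRA, expand("rdf:type"), PERSON)}


def types_of(graph, node) -> set:
    rdf_type = expand("rdf:type")
    return {o for s, p, o in graph if s == node and p == rdf_type}


print(expand("rdf:type").uri)  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(types_of(graph, SANDRA) == {PERSON})  # True
```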
We can also make up our own types. In the example below, we give a resource the type <http://www.xulplanet.com/rdf/example/Poem>, which, judging from the URI, suggests that the resource is a poem.
<http://www.xulplanet.com/rdf/something/785> -> rdf:type -> <http://www.xulplanet.com/rdf/example/Poem>
Some List Types
RDF has a number of built-in types for representing lists of things. Recall an earlier example where Sandra had two names. It was mentioned that the names were not in any particular order. Sometimes, it may be useful to be able to put a set of values in a particular order. You might be able to think of ways to work around this, for example by using predicates named 'name1', 'name2' and so on. RDF has a built-in mechanism for doing this kind of thing.
In RDF, a predicate whose name is an underscore followed by a number is used to mark an item in a list. For example, rdf:_1 indicates the first item in a list. These predicates are also in the RDF namespace. For example, we could create a list of things using:
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:_1 -> <http://www.xulplanet.com/rdf/people/Sandra>
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:_2 -> <http://www.xulplanet.com/rdf/people/Kevin>
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:_3 -> <http://www.xulplanet.com/rdf/people/Jack>
Here, <http://www.xulplanet.com/rdf/people/Karen> has three items in the list. We, as humans, might presume that the three items are Karen's children. The computer system won't know this, but your application could make this assumption. The three predicates aren't special in any way, as we could have used 'name1', 'name2' and 'name3'. However, since RDF has these predicates built in, we should use them where possible.
Since rdf:_XXX are just predicates like any other, we could also specify multiple values with the same number, or we could leave some numbers out:
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:_1 -> <http://www.xulplanet.com/rdf/people/Sandra>
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:_6 -> <http://www.xulplanet.com/rdf/people/Kevin>
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:_6 -> <http://www.xulplanet.com/rdf/people/Billy>
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:_8 -> <http://www.xulplanet.com/rdf/people/Jack>
The above is a list with four items in it. However, the numbers are not sequential. Creating triples like this isn't done often, but a list where numbers are skipped might be found when an item has been removed.
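Reading such a list back means picking out the rdf:_N predicates and sorting by N as a number, not as text (sorted as text, rdf:_10 would come before rdf:_6). The sketch below assumes the set-of-tuples representation and a hypothetical namespace. The text doesn't say how two items sharing a number should be ordered, so breaking ties by URI here is an arbitrary choice made only to get a stable result.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")


def li(n: int) -> Resource:
    """The rdf:_N predicate as a full URI."""
    return Resource(f"{RDF}_{n}")


def ordered_members(graph, container):
    """Return (N, item) pairs sorted numerically by N; gaps are simply skipped over."""
    found = []
    for s, p, o in graph:
        m = MEMBER.fullmatch(p.uri)
        if s == container and m:
            found.append((int(m.group(1)), o))
    # Ties (same N) are broken by URI, an arbitrary choice; assumes items are resources.
    return sorted(found, key=lambda pair: (pair[0], pair[1].uri))


karen = Resource(PEOPLE + "Karen")
graph = {
    (karen, li(1), Resource(PEOPLE + "Sandra")),
    (karen, li(6), Resource(PEOPLE + "Kevin")),
    (karen, li(6), Resource(PEOPLE + "Billy")),
    (karen, li(8), Resource(PEOPLE + "Jack")),
}
# Prints one pair per line: 1 Sandra, 6 Billy, 6 Kevin, 8 Jack
for n, person in ordered_members(graph, karen):
    print(n, person.uri.rsplit("/", 1)[1])
```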
In order for RDF to treat these numbered predicates in any special way -- remember that they are just ordinary predicates -- RDF also requires that you use a special type for lists. Several list types are available:
- rdf:Seq: an ordered list, which is what we would use for the examples above.
- rdf:Bag: an unordered list.
- rdf:Alt: a list of alternative values, where only one value is expected to be used.
In the examples above, we would use rdf:Seq because we want the items to be in a specific order. We would use rdf:Bag if the order didn't matter. This might not seem any different from just using multiple 'name' predicates; however, if the type is rdf:Bag, we know that the resource is expected to contain a list of names instead of just one. For names, we might instead use an rdf:Alt, since an application will only need to use one of the names at a time.
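One way an application might act on the container type is sketched below. The policy (a Python list for rdf:Seq, a `Counter` for rdf:Bag so that order is ignored but repeats are kept, and a set of choices for rdf:Alt) is an assumption for illustration, not something RDF itself prescribes; the `read_container` helper and the `SandrasNames` resource are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE, SEQ, BAG, ALT = (Resource(RDF + name) for name in ("type", "Seq", "Bag", "Alt"))
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")


def read_container(graph, node):
    """Interpret node's rdf:_N triples according to its rdf:type."""
    types = {o for s, p, o in graph if s == node and p == TYPE}
    numbered = []
    for s, p, o in graph:
        m = MEMBER.fullmatch(p.uri)
        if s == node and m:
            numbered.append((int(m.group(1)), o))
    items = [o for _, o in sorted(numbered, key=lambda pair: pair[0])]
    if SEQ in types:
        return items           # order is significant
    if BAG in types:
        return Counter(items)  # order ignored, duplicates kept
    if ALT in types:
        return set(items)      # the application picks one of these
    raise ValueError("node has no container type")


names = Resource("http://www.xulplanet.com/rdf/people/SandrasNames")  # hypothetical
graph = {
    (names, TYPE, ALT),
    (names, Resource(RDF + "_1"), "Sandra"),
    (names, Resource(RDF + "_2"), "Sandy"),
}
print(sorted(read_container(graph, names)))  # ['Sandra', 'Sandy']
```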
We assign these list types just like any other type, as described earlier:
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:type -> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>
Here, the full namespace for 'rdf' is used for the target value. Now that Karen is an rdf:Seq, we can add the three children as above. One problem is that, now that Karen has the type rdf:Seq, she doesn't have the type Person. We could solve this by giving Karen a second rdf:type triple, since a resource can have several values for the same predicate, just as Sandra had two names. A better way is to use a second resource as a placeholder for Karen's list of children. Karen remains a Person, while the list of children is an rdf:Seq.
<http://www.xulplanet.com/rdf/people/Karen> -> rdf:type -> <http://xmlns.com/wordnet/1.6/Person>
<http://www.xulplanet.com/rdf/people/Karen> -> children -> <http://www.xulplanet.com/rdf/people/KarensKids>
<http://www.xulplanet.com/rdf/people/KarensKids> -> rdf:type -> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>
<http://www.xulplanet.com/rdf/people/KarensKids> -> rdf:_1 -> <http://www.xulplanet.com/rdf/people/Sandra>
<http://www.xulplanet.com/rdf/people/KarensKids> -> rdf:_2 -> <http://www.xulplanet.com/rdf/people/Kevin>
<http://www.xulplanet.com/rdf/people/KarensKids> -> rdf:_3 -> <http://www.xulplanet.com/rdf/people/Jack>
We have made Karen a Person and given her a relationship to a resource <http://www.xulplanet.com/rdf/people/KarensKids> via the predicate 'children'. Instead of associating the three children directly with Karen, we associate them with this extra resource, which has a type of rdf:Seq. The result is that Karen is given three children while keeping her Person type.
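Reading Karen's children now takes two steps: follow the 'children' predicate to the placeholder resource, then read that resource's numbered items in order. The sketch below assumes the set-of-tuples representation; the namespace for the 'children' predicate and the `children_of` helper are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace
TYPE, SEQ = Resource(RDF + "type"), Resource(RDF + "Seq")
PERSON = Resource("http://xmlns.com/wordnet/1.6/Person")
CHILDREN = Resource(PEOPLE + "children")
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")


def res(name: str) -> Resource:
    return Resource(PEOPLE + name)


karen, kids = res("Karen"), res("KarensKids")
graph = {
    (karen, TYPE, PERSON),
    (karen, CHILDREN, kids),
    (kids, TYPE, SEQ),
    (kids, Resource(RDF + "_1"), res("Sandra")),
    (kids, Resource(RDF + "_2"), res("Kevin")),
    (kids, Resource(RDF + "_3"), res("Jack")),
}


def children_of(graph, parent):
    """Follow parent -> children -> container, then return the container's items in order."""
    result = []
    for s, p, container in graph:
        if s != parent or p != CHILDREN:
            continue
        numbered = []
        for s2, p2, o2 in graph:
            m = MEMBER.fullmatch(p2.uri)
            if s2 == container and m:
                numbered.append((int(m.group(1)), o2))
        result.extend(o for _, o in sorted(numbered, key=lambda pair: pair[0]))
    return result


print([c.uri.rsplit("/", 1)[1] for c in children_of(graph, karen)])  # ['Sandra', 'Kevin', 'Jack']
print({o for s, p, o in graph if s == karen and p == TYPE} == {PERSON})  # True
```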
In RDF, we don't have to explicitly specify a URI for <http://www.xulplanet.com/rdf/people/KarensKids>. Since the pattern above is so common, RDF allows blank nodes, also called anonymous resources, to be used instead. An RDF API will let you create these nodes, and it will usually generate an identifier for each one internally. Technically, blank nodes don't have URIs, but you can still manipulate them as resources, and add and remove triples associated with them.
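A blank node can be modeled as a separate kind of node that has no URI at all, only a local identifier generated by the program. In the sketch below, `BlankNode` and `new_blank_node` are hypothetical names, and the counter-based identifier stands in for whatever scheme a real RDF API uses internally.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class BlankNode:
    """A node without a URI; its id only distinguishes it within this program."""
    id: int


_ids = itertools.count(1)


def new_blank_node() -> BlankNode:
    return BlankNode(next(_ids))


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace
karen = Resource(PEOPLE + "Karen")

kids = new_blank_node()  # takes the place of <.../KarensKids>
graph = {
    (karen, Resource(PEOPLE + "children"), kids),
    (kids, Resource(RDF + "type"), Resource(RDF + "Seq")),
    (kids, Resource(RDF + "_1"), Resource(PEOPLE + "Sandra")),
}

# Blank nodes can be used like any other resource: add and remove triples.
graph.add((kids, Resource(RDF + "_2"), Resource(PEOPLE + "Kevin")))
graph.discard((kids, Resource(RDF + "_1"), Resource(PEOPLE + "Sandra")))
print(sum(1 for s, _, _ in graph if s == kids))  # 2
print(new_blank_node() != kids)  # True
```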
In all of the examples above, we have only ever been defining triples. Each line has a subject, a predicate and an object. Even when defining types and lists, we still use triples to define them. RDF is only a list of triples, or relationships between things.
One interesting thing about RDF is that we could take a list of triples from one source and combine them with triples from another source. Since the order doesn't matter, this would have the same effect as if we had specified them together to begin with. For instance, let's assume another source provided the following triple:
<http://www.xulplanet.com/rdf/people/KarensKids> -> rdf:_4 -> <http://www.xulplanet.com/rdf/people/Wendy>
When combined with the earlier example, this would mean that Karen would now have four children instead of three. This is an important aspect of RDF -- being able to combine or aggregate data from multiple sources together.
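With graphs represented as sets of triples, aggregating two sources is just a set union, and a triple supplied by both sources collapses into one. The sketch below assumes that representation, a hypothetical namespace and hypothetical `item` and `count_items` helpers. Real RDF stores have their own merge operations, and merging graphs that contain blank nodes needs extra care, which this sketch ignores.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PEOPLE = "http://www.xulplanet.com/rdf/people/"  # hypothetical namespace
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")
kids = Resource(PEOPLE + "KarensKids")


def item(n: int, name: str):
    """Build the triple KarensKids -> rdf:_n -> <name>."""
    return (kids, Resource(f"{RDF}_{n}"), Resource(PEOPLE + name))


def count_items(graph, container) -> int:
    return sum(1 for s, p, _ in graph if s == container and MEMBER.fullmatch(p.uri))


source_a = {item(1, "Sandra"), item(2, "Kevin"), item(3, "Jack")}
source_b = {item(4, "Wendy"), item(1, "Sandra")}  # also repeats a triple from source_a

merged = source_a | source_b
print(count_items(source_a, kids), count_items(merged, kids))  # 3 4
```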
In the context of a browser, the bookmarks may be stored as a series of RDF triples.
<urn:x-mark:1> -> Name -> XULPlanet.com
<urn:x-mark:1> -> URL -> <http://www.xulplanet.com>
<urn:x-mark:1> -> LastVisited -> Sept 8, 2020
<urn:x-mark:2> -> Name -> mozilla.org
<urn:x-mark:2> -> URL -> <http://www.mozilla.org>
This example has five triples describing two bookmarks. Each bookmark has a Name and a URL, and one of them also has a LastVisited predicate. The URL values are resources rather than literals, which means that they can also be used in other triples.
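Reading a bookmark back is a matter of collecting the triples for its subject. In the sketch below, the predicate namespace is hypothetical (the text writes the predicates without one), and a missing LastVisited comes back as `None`, which, as discussed earlier, should be read as "not known" rather than "never visited". The `value` helper is also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


BM = "http://www.xulplanet.com/rdf/example/"  # hypothetical predicate namespace
NAME, URL, LAST_VISITED = (Resource(BM + n) for n in ("Name", "URL", "LastVisited"))
mark1, mark2 = Resource("urn:x-mark:1"), Resource("urn:x-mark:2")

graph = {
    (mark1, NAME, "XULPlanet.com"),
    (mark1, URL, Resource("http://www.xulplanet.com")),
    (mark1, LAST_VISITED, "Sept 8, 2020"),
    (mark2, NAME, "mozilla.org"),
    (mark2, URL, Resource("http://www.mozilla.org")),
}


def value(graph, subject, predicate):
    """Return one target for subject/predicate, or None if no such triple is known."""
    return next((o for s, p, o in graph if s == subject and p == predicate), None)


for mark in (mark1, mark2):
    print(value(graph, mark, NAME), value(graph, mark, URL).uri, value(graph, mark, LAST_VISITED))
```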
If you have a variety of different types of bookmarks, you might create a type for each one. For example, your application might have a concept of a subscription stored as a bookmark, so you might use a type for it:
<urn:x-mark:1> -> rdf:type -> <http://www.xulplanet.com/rdf/example/Subscription>
You could use the convention that a bookmark whose type is rdf:Seq is a bookmark folder. The items of the Seq would be the bookmarks stored in the folder.
<urn:x-mark:folder> -> rdf:type -> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>
<urn:x-mark:folder> -> rdf:_1 -> <urn:x-mark:1>
<urn:x-mark:folder> -> rdf:_2 -> <urn:x-mark:2>
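Under the folder convention, listing a folder means checking that the node is an rdf:Seq and then reading its numbered items in order. The sketch below assumes the set-of-tuples representation; the predicate namespace for Name and the `is_folder` and `folder_contents` helpers are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE, SEQ = Resource(RDF + "type"), Resource(RDF + "Seq")
NAME = Resource("http://www.xulplanet.com/rdf/example/Name")  # hypothetical
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")
folder, mark1, mark2 = (Resource(f"urn:x-mark:{n}") for n in ("folder", "1", "2"))

graph = {
    (folder, TYPE, SEQ),
    (folder, Resource(RDF + "_1"), mark1),
    (folder, Resource(RDF + "_2"), mark2),
    (mark1, NAME, "XULPlanet.com"),
    (mark2, NAME, "mozilla.org"),
}


def is_folder(graph, node) -> bool:
    return (node, TYPE, SEQ) in graph


def folder_contents(graph, node):
    """Return the names of the bookmarks in a folder, in rdf:_N order."""
    if not is_folder(graph, node):
        raise ValueError("not a folder")
    numbered = []
    for s, p, o in graph:
        m = MEMBER.fullmatch(p.uri)
        if s == node and m:
            numbered.append((int(m.group(1)), o))
    names = []
    for _, mark in sorted(numbered, key=lambda pair: pair[0]):
        names.extend(o for s, p, o in graph if s == mark and p == NAME)
    return names


print(folder_contents(graph, folder))  # ['XULPlanet.com', 'mozilla.org']
```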
This is only an example of the kinds of things that can be stored with RDF. Almost any kind of data can be handled with RDF.
It is available at http://www.geocities.com/pan_andrew/ResourceDescriptionFramework.htm