Time to chime in on the
There are four general ways of storing information:
- A list, in which one has a number of items, which may or may not
be related to one another.
- A table, in which one has a number of items (records), each with
a distinct set of properties or columns.
- A tree, in which one has a hierarchy of items.
- A graph, in which one has a number of items (nodes), with the nodes
connected to each other in some way.
There are others, but they are more or less just variations of these.
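To make the four categories concrete, here is a minimal sketch of each using nothing but Python built-ins. The data and the helper `count_tree_nodes` are made up for illustration; the point is only the shape each structure takes.

```python
# A list: items that may or may not be related.
groceries = ["milk", "eggs", "bread"]

# A table: records (rows), each with the same set of columns.
columns = ("name", "price")
rows = [
    ("milk", 1.99),
    ("eggs", 2.49),
]

# A tree: a hierarchy where each node has exactly one parent (except the root).
bookmarks = {
    "name": "Bookmarks",
    "children": [
        {"name": "News", "children": [{"name": "Slashdot", "children": []}]},
        {"name": "Reference", "children": []},
    ],
}

# A graph: nodes with arbitrary connections between them; cycles are allowed.
links = {
    "index.html": {"about.html", "news.html"},
    "about.html": {"index.html"},
    "news.html": {"about.html"},
}


def count_tree_nodes(node):
    """Count a node plus all of its descendants."""
    return 1 + sum(count_tree_nodes(child) for child in node["children"])


print(len(groceries), len(rows), count_tree_nodes(bookmarks), len(links))
# 3 2 4 3
```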
There are examples of each type all over. Arrays are examples of lists, and
of course, they are used everywhere. Relational databases typically
store all of their data in tables. So do spreadsheets. Trees are used for
mail or news message threads and for your bookmarks. XML is a syntax for
specifying trees of information. The Windows and Classic Macintosh file
systems are presented and/or stored as a tree. The Unix file system, however,
isn't a tree. It's a graph. RDF is a graph. The Web is also a graph -- it's a
bunch of pages connected via links.
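Hard links are one reason the Unix file system is a graph rather than a tree: a single file can have directory entries in two different directories, so it effectively has two parents. A small sketch, assuming a Unix-like system where `os.link` is supported (the directory and file names are hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Two sibling directories, a/ and b/.
    a = os.path.join(root, "a")
    b = os.path.join(root, "b")
    os.mkdir(a)
    os.mkdir(b)

    original = os.path.join(a, "notes.txt")
    with open(original, "w") as f:
        f.write("hello\n")

    # A second directory entry for the same underlying file, in b/.
    alias = os.path.join(b, "notes.txt")
    os.link(original, alias)

    same_file = os.path.samefile(original, alias)  # same inode?
    link_count = os.stat(original).st_nlink        # number of names (parents)
    print(same_file, link_count)  # True 2
```

In a strict tree, `link_count` could never exceed 1 for a regular file.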
The four storage methods (lists, tables, trees, and graphs) increase in
complexity in that order. Lists are simple to store; graphs are the most
difficult. Actually, that needn't be the case, but very few programming
languages come with any kind of graph structure ready to use.
Because of this complexity, you should probably store data in the simplest
structure that fits the kind of data you have.
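Python is one such language: it has no built-in graph type, but a dictionary mapping each node to its neighbours gets you most of the way. Here is a minimal sketch with hypothetical page names. The main thing a graph needs that a tree doesn't is a "seen" set, because a traversal can otherwise loop forever around a cycle.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:  # skip visited nodes so cycles terminate
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# A tiny web of pages with a cycle: index -> news -> about -> index.
web = {
    "index.html": ["news.html"],
    "news.html": ["about.html"],
    "about.html": ["index.html"],
    "orphan.html": ["index.html"],
}

print(sorted(reachable(web, "index.html")))
# ['about.html', 'index.html', 'news.html']
```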
You can always use a structure higher up than necessary.
A list can be stored in a table with only one column; a table can be
stored in a tree, where a root node has a set of records, each with a
set of properties; and a tree is really a specialized form of graph.
However, the reverse is not true. You can't store a graph in a tree, you
can't store a tree in a table, and you can't store a table in a list.
Anywhere you see someone trying to, it's a hack.
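Each of the upward embeddings described above takes only a few lines. This is a sketch with hypothetical helper names, assuming tree node names are unique so they can double as graph node identifiers. There is no equally simple general function going the other way: a graph node with two parents has no single place to live in a tree.

```python
def list_to_table(items):
    """A list is a table with a single column."""
    return [{"value": item} for item in items]


def table_to_tree(records):
    """A table is a tree: a root node whose children are the records."""
    return {
        "name": "root",
        "children": [
            {"name": f"record{i}", "fields": dict(rec), "children": []}
            for i, rec in enumerate(records)
        ],
    }


def tree_to_graph(tree):
    """A tree is a graph where every node has at most one incoming edge."""
    edges = {}

    def walk(node):
        edges[node["name"]] = [child["name"] for child in node["children"]]
        for child in node["children"]:
            walk(child)

    walk(tree)
    return edges


table = list_to_table(["milk", "eggs"])
tree = table_to_tree(table)
graph = tree_to_graph(tree)
print(graph)
# {'root': ['record0', 'record1'], 'record0': [], 'record1': []}
```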
Many people don't realize this, though, so they just store everything in a
tabular database or in XML, regardless of what it is. This causes two problems.
First, data that could be stored in a simpler format ends up stored in a more
complex one. So you get people passing lists of things around using XML, or
configuration files stored in XML.
Second, you get people trying to coerce more complex data into a simpler
format, so you might see people trying to shove trees of data into a relational
database. Or you get serialized RDF written as XML.
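A common way to shove a tree into a relational table is an adjacency list, where each row stores its parent's id. Here is a sketch using Python's built-in `sqlite3` module with a hypothetical `folder` table. Notice that even asking for the path to one node needs a recursive query (`WITH RECURSIVE`, available in SQLite 3.8.3 and later), because a flat table has no notion of nesting on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folder (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO folder (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "Bookmarks"),  # root: no parent
        (2, 1, "News"),
        (3, 2, "Tech"),
        (4, 1, "Reference"),
    ],
)

# Walk from a node up to the root, tracking depth so the path can be ordered.
rows = conn.execute(
    """
    WITH RECURSIVE ancestors(id, parent_id, name, depth) AS (
        SELECT id, parent_id, name, 0 FROM folder WHERE id = ?
        UNION ALL
        SELECT f.id, f.parent_id, f.name, a.depth + 1
        FROM folder f JOIN ancestors a ON f.id = a.parent_id
    )
    SELECT name FROM ancestors ORDER BY depth DESC
    """,
    (3,),
).fetchall()

path = "/".join(name for (name,) in rows)
print(path)  # Bookmarks/News/Tech
```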
Many people think that XML is the ultimate format for storing data. It isn't.
It can represent trees nicely, and it can do tables and lists if you really
want it to, but it can't represent graphs, not cleanly anyway. Perhaps
what is needed is an eXtensible Graph Language, which represents graphs
of data. There are RDF/XML and XGMML, but both use a language for describing
trees to describe graphs. Actually, it shouldn't be called the
eXtensible Graph Language, because then people will get confused thinking
it's like XML. Because a tree can be represented as a graph, all data
could be represented in the Graph Language (not that it should be, of course),
unlike XML, which can't. Of course, this assumes there isn't some
higher-level structure above the graph.
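To see why graphs in XML aren't clean, here is a three-node cycle written as XML. XML elements can only nest, so the edges have to be encoded as attributes holding identifiers, which the parser treats as plain strings. The `<graph>`, `<node>`, `<edge>` and `to` names are made up for this sketch; they are not RDF/XML or XGMML syntax.

```python
import xml.etree.ElementTree as ET

# A cycle: a -> b -> c -> a. Nesting can't express it, so edges are id references.
DOC = """
<graph>
  <node id="a"><edge to="b"/></node>
  <node id="b"><edge to="c"/></node>
  <node id="c"><edge to="a"/></node>
</graph>
"""

root = ET.fromstring(DOC)

# The application, not the XML parser, has to turn references back into a graph.
graph = {
    node.get("id"): [edge.get("to") for edge in node.findall("edge")]
    for node in root.findall("node")
}
print(graph)  # {'a': ['b'], 'b': ['c'], 'c': ['a']}

# ElementTree does not check that each "to" names an existing node,
# so that check is also up to the application.
dangling = [t for targets in graph.values() for t in targets if t not in graph]
print(dangling)  # []
```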
Long, long ago, people stored data in lists, because that was all that
was available. Then, someone came up with the idea of storing data in
tables. So relational databases came along and people moved up the ladder
to tables. A few years ago, XML came along, and data moved up again, to trees.
Can you guess what will happen next? The Semantic Web folks want us to
move to using graphs. Should we move to graphs? It seems to be the next
logical step in information evolution. What's holding us back? Well,
it's probably too soon. The world is still in the tree phase. One day,
graphs will start to become more popular -- it will just take time.
In 30 years, someone might come up with something beyond graphs, and
we'll all slowly switch to it as well.
There's also the RSS in RDF debate. Many people don't see the value in
storing RSS data in RDF. This is because the information stored in a
single RSS file isn't a graph -- it's a tree, so plain-old XML actually
makes more sense.
Of course, the Semantic Web folks don't agree. Why? Because they aren't
thinking in terms of a single RSS file -- they are thinking of building
giant collections of RSS data, all linked together so that they form
one giant... hey, it's not a tree -- it's a graph. Then you can search
and navigate it like you can with the existing Web.
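Here is a rough sketch of that idea: two RSS 2.0 feeds, each a tree on its own, flattened into subject/predicate/object triples (the shape RDF statements take) and merged. Using each `<link>` URL as the node identifier, the predicate names, and the example URLs are all assumptions made for this illustration, not RSS 1.0 or any particular RDF vocabulary. Because the second feed republishes an item from the first, that item ends up with two parents, so the merged data is no longer a tree.

```python
import xml.etree.ElementTree as ET

# Two tiny RSS 2.0 feeds with hypothetical URLs. Feed B republishes
# Feed A's post, so both list an item with the same <link>.
FEED_A = """<rss version="2.0"><channel>
  <title>Feed A</title><link>http://a.example/</link>
  <item><title>Post 1</title><link>http://a.example/1</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
  <title>Feed B</title><link>http://b.example/</link>
  <item><title>Post 1</title><link>http://a.example/1</link></item>
</channel></rss>"""


def feed_to_triples(xml_text):
    """Flatten one feed (a tree) into (subject, predicate, object) triples,
    using each channel's and item's <link> URL as its node id."""
    channel = ET.fromstring(xml_text).find("channel")
    chan_id = channel.findtext("link")
    triples = {(chan_id, "title", channel.findtext("title"))}
    for item in channel.findall("item"):
        item_id = item.findtext("link")
        triples.add((chan_id, "hasItem", item_id))
        triples.add((item_id, "title", item.findtext("title")))
    return triples


# Merging is just set union; shared URLs become shared nodes.
merged = feed_to_triples(FEED_A) | feed_to_triples(FEED_B)

# Collect each item's parents. In a tree, every node has at most one.
parents = {}
for subject, predicate, obj in merged:
    if predicate == "hasItem":
        parents.setdefault(obj, set()).add(subject)

print(sorted(parents["http://a.example/1"]))
# ['http://a.example/', 'http://b.example/']
```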
But of course, the Semantic Web lets the servers and the software you're
using know more about what you're talking about. Current popular search
engines like Google, by contrast, are pretty much just guessing.
You can make them better, sure, but the best way to achieve accuracy is for
someone to tell the software the answer to begin with.
Anyway, I'm starting to wander a bit -- that means it's a good time to stop.