Neil's Place

May 23, 2003

1:01 PM Data Structures and RDF

Time to chime in on the RDF debate.

There are four general ways of storing information:

  1. A list, in which one has a number of items, which may or not be related to one another.
  2. A table, in which one has a number of items (records), each with a distinct set of properties or columns.
  3. A tree, in which one has a hierarchy of items.
  4. A graph, in which one has a number of items (nodes), with the nodes connected to each other in some way.

There are others, but they are more or less just variations of the same.

There are examples all over of each type. Arrays are examples of lists. Of course, they are used all over the place. Relational databases typically store all of their data in tables. So do spreadsheets. Trees are used for mail or news messages and your bookmarks. XML is a syntax for specifying trees of information. The Windows and Classic Macintosh file systems are presented and/or stored as a tree. The Unix file system however isn't a tree. It's a graph. RDF is a graph. The Web is also a graph -- it's a bunch of pages connected via links.

Each of the four storage methods, lists, tables, trees, and graphs, increase in complexity as you go up. Lists are simple to store. Graphs are the most difficult. Actually, that doesn't need to be the case. But, very few programming languages come with any kind of Graph structure ready to use. Due to the complexity, you should probably store data in the lowest type possible, depending on the kind of data you have.

You can always use one of the structures higher than what is necessary. A list could be stored in a table with only one column, a table can be stored in a tree, where a root node has a set of records, each with a set of properties, and a tree is really a specialized form of graph.

However, the reverse is not true. You can't store a graph in a tree, you can't store a tree in a table, and you can't store a table in a list. Any place where you see someone trying to is a hack.

Many people don't know this though. So they just store everything in a tabular database or in XML, regardless of what it is. This has two problems. First, you get data that can be stored in a simpler format, stored in some more complex format. So you get people passing lists of things around using XML. Or, configuration files stored in XML.

Second, you get people trying to coerce more complex data into a simpler format, so you might see people trying to shove trees of data into a database. Or you get serialized RDF written as XML.

Many people think that XML is the ultimate format for storing data. It isn't. It can represent trees nicely, and it can do tables and lists if you really wanted it to, but it can't represent graphs, not cleanly anyway. Perhaps what is needed is an eXtensible Graph Language, which represents graphs of data. There is RDF-XML, and XGMML but both use a language for describing trees. Actually, it shouldn't be called the eXtensible Graph Language, because then people will get confused thinking it's like XML. Because a tree can be represented as a graph, all data could be represented in the Graph Language (not that it should be, of course), unlike XML which can't. Of course, this assumes there isn't some higher level structure above the graph.

Long, long ago, people stored data in lists, because that was all that was available. Then, someone came up with the idea of storing data in tables. So relational databases came along and people moved up the ladder to tables. A few years ago, XML came along so data moved up again to trees. Can you guess what will happen next? The Semantic Web folks want us to move to using graphs. Should we move to graphs? Seems to be the next logical step in information evolution. What's holding us back? Well, it's probably too soon. The world is still in the tree phase. One day, graphs will start to become more popular -- it will just take time. In 30 years, someone might come up with something beyond graphs, and we'll all slowly switch to it as well.

There's also the RSS in RDF debate. Many people don't see the value in storing RSS data in RDF. This is because the information stored in a single RSS file isn't a graph -- it's a tree, so plain-old XML actually makes more sense.

Of course, the Semantic Web folks don't agree. Why? Because they aren't thinking in terms of a single RSS file - they are thinking of building giant collections of RSS data, all linked together so that it forms one giant - hey, it's not a tree - it's a graph. Then, you can search and navigate it like you can with the existing Web.

But of course, the Semantic Web lets the servers and the software you're using, know more about what you're talking about. This is unlike current popular search engines like Google which are pretty much just guessing. You can make it better, sure, but the best way to acheive accuracy is if someone tells it the answer to begin with.

Anyway, I'm starting to wander a bit -- that means it's a good time to stop.

Comments ( 2 )

May 21, 2003

10:58 PM Some RPath Documentation

OK, I have put together a short list of what I've implemented so far in RPath, which I'm now going to call ReoPath, to give it a less generic sounding name.

Currently, it can be called from within Mozilla using an XPCOM component, or can sort of be used as a standalone utility. I'll provide the code and/or binary soon after I clean up some crashes and all those memory leaks.

Feedback is welcome.

Next, I'll implement some additional functions. I'll also post some info about ReoPath Templates, which is a way to embed ReoPath expressions in a XUL file (or any XML file for that matter), and have content generated, kind of like XUL templates, but more powerful (and at the moment, with more hacks).

Comments ( 35 )

May 14, 2003

4:39 PM Cafe Dream

Last night I had an interesting dream.

I was frantically searching all over town for some of those sticky circle things you're supposed to put around the holes in a sheet of paper. I'm not sure why. I've never used them -- in fact I don't know anyone who has.

Anyway, when I got to an office supplies store, it had turned into a toy store, so I couldn't get any. But the interesting part was that next door, a place had opened called the MSN Cafe. Inside, people were lounging about drinking coffee and talking. On the far wall was a line of PCs one could use. On one side was the counter where people could order various gourmet coffees. Above that, a giant MSN butterfly logo.

Could this be a sign of the future? Is Microsoft planning to open up a chain of coffee shops one day? Will Sun counter with stores with names containing a Java/coffee pun? Who knows?

Comments ( 33 )

May 13, 2003

11:04 PM Now we're getting somewhere

This RPath stuff is really starting to do interesting things:

chain('Meg Ryan'^name,'Christina Ricci'^name,^playedBy*^cast|cast*/playedBy)/name

Meg Ryan was in Anastasia with John Cusack who was in Being John Malkovich with Cameron Diaz who was in Fear and Loathing in Las Vegas with Christina Ricci

OK, there's a tiny bit of JavaScript in there too, to add the 'was ins' and the 'withs' and the bold text.

Comments ( 46 )

May 10, 2003

5:37 AM appendChild

While reworking the semantic web stuff I'm doing to be more interesting, I've noticed that while things are starting to work, is it very slow. Clicking on an actor's name takes almost two seconds before the actor's biography and list of movies they have been in appears.

Half of this slowdown is caused by the DOM appendChild function. So I traced through the code to see why it might be taking a while.

Here is what happens when appendChild is called for a XUL element:

  1. Check the node origin to make sure it can be appended.
  2. Ensure that the node is not an ancestor.
  3. If the node is already is the tree, remove the child first.
  4. Check to ensure that the existing children have been generated already, and build them if not. (Template content for instance)
  5. Append the content to the children array.
  6. Set the parent node.
  7. Inform the XBL system that the document for the elementis changing, if necessary.
  8. Set the document of the node.
  9. Check all attributes and if there are event attributes, hook up each event handler, compiling the script if needed.
  10. Repeat the last three steps recursively for all child nodes of the node being appended.
  11. Check if any DOM mutation listeners have been added to the window and would need to be notified of the appended node.
  12. If there are mutation listeners, fire a mutation event.
  13. Add the element's id to a hash, for later use of getElementById.
  14. If the element has a commandupdater attribute, hook it up.
  15. Check if the element is an observes element, and look up the corresponding broadcaster and hook the observer up to it.
  16. If the element isn't an observer, check for an observes attribute, and hook it up to the broadcaster.
  17. Check if the element is a keyset and hook up a key handler as needed.
  18. Check if the element has a datasources attribute, and if so load the datasource and handle all the template related stuff.
  19. Repeat steps 13-18 recursively for all child nodes.
  20. Check if the parent has an XBL binding.
  21. If so (most XUL elements will), get the insertion points, which may nested deeper in the tree.
  22. Look up the anonymous content for the element.
  23. Loop over the insertion points and find the appropriate one to use.
  24. Add the content at that point.
  25. Notify any document observers that the node had been appended.

Anyway, that might explain some of the delay, just in case you thought it was as simple as adding something to an array.

Comments ( 30 )

May 9, 2003

7:25 AM Interesting XUL things.

You might notice some unusual things about these XUL examples:

  1. <button ref="NC:BookmarksRoot" label="OK"
  2. <button id="NC:HistoryRoot" label="OK" oncommand="alert(this.resource);"/>
  3. <tree id="urn:animals:data" datasources="animals.rdf">

Comments ( 9 )

May 8, 2003

4:01 AM From Hyatt

I plan to implement XBL and the XUL layout primitives in Safari eventually - Dave Hyatt

Comments ( 6 )

May 1, 2003

7:15 PM RPath Lives!


The question is, can you figure out what the expression above does? If so, then RPath passes the Syntax Is Comprehensible Test.

Comments ( 27 )