Monday, March 8, 2010

Reinventing RDF Lists

Last month the SW interest group discussed alternatives to containers and collections as part of a discussion around what the next generation of RDF might look like. Below is my opinion on the matter.

RDF's simplistic approach makes it possible to encode most data structures, both simple and complex. The challenge people have with RDF, coming from other Web formats, is the lack of basic ordered collections (a concept common in XML). In RDF you are forced into a linked list structure just to preserve resource order. The linked list structure known as rdf:List is difficult to work with and highly ineffective within modern RDF stores.

Most RDF formats provide syntactic sugar to make it easier to write rdf:Lists. In turtle this is done using round brackets (parentheses); in RDF/XML this is done using the parseType collection attribute. However, because rdf:List is not a fundamental concept in RDF, no RDF store implementation preserves them, instead opting to use the fundamental triple form -- a linked list.

RDF is made of the following fundamental concepts: URI, Literal, and Blank Node. A fundamental list concept should be added to make it easier and more efficient to work with ordered collections. This would not have a significant effect on RDF formats, as their syntax would not change, but would have a significant impact on the mindset of RDF implementers.

With this change RDF implementers would strive to ensure that lists are implemented efficiently and provide convenient operations on them, just as they would other fundamental RDF concepts. The triple (linked list) form should be kept for compatibility with RDF systems that don't preserve lists, but the goal would be that RDF systems would not be obligated to provide a triple linked list form that has proven to be ineffective.

By making lists a fundamental RDF concept, there is no required change for RDF libraries to continue to be compatible with existing standards. Most libraries and systems may already understand list short hand and some may also preserve it.

Reblog this post [with Zemanta]

Monday, March 1, 2010

Improving RDF/XML Interoperability

All the permitted variations in RDF/XML make working with it using XML tools difficult, at best. Most of the time assumptions are made about the structure of the RDF/XML document. These are based on particular RDF/XML implementations. However, there is no standard spec that says what this simplified structure should be. The next generation of RDF specs should correct this and create a subset of RDF/XML for working with XML tools.

A good place to start is with the document Leo Sauermann has started. I like the design, but feel the rules could be improved, based on my experience.

The design of SimpleRdfXml, as proposed by Leo, is:
1. be compatible with RDF/XML
2. but only a subset
3. restrict to simplicity

The rules I try to follow when serializing RDF/XML for use with XSLT are:

1. No nested elements (references to resources must be done via rdf:resource or rdf:nodeID).
2. No property attributes.
3. All blank nodes identified by rdf:nodeID.
4. Only full URIs, no relative URIs.
5. No type node elements, always as rdf:type element.
6. The same rdf:about maybe repeated on multiple nodes.
7. Never use rdf:ID.
8. Always use rdf:li when possible.
9. Always use rdf:parseType="Collection" when possible.
10. All rdf:XMLLiterals written as rdf:parseType="Literal".
11. Never use rdf:parseType="Resource".
12. White-space is preserved within literal tags.

By standardizing on these (or another RDF/XML subset), interoperability between XML and RDF tools becomes possible. This allows existing shops to reuse their current XML skills to work with RDF, easing their transition.

Reblog this post [with Zemanta]