Wednesday, February 17, 2010

Beyond the SPARQL Protocol

The SPARQL Protocol has done a lot to bring different RDF stores together and make interoperability possible. However, the SPARQL Protocol does not encompass all operations that are typical of an RDF store. Below are some ideas that would extend the protocol enough for it to become a general protocol for RDF store interoperability.

One common complaint is the lack of direct support for graphs. This is partly addressed in the upcoming SPARQL 1.1, which includes support for GET/PUT/POST/DELETE on named graphs. However, it still lacks the ability to manage these graphs: what is needed is a way to assign a graph name to a set of triples, as well as a vocabulary to search and describe the available graphs. To support graph construction, the service could accept a POST request of triples and respond with the name of the created graph. The graph metadata could live in a separate service or as part of the existing SPARQL service, made available via SPARQL queries.
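
As a sketch, graph construction could look something like the following; the /graphs collection, the Turtle payload, and the 201/Location behavior are all assumptions about what an extended protocol might specify, not part of SPARQL today:

    // Sketch: POST a set of triples, get back the assigned graph name.
    // The endpoint and its response behavior are hypothetical.
    async function createGraph(turtle: string): Promise<string> {
      const response = await fetch("http://example.org/store/graphs", {
        method: "POST",
        headers: { "Content-Type": "text/turtle" },
        body: turtle,
      });
      if (response.status !== 201) {
        throw new Error("graph creation failed: " + response.status);
      }
      // The created graph name comes back in the Location header.
      return response.headers.get("Location")!;
    }

The graph name returned by such a call could then be used in FROM and FROM NAMED clauses, or in the SPARQL 1.1 graph management operations.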

The use of POST in SPARQL ensures serializability of client operations. However, it prevents HTTP caching (even for reasonably sized queries), which is necessary for Web scalability. This can be rectified by introducing standard named query support. By giving the client the ability to create and manage server-side queries (with variable bindings), many common operations can become cacheable. These named queries could be described in their own service or as part of the existing SPARQL service. The named query metadata would include optional variable bindings and cache control settings. A query could then be evaluated with an HTTP GET to the URI of its name, using the configured cache control and thereby enabling Web scalability.
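
A sketch of how a named query might be created and then evaluated; the /queries path, the PUT registration, and the binding-by-query-parameter convention are assumptions for illustration:

    // Sketch: register a server-side query under a stable name,
    // leaving ?sku as a variable to be bound at evaluation time.
    await fetch("http://example.org/store/queries/product-by-sku", {
      method: "PUT",
      headers: { "Content-Type": "application/sparql-query" },
      body: "SELECT ?product WHERE { ?product <http://example.org/sku> ?sku }",
    });

    // Sketch: evaluate the named query with a plain GET, binding ?sku in
    // the query string. Because this is a GET governed by the configured
    // cache control, ordinary HTTP caches can serve repeated evaluations.
    const results = await fetch(
      "http://example.org/store/queries/product-by-sku?sku=%22AB-123%22",
      { headers: { Accept: "application/sparql-results+json" } },
    );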

Another requirement for broad RDF store deployments is the ability to isolate changes. Many changes depend directly on a particular state of the store and cannot be represented in an update statement. Although SPARQL 1.1 allows update statements to be conditional on a graph pattern, many changes have indirect relationships to the store state that cannot be expressed within a WHERE clause.

To accommodate this form of isolation, separate service endpoints are needed to track the observed store state and the triples inserted/deleted. Metadata about the various available endpoints could be discoverable within each service (or through a dedicated service). This metadata could include such information as the parent service (if applicable) and the isolation level used within the endpoint.
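
For illustration, discovery might be as simple as a SPARQL query against the service's own metadata; the ex: terms below are placeholders for whatever a standardized vocabulary would actually define:

    // Sketch: ask a service to describe its endpoints. The ex: terms
    // (parentService, isolationLevel) are hypothetical vocabulary.
    const query = `
      PREFIX ex: <http://example.org/ns#>
      SELECT ?endpoint ?parent ?isolation WHERE {
        ?endpoint ex:parentService ?parent ;
                  ex:isolationLevel ?isolation .
      }`;
    const metadata = await fetch(
      "http://example.org/store/sparql?query=" + encodeURIComponent(query),
      { headers: { Accept: "application/sparql-results+json" } },
    );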

To support serializable isolation, each endpoint would need to watch for the Content-Location header, which would indicate the source of the update statement in POST requests. When such an update occurs, the service must validate that the observed store state in the source endpoint is the same as the store state in the target endpoint before proceeding.
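
A sketch of what such a request could look like; the endpoint URIs are placeholders, and the use of Content-Location to name the source endpoint is the convention proposed above:

    // Sketch: send an update to one endpoint while declaring, via
    // Content-Location, which endpoint's observed state it was derived
    // from. The service would reject the update if the two states differ.
    await fetch("http://example.org/store/endpoints/working", {
      method: "POST",
      headers: {
        "Content-Type": "application/sparql-update",
        "Content-Location": "http://example.org/store/endpoints/observed",
      },
      body: 'DELETE DATA { <http://example.org/doc> <http://example.org/status> "draft" }',
    });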

Standardizing graph, query, and isolation vocabularies within the SPARQL Protocol would make RDF stores much more appealing to a broader market.



Tuesday, February 9, 2010

RDFa Change Sets

With so many sophisticated applications on the Web, the key/value HTML form seems overly simplistic. The browser is increasingly being used to manipulate complex resources, and an increasingly popular technique for encoding sophisticated data in HTML is RDFa.

RDFa defines a method of encoding data within the DOM of an HTML page using attributes. This allows complex data resources to be connected to the visual aspects that are used to represent them. RDFa provides a standard way to convert an HTML DOM structure into RDF data for further processing.

Instead of encoding your data in a key/value form, encode it in RDFa and use DHTML and AJAX to manipulate the DOM structure and, in turn, the data itself. The conversion from HTML to data can be done on the server or on the client using existing libraries.
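
Because the data lives in the DOM, editing the DOM edits the data. A small sketch, assuming a page that already carries dcterms:* RDFa attributes:

    // Sketch: changing this element's text changes the value of the
    // dcterms:title triple that an RDFa parser extracts from the page.
    const title = document.querySelector('[property="dcterms:title"]');
    if (title) {
      title.textContent = "An Updated Title";
    }

    // Adding an element with RDFa attributes adds a new triple.
    const creator = document.createElement("span");
    creator.setAttribute("property", "dcterms:creator");
    creator.textContent = "Alice";
    document.querySelector('[about="/doc/1"]')?.appendChild(creator);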

There are a few ways that RDFa can help with communication to the server. The simplest would be to send the entire HTML DOM back for RDFa parsing on the server. However, an HTML page can carry a great deal of markup that is irrelevant to the data, so this is not appropriate as a general solution. Instead, by running an RDFa parser on the client, only the resulting RDF data need be sent to the server. This reduces network traffic and moves some of the processing to the client.
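
A sketch of the client-side approach; parseRdfa is a hypothetical helper standing in for whatever RDFa parser is in use, and the target URI is a placeholder:

    // Sketch: parse the page's RDFa on the client and send only the
    // resulting triples to the server. parseRdfa is a stand-in for a
    // real client-side RDFa parser, declared here as an assumption.
    declare function parseRdfa(root: Element): string; // e.g. returns Turtle

    const triples = parseRdfa(document.documentElement);
    await fetch("/resource/1", {
      method: "PUT",
      headers: { "Content-Type": "text/turtle" },
      body: triples, // only the data travels, not the whole page
    });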

In a recent project, we went further and used rdfquery to parse before and after snapshots on the client to prepare a change-set for submission back to the server. In JavaScript, the client prepared one RDF graph of removed relationships and properties and another of added relationships and properties. These two graphs represent a change-set. With change-sets used throughout the stack, enforcing authorization rules and tracking provenance became much more straightforward. Change-sets also gave more control over the transaction isolation level by making it possible to merge non-conflicting change-sets. Creating change-sets at the source (on the client) eliminated the need to load and compare all properties on the server, making the process more efficient and less fragile.
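
The diff itself is straightforward once each snapshot is reduced to a set of canonically serialized triples. A minimal sketch, with hypothetical snapshot values:

    // Sketch: a change-set is two graphs, computed from before/after
    // snapshots of the page's RDFa. Triples are compared as canonical strings.
    function diff(before: Set<string>, after: Set<string>) {
      return {
        // present before but not after: to be deleted on the server
        removed: [...before].filter((t) => !after.has(t)),
        // present after but not before: to be inserted on the server
        added: [...after].filter((t) => !before.has(t)),
      };
    }

    // Snapshots would come from parsing the page's RDFa before and after editing.
    const before = new Set(['<http://example.org/doc> <http://purl.org/dc/terms/title> "Old Title" .']);
    const after = new Set(['<http://example.org/doc> <http://purl.org/dc/terms/title> "New Title" .']);
    const changeSet = diff(before, after);
    // changeSet.removed holds the "Old Title" triple;
    // changeSet.added holds the "New Title" triple.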

Parsing RDFa on the client and submitting change-sets can streamline data processing and manipulation, and avoid much of the boilerplate code associated with mapping data from one format to another.
