Wednesday, February 17, 2010

Beyond the SPARQL Protocol

The SPARQL Protocol has done a lot to bring different RDF stores together and make interoperability possible. However, it does not encompass all operations that are typical of an RDF store. Below are some ideas that would extend the protocol enough for it to become a general protocol for RDF store interoperability.

One common complaint is the lack of direct support for graphs. This is partly addressed in the upcoming SPARQL 1.1, which includes support for GET/PUT/POST/DELETE on named graphs. However, it is still missing the ability to manage these graphs. What is still needed is a way to assign a graph name to a set of triples, as well as a vocabulary to search and describe the available graphs. To support construction, the service could accept a POST request of triples and respond with the name of the newly created graph. The graph metadata could be available in a separate service or as part of the existing SPARQL service, made available via SPARQL queries.
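As a rough illustration of the graph construction idea, here is a minimal in-memory sketch in Python. Nothing in it comes from SPARQL 1.1 or any real store: the `GraphStore` class, its `post_triples` and `list_graphs` methods, and the `http://example.org/graphs/` base URI are all hypothetical names chosen for the example.

```python
class GraphStore:
    """Hypothetical in-memory stand-in for a store that mints graph names."""

    def __init__(self, base="http://example.org/graphs/"):
        self.base = base          # assumed base URI for minted graph names
        self.graphs = {}          # graph name -> set of (s, p, o) triples
        self._next_id = 1

    def post_triples(self, triples):
        # Models a POST of triples: the store picks the graph name.
        # Over HTTP the name would plausibly come back in a Location header.
        name = f"{self.base}{self._next_id}"
        self._next_id += 1
        self.graphs[name] = set(triples)
        return name

    def list_graphs(self):
        # Models the graph metadata that a client could search or describe.
        return [
            {"graph": name, "triples": len(triples)}
            for name, triples in sorted(self.graphs.items())
        ]
```

A client would call `post_triples(...)` with its triples, keep the returned name, and later discover it again through `list_graphs()`. In a real service that listing would more likely be RDF exposed through SPARQL queries, as suggested above.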

The use of POST in SPARQL ensures serializability of client operations. However, it prevents HTTP caching (with reasonably sized queries), which is necessary for Web scalability. This can be rectified by introducing standard named query support. By providing the client with the ability to create and manage server-side queries (with variable bindings), many common operations can become cacheable. These named queries can be described in their own service or as part of the existing SPARQL service. The named query metadata would include optional variable bindings and cache control settings. The queries could then be evaluated on HTTP GET to the URI of the query name, using the configured cache control, enabling Web scalability.
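The named query idea can be sketched as a small registry that maps a query name to a stored query, its default bindings, and its cache control setting. This is only an illustration under assumptions: `NamedQuery`, `NamedQueryRegistry`, and `resolve` are hypothetical names, and the substitution of `?var` placeholders is a deliberately naive stand-in for real variable binding.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NamedQuery:
    template: str                                 # SPARQL text with ?variables
    cache_control: str = "public, max-age=300"    # assumed default policy
    defaults: dict = field(default_factory=dict)  # optional variable bindings


class NamedQueryRegistry:
    """Hypothetical server-side table of named queries."""

    def __init__(self):
        self._queries = {}

    def register(self, name, query):
        self._queries[name] = query

    def resolve(self, name, params=None):
        # Models a GET on the query's URI: merge the stored bindings with the
        # request parameters, then return the query text and response headers.
        query = self._queries[name]
        bindings = {**query.defaults, **(params or {})}

        def substitute(match):
            # Bound variables are replaced verbatim, so values must already
            # be valid SPARQL terms; unbound variables are left untouched.
            return bindings.get(match.group(1), match.group(0))

        text = re.sub(r"\?(\w+)", substitute, query.template)
        return text, {"Cache-Control": query.cache_control}
```

Because the request is a plain GET on a stable URI, any intermediary HTTP cache can honour the returned `Cache-Control` header, which is the point of naming the query in the first place.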

Another requirement for broad RDF store deployments is the ability to isolate changes. Many changes are directly dependent on a particular state of the store and cannot be represented in an update statement. Although SPARQL 1.1 allows update statements to be dependent on a graph pattern, many changes have indirect relationships to the store state and cannot be related directly within a WHERE clause.

To accommodate this form of isolation, separate service endpoints are needed to track the observed store state and the triples inserted/deleted. Metadata about the various available endpoints could be discoverable within each service (or through a dedicated service). This metadata could include such information as the parent service (if applicable) and the isolation level used within the endpoint.

To support serializable isolation, each endpoint would need to watch for Content-Location headers, which would indicate the source of the update statement in the POST requests. When such an update occurs, the service must validate that the observed store state in the source endpoint is the same as the store state in the target endpoint before proceeding.
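The validation step described here resembles optimistic concurrency control, and can be sketched as follows. This is a simplified illustration, not a proposed wire format: the `Endpoint` and `IsolatedEndpoint` classes, the `apply_update` function, and the use of an integer version counter to stand in for "store state" are all assumptions made for the example.

```python
class ConflictError(Exception):
    """Raised when the source endpoint observed a stale store state."""


class Endpoint:
    """Hypothetical target endpoint holding the committed triples."""

    def __init__(self, url, triples=()):
        self.url = url
        self.triples = set(triples)
        self.version = 0  # assumption: a counter stands in for store state


class IsolatedEndpoint:
    """Hypothetical endpoint that tracks observed state and pending changes."""

    def __init__(self, url, parent):
        self.url = url
        self.parent = parent
        self.observed_version = parent.version  # state seen when isolated
        self.inserted = set()
        self.deleted = set()


def apply_update(target, headers, endpoints):
    # The Content-Location header names the endpoint the update came from.
    source = endpoints[headers["Content-Location"]]
    # Proceed only if the state the source observed is still the target's.
    if source.observed_version != target.version:
        raise ConflictError(f"{source.url} observed a stale state of {target.url}")
    target.triples -= source.deleted
    target.triples |= source.inserted
    target.version += 1
```

With two isolated endpoints created from the same target, the first to post its changes succeeds and bumps the version; the second then fails the check and would have to re-read the store and retry.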

By standardizing graph, query, and isolation vocabularies within the SPARQL Protocol, RDF stores would appeal to a much broader market.

