Friday, July 31, 2009

SPARQL Federation and Quints

There are currently a couple popular way to federate sparql endpoints together:

1) In Jena the service must be explicitly part of the query, and therefor the model,

2) In Sesame the basic query patterns must be associated with one or more endpoints before evaluating the query, or

3) Hack the remote query into a graph URI: http://gearon.blogspot.com/2009/05/federated-queries-long-time-ago-tks.html

Although both can be used to achieve the same results, Jena's solution puts more responsibility in the data model, and Sesame's put more responsibility in the deployment. Both have their trade offs, but I believe the query is suppose to be abstracted away from underlying services. The domain model (and therefore the queries) should not be aware of how the data is distributed (or stored) across a network. Therefore, I prefer to describe which graph patterns and relationships are available at each endpoint during deployment and make the application model independent of available service endpoints.

Furthermore, I think it is a bit silly to add yet another level of complexity to the basic query pattern. Adding the service level turns the basic query pattern from a quad to a quint.

To fully index a quint (with support for a service variable, which Jena does not support) would take 13 indexes (nearly double what a quad requires). Below is a table of some complexity levels and how many indexes they require to be fully indexed (variables could appear in any position within the pattern). I have included a theoretical sext that would allow you to group services in a network (just as graphs can be grouped in a service).
Level#ofIdxTermData Structure
double2subjectdirected graph
triple3predicatelabelled directed graph
quad7graphmultiple labelled directed graphs
quint13servicereplicated multiple labelled directed graphs
sext25networktrusted replicated multiple labelled directed graphs
Switching from triples to quad provides a big functionality leap (the ability to refer to an entire graph as a single resource). However, I question how much functionality a quint (or a sext) has over a quad. Couldn't the same functionality be put into a property of the graph (or embedded in the graph's URI authority). An inferencing engine/query could also conclude graph relationships like (subGraphOf), which would still allow a large, but precise, collection of graphs to be queried more effectively.

Hopefully, this topic will have more time to mature before the SPARQL working group makes any official decisions on the matter.

No comments:

Post a Comment