Monday, June 1, 2009

Standard RDF Protocol

The SPARQL Working Group's first meeting has come and gone. Part of that meeting covered the scope of SPARQL2, including which features should be added to the protocol. SPARQL is a standard query language and protocol for RDF.

I am a big fan of standards, as are most people in the Semantic Web community. However, I feel that a standard database protocol does not provide much value; the protocol should instead be tailored to the evaluation and storage mechanism used by the implementation.

Standards are intended to be interoperable between implementations, and SPARQL must abstract away from the storage mechanism in order to achieve that interoperability. As a result, SPARQL query and retrieval operations can be less efficient than storage-specific mechanisms such as SQL. The abstraction still has significant advantages: a single query can be used with a wide array of storage mechanisms and remains unchanged across significant storage alterations. It does, however, come at a cost to both the ability to write efficient queries and the ability to evaluate them efficiently over the SPARQL protocol.
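To make the interoperability point concrete, here is a minimal sketch of the same query being addressed to two differently backed stores. It assumes the protocol's HTTP GET binding, in which the query text travels in a `query` parameter. The endpoint URLs and the `build_request_url` helper are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode

# One query, written once, independent of how the data is stored.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

# Hypothetical endpoints, each imagined to sit on a different storage mechanism.
ENDPOINTS = [
    "http://localhost:8080/native-store/sparql",
    "http://localhost:8081/relational-store/sparql",
]


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries `query` in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# The query string is identical for every endpoint; only the address changes.
urls = [build_request_url(endpoint, QUERY) for endpoint in ENDPOINTS]

for url in urls:
    print(url)
```

Nothing in the request reveals how either store evaluates the query, which is exactly the abstraction being discussed.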

The cost of query parsing and optimization can always be reduced, and both can be tailored to specific models without any loss of interoperability at the query language level. A standard protocol, however, cannot be optimized in the same way.

When a database-backed application has performance problems, nine times out of ten it is due to excessive communication between the database and its clients. Therefore, I question the value of abstracting this communication protocol away from the storage/evaluation mechanism, because doing so will exacerbate communication overhead.
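A rough way to see why chatter between client and database dominates is a simple cost model: total time is the number of round trips multiplied by the per-trip latency, plus the work the database actually does. The sketch below assumes that model. The function name `total_time_ms` and all of the numbers are made up for illustration and are not measurements of any real system.

```python
def total_time_ms(round_trips: int, latency_ms: float, work_ms: float) -> float:
    """Estimate elapsed time as per-trip latency plus the database's own work."""
    return round_trips * latency_ms + work_ms


# Illustrative assumptions only: 2 ms per round trip, 50 ms of evaluation work.
LATENCY_MS = 2.0
WORK_MS = 50.0

# A chatty client that issues 100 small requests for the same data.
chatty = total_time_ms(round_trips=100, latency_ms=LATENCY_MS, work_ms=WORK_MS)

# A client whose protocol lets it fetch the same data in one exchange.
batched = total_time_ms(round_trips=1, latency_ms=LATENCY_MS, work_ms=WORK_MS)

print(f"chatty: {chatty} ms, batched: {batched} ms")
```

Under these assumed numbers the evaluation work is the same in both cases; the difference comes entirely from the number of exchanges, which is the part a protocol tailored to the store could reduce.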

While I think the SPARQL query language is developing nicely and should be considered by any project that wants to make its queries easier to maintain, I also think any project that is concerned about the performance of its queries should consider using a proprietary protocol to optimize its communication with the database.
