Thursday, February 26, 2009

RDF Transaction Isolation

Transaction isolation in relational databases is, for better or worse, well established. Transaction isolation in RDF stores, however, is rarely documented.

The ANSI SQL isolation definitions are oriented around UPDATE (write) operations and do not capture the general use case of RDF, which has no notion of UPDATE. For example, the first phenomenon usually listed with the ANSI SQL levels, dirty write, does not even apply to RDF transactions. Another phenomenon, non-repeatable read, is defined in terms of the rows retrieved by a SELECT statement. RDF queries, unlike SQL queries, are pattern based, and their results have no direct relationship to any internal data record.
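To make the last point concrete, here is a minimal sketch, assuming a toy in-memory store of (subject, predicate, object) tuples rather than any Sesame API. The functions `match`, `query`, and `select_distinct` are hypothetical helpers written for this illustration. A result row can be built from several statements, and the statements behind a result can change while the result itself stays the same, so there is no single record whose re-read would define a non-repeatable read.

```python
def match(store, pattern):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in store:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # a variable repeated within one pattern must bind the same value
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

def query(store, patterns):
    """Evaluate a conjunction of triple patterns with a simple nested-loop join."""
    results = [{}]
    for pattern in patterns:
        extended = []
        for partial in results:
            # substitute variables bound by earlier patterns before matching
            bound = tuple(partial.get(term, term) for term in pattern)
            extended.extend({**partial, **b} for b in match(store, bound))
        results = extended
    return results

def select_distinct(store, var, patterns):
    """Project one variable and remove duplicates, like SELECT DISTINCT."""
    return sorted({b[var] for b in query(store, patterns)})

store = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
}
# Each result row here is built from two statements joined on ?p.
rows = query(store, [("?p", "foaf:name", "?n"), ("?p", "foaf:knows", "?x")])

before = select_distinct(store, "?p", [("?p", "foaf:knows", "?x")])
store.discard(("ex:alice", "foaf:knows", "ex:bob"))
after = select_distinct(store, "?p", [("?p", "foaf:knows", "?x")])
# The underlying statements changed, yet the result is identical.
```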

Relational database isolation mechanisms do not perform nearly as well when INSERT/DELETE operations are used instead of UPDATE. Furthermore, relational databases often use a lax definition of "serializable" that allows conflicting INSERT operations, on the assumption that preventing conflicting UPDATE operations is sufficient.
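One well-known way a "serializable" level can admit conflicting INSERTs is snapshot isolation that detects only write-write conflicts (first committer wins). The sketch below is a toy model with hypothetical names (`SnapshotStore`, `on_call`), not the behaviour of any particular database: two transactions each check a condition in their snapshot and then insert different rows. Because they write different keys, neither is aborted, and together they produce a state that no serial order could.

```python
class SnapshotStore:
    """Toy multiversion store: each transaction reads a private snapshot and, at commit,
    is aborted only if another transaction already committed a write to a key it wrote."""

    def __init__(self):
        self.rows = {}          # committed state: key -> value
        self.version = 0        # number of commits so far
        self.last_write = {}    # key -> version of the commit that last wrote it

    def begin(self):
        return {"snapshot": dict(self.rows), "start": self.version, "writes": {}}

    def write(self, tx, key, value):
        tx["writes"][key] = value

    def commit(self, tx):
        # First-committer-wins on write-write conflicts; reads are never validated.
        for key in tx["writes"]:
            if self.last_write.get(key, -1) > tx["start"]:
                raise RuntimeError(f"write-write conflict on {key!r}")
        self.version += 1
        for key, value in tx["writes"].items():
            self.rows[key] = value
            self.last_write[key] = self.version

def on_call(rows):
    return sorted(k for k in rows if k.startswith("oncall:"))

# Invariant the application wants: at most one doctor on call.
db = SnapshotStore()
t1, t2 = db.begin(), db.begin()
if not on_call(t1["snapshot"]):
    db.write(t1, "oncall:alice", "night")
if not on_call(t2["snapshot"]):
    db.write(t2, "oncall:bob", "night")
db.commit(t1)
db.commit(t2)  # succeeds: the two inserts touch different keys
# Two doctors are now on call; in any serial order the second check would have failed.
```

An UPDATE of the same key by both transactions would be caught by the commit check; the phantom slips through because the predicate "anyone on call" is read but never protected.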

RDF is a different beast altogether: it is set oriented. Two RDF transactions that add or remove the same statement do not necessarily conflict with each other, as they would in SQL, because a successful add or remove operation in RDF does not require a state change. Adding a statement that is already present, or removing one that is absent, is simply a no-op.
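A minimal sketch of this set semantics, assuming a hypothetical `Tx` record of pending adds and removes (not Sesame's API). Applying two transactions that add the same statement gives the same result in either order, and a conservative conflict test only flags a pair when one transaction adds a statement the other removes. For contrast, the standard-library `sqlite3` module shows a relational table keyed on the whole statement rejecting the second INSERT of the same row.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Tx:
    """Hypothetical record of a transaction's pending changes."""
    adds: set = field(default_factory=set)
    removes: set = field(default_factory=set)

def apply(store: frozenset, tx: Tx) -> frozenset:
    """Set semantics: adding a present statement or removing an absent one is a no-op."""
    return (store - tx.removes) | tx.adds

def may_conflict(a: Tx, b: Tx) -> bool:
    """Conservative test: the final state can depend on commit order only
    if one transaction adds a statement that the other removes."""
    return bool(a.adds & b.removes or a.removes & b.adds)

def sql_insert_twice(row) -> str:
    """Relational contrast: a table keyed on the whole statement rejects a duplicate INSERT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stmt (s TEXT, p TEXT, o TEXT, PRIMARY KEY (s, p, o))")
    conn.execute("INSERT INTO stmt VALUES (?, ?, ?)", row)
    try:
        conn.execute("INSERT INTO stmt VALUES (?, ?, ?)", row)
        return "inserted"
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        conn.close()

stmt = ("ex:alice", "foaf:knows", "ex:bob")
empty = frozenset()
t1, t2 = Tx(adds={stmt}), Tx(adds={stmt})
one_order = apply(apply(empty, t1), t2)
other_order = apply(apply(empty, t2), t1)
# Both orders yield {stmt}, and may_conflict(t1, t2) is False.
```

The conflict test is deliberately conservative: an add/remove overlap does not always change the outcome, but when there is no overlap the two transactions always commute.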

Early RDF use cases required fully serializable transactions, because many of the inferencing rules used in RDF need to take the complete state of the store into account. As a result, RDF stores generally provide only fully serializable transactions. However, fully serializable transactions often perform worse than lower isolation levels.

RDF stores are now being used in environments with much greater, real-time demand for fast concurrent write operations. These environments don't require full serializability, but RDF stores currently offer no other isolation level to choose from.

To address this need, Sesame 3.0 introduces five isolation levels, allowing RDF stores to vary the isolation level they provide. Offering several levels lets stores make significant performance improvements at the lower levels. For example:

• Read Committed isolation permits weak consistency and allows proxies to cache repeated results without validation.
• Snapshot isolation permits eventual consistency and allows store clusters to maintain independent state and propagate changes during idle periods.
• Serializable isolation provides a higher degree of isolation, but does not require atomic consistency, so concurrent transactions remain possible.
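The bullets describe what each level buys in performance; the sketch below shows the underlying difference in what a reader sees under Read Committed versus Snapshot isolation. It models textbook semantics with hypothetical classes (`VersionedStore`, `ReadTx`) and plain-string level names; it is not the Sesame 3.0 API, and Serializable, which additionally has to validate reads at commit time, is left out.

```python
class VersionedStore:
    """Toy RDF store that keeps every committed state so a reader can pin one."""

    def __init__(self):
        self.versions = [frozenset()]  # versions[i] is the state after i commits

    def commit(self, adds=(), removes=()):
        self.versions.append((self.versions[-1] - set(removes)) | set(adds))

class ReadTx:
    """Read side of a transaction under one of two textbook isolation levels."""

    def __init__(self, store, level):
        if level not in ("READ_COMMITTED", "SNAPSHOT"):
            raise ValueError(level)
        self.store, self.level = store, level
        self.start = len(store.versions) - 1  # latest version when the transaction began

    def read(self):
        if self.level == "READ_COMMITTED":
            # Only committed data is visible; this sketch returns the latest commit,
            # so consecutive reads in one transaction can differ.
            return self.store.versions[-1]
        # SNAPSHOT: every read in the transaction sees the same committed state.
        return self.store.versions[self.start]

stmt = ("ex:alice", "foaf:knows", "ex:bob")
store = VersionedStore()
rc, si = ReadTx(store, "READ_COMMITTED"), ReadTx(store, "SNAPSHOT")
first_rc, first_si = rc.read(), si.read()
store.commit(adds={stmt})  # another transaction commits in the middle
second_rc, second_si = rc.read(), si.read()
# READ_COMMITTED: non-repeatable read (empty, then {stmt}).
# SNAPSHOT: repeatable (empty both times).
```

Because a snapshot reader needs only some consistent committed version, not the latest one, a replica that lags behind the others can still serve it, which fits the idea of clusters propagating changes during idle periods.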

For more details on the isolation levels supported by Sesame 3.0, see:
http://wiki.aduna-software.org/confluence/display/SESDOC/TransactionIsolation

What variations of transaction isolation have you used in your application?
