Tuesday, March 31, 2009

Long Running Transactions

I have been experimenting recently with long running transactions and am intrigued by its potential. The term 'long running transaction' is loaded with interpretations, so let me explain. Normally, when one thinks of a long running transaction, they think of a set of operations that can be undone. The implications of which are domain/implementation specific.

However, I am experimenting with atomic optimistic long running transactions - like a database transaction that spans multiple HTTP requests. The key requirement here is that it is optimistic. This means there are no resource locks used across the requests, allowing concurrent access and non-conflicting modification.

What makes this interesting, is when using a web interface to manipulate a complex structure. While other approaches would have the structure copied into session memory or track undo operations, this utilizes the indexes already present in the store (RDF in this case) and there is no need to track or even develop complicated undo behaviour. We are all familiar with the classic cancel/apply or restore/save buttons of local applications that span an application or set of dialogs. With optimistic long running transactions, this same experience can be used in a web application.

Reblog this post [with Zemanta]

Thursday, March 19, 2009

Eight Isolation Levels Every Web Developer Should Know

The ACID properties are one of the cornerstones of database theory. ACID defines four properties that must be present if a database is considered reliable: Atomicity, Consistency, Isolation, and Durability. While all four properties are important, isolation in particular is interpreted with the most flexibility. Most databases provide a number of isolation levels to choose from, and many libraries today add additional layers which create even more fine-grained degrees of isolation. The main reason for this wide range of isolation levels is that relaxing isolation can often result in scalability and performance increases of several orders of magnitude.

Read More Here:
http://www.infoq.com/articles/eight-isolation-levels

Reblog this post [with Zemanta]

Monday, March 9, 2009

RDF Federation

An RDF store (unlike an rdbms store) usually indexes everything. This means that the store's indexes are often larger then the data that it contains. Such large indexes makes I/O expensive and caching difficult. Scaling vertically by adding more memory for caching and faster disks can make a big difference, but it can get very expensive very quickly.

The alternative is to scale horizontally. This can be done in one of two ways: by mirroring the indexes on other machines, or by partitioning the indexes to other machines. The first option, called clustering, can reduce the I/O load, but will still have difficult caching it. The second option, called federating, reduces the I/O load on each machine, and can allow each machine to specialize, making caching much more effective.

Federating RDF stores is now going to be a lot easier with Sesame 3.0. Sesame 3.0 will support federating multiple (distributed) Sesame Repositories into a unified store. This allows large indexes to be distributed on multiple machines that are connected over a network. The federation supports multiple ways of partitioning the data. It can be partitioned by predicate (property), by subject, or both. When properly setup the federation can effectively proxy queries to the specialized members and join queries among the distributed members. For large-scale RDF stores, federating is becoming a valuable solution for RDF architecture.

Instructions for setting up a read only Sesame Federation can be found here:
https://wiki.aduna-software.org/confluence/display/SESDOC/Federation

Reblog this post [with Zemanta]

Monday, March 2, 2009

What is the Future of Database Systems?

Last month saw a flurry of activity around the future of relational databases. Although no one can predict the future, I think it is safe to say that many system developers/architects are hungry for mainstream semi-structure databases.

Feb 12 Is the Relational Database Doomed?
Feb 13 Is The Relational Database Doomed?
Feb 13 CouchDB could be a viable alternative to relational databases for storing patient data
Feb 16 The future of RDBMS's
Feb 20 Is the Relational Database Not an Option in Cloud Computing?
Feb 27 How FriendFeed uses MySQL to store schema-less data

Reblog this post [with Zemanta]