Programming The Web: RDF Federation

Monday, March 9, 2009

RDF Federation

An RDF store (unlike an RDBMS) usually indexes everything, typically storing each triple in several sort orders. This means that the store's indexes are often larger than the data it contains. Indexes this large make I/O expensive and caching difficult. Scaling vertically, by adding more memory for caching and faster disks, can make a big difference, but it gets very expensive very quickly.
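To see why the indexes can outgrow the data, here is a minimal sketch of a toy in-memory store that writes each triple into three sort orders (SPO, POS, OSP). The class name `ToyTripleStore` and the choice of exactly these three orderings are illustrative assumptions, not how Sesame or any particular store lays out its indexes.

```python
from collections import defaultdict

# Illustrative orderings: which triple positions (0=s, 1=p, 2=o) form each index key.
ORDERINGS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class ToyTripleStore:
    """Toy store that writes every triple into several permutation indexes."""

    def __init__(self):
        self.triples = set()
        # Each index maps its first two terms to the set of third terms.
        self.indexes = {name: defaultdict(set) for name in ORDERINGS}

    def add(self, s, p, o):
        triple = (s, p, o)
        if triple in self.triples:
            return  # already stored; an RDF graph is a set of triples
        self.triples.add(triple)
        for name, (a, b, c) in ORDERINGS.items():
            self.indexes[name][(triple[a], triple[b])].add(triple[c])

    def index_entries(self):
        # Total number of entries across all indexes.
        return sum(len(v) for idx in self.indexes.values() for v in idx.values())


store = ToyTripleStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", '"Alice"')
store.add("ex:bob", "foaf:name", '"Bob"')
print(len(store.triples), store.index_entries())  # 3 9
```

Each extra ordering stores another full copy of every triple's terms, which is how the index size can end up exceeding the raw data.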

The alternative is to scale horizontally. This can be done in one of two ways: by mirroring the indexes on other machines, or by partitioning the indexes across other machines. The first option, called clustering, can reduce the I/O load, but because every machine still holds the full indexes, caching remains difficult. The second option, called federating, reduces the I/O load on each machine and lets each machine specialize, which makes caching much more effective.
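The difference between the two options shows up in how much data each machine has to index. A minimal sketch, assuming a hypothetical hash-of-predicate partition key (`by_predicate`) and plain Python lists standing in for each machine's store:

```python
import zlib


def by_predicate(triple):
    # Hypothetical partition key: a stable hash of the predicate IRI.
    return zlib.crc32(triple[1].encode("utf-8"))


def mirror(triples, n_nodes):
    # Clustering: every node keeps a full copy, so each node indexes all triples.
    return [list(triples) for _ in range(n_nodes)]


def partition(triples, n_nodes, key=by_predicate):
    # Federating: each triple is stored on exactly one node, chosen by its key.
    shards = [[] for _ in range(n_nodes)]
    for t in triples:
        shards[key(t) % n_nodes].append(t)
    return shards


triples = [(f"ex:s{i}", f"ex:p{i % 8}", f"ex:o{i}") for i in range(1000)]
print([len(n) for n in mirror(triples, 4)])        # [1000, 1000, 1000, 1000]
print(sum(len(n) for n in partition(triples, 4)))  # 1000
```

With mirroring, every node's working set is the whole store; with partitioning, the nodes together hold each triple once, and all triples of a given predicate land on the same node.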

Federating RDF stores is now going to be a lot easier with Sesame 3.0, which will support federating multiple (distributed) Sesame repositories into a unified store. This allows large indexes to be distributed across multiple machines connected over a network. The federation supports multiple ways of partitioning the data: by predicate (property), by subject, or both. When properly set up, the federation can proxy queries to the specialized members and join query results across the distributed members. For large-scale RDF stores, federation is becoming a valuable architectural option.
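A minimal sketch of predicate partitioning, assuming a hypothetical `PredicateFederation` class with a static predicate-to-member assignment. It is not the Sesame federation API; it only illustrates proxying a pattern with a bound predicate to the member that owns it, and joining results that come from two different members.

```python
from collections import defaultdict


class PredicateFederation:
    """Hypothetical federation in which each predicate is owned by one member."""

    def __init__(self, owners):
        # owners maps each predicate IRI to the name of the member that stores it.
        self.owners = owners
        self.members = defaultdict(list)

    def add(self, s, p, o):
        # Route the triple to the member that owns its predicate (KeyError if unassigned).
        self.members[self.owners[p]].append((s, p, o))

    def match(self, p):
        # Proxy the pattern "?s p ?o" to the single member that owns p.
        member = self.owners[p]
        pairs = [(s, o) for s, q, o in self.members[member] if q == p]
        return pairs, member

    def join(self, p1, p2):
        # Evaluate "?x p1 ?y . ?y p2 ?z"; the two patterns may live on different members.
        left, m1 = self.match(p1)
        right, m2 = self.match(p2)
        objects_by_subject = defaultdict(list)
        for s, o in right:
            objects_by_subject[s].append(o)
        rows = [(x, y, z) for x, y in left for z in objects_by_subject.get(y, [])]
        return rows, m1 != m2  # second value: did the join span two members?


fed = PredicateFederation({"foaf:knows": "member-a", "foaf:name": "member-b"})
fed.add("ex:alice", "foaf:knows", "ex:bob")
fed.add("ex:bob", "foaf:name", '"Bob"')
fed.add("ex:carol", "foaf:name", '"Carol"')
print(fed.join("foaf:knows", "foaf:name"))  # ([('ex:alice', 'ex:bob', '"Bob"')], True)
```

Because `foaf:knows` and `foaf:name` are owned by different members, the join has to combine results from both; the flag in the returned tuple marks that cross-member case.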

Instructions for setting up a read-only Sesame Federation can be found here:
https://wiki.aduna-software.org/confluence/display/SESDOC/Federation

7 comments:

Daniela Kolarova, April 1, 2010 at 8:00 AM
Federation sounds like a big step further towards structured data distribution on the Internet. When will Sesame 3.0 be officially released?
Anonymous, April 2, 2010 at 8:20 AM
Hi Daniela,

Work on Sesame 3.0 is suspended until the SPARQL 1.1 Working Group makes further progress. However, the federation has been backported to Sesame 2.3 and currently ships with AliBaba 2.0-alpha4.
Daniela Kolarova, April 6, 2010 at 11:25 AM
Thanks for the answer! Is it possible to use the RDF Federation Sail with Sesame 2.3 without using AliBaba?
Anonymous, April 10, 2010 at 7:53 AM
You don't have to use the object-mapper or server of AliBaba to use the federation. Just include the jar and follow the documentation.
Naso, May 13, 2010 at 9:18 AM
Dear James,

Federation through data/index partitioning sounds like the universal solution for all sorts of problems in DBMS. Still, the guys in the relational DBMS world do use it too much. Any guess why? Any thoughts on its impact on the speed of query evaluation? Any ideas on how to get around the so-called "remote join" problem?

Cheers,
Atanas Kiryakov
Naso, May 13, 2010 at 1:09 PM
I meant "the guys in the relational DBMS world do *not* use it too much" - sorry

Atanas Kiryakov
Anonymous, May 14, 2010 at 6:49 PM
Federation in an RDBMS is not effective because of its data integrity constraints. An RDBMS relies on pessimistic concurrency control, which requires centralization. Federating multiple data sources requires optimistic concurrency control to scale effectively.
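Optimistic concurrency control lets writers proceed without holding locks and checks for conflicts only at commit time. A minimal single-member sketch, assuming a hypothetical `VersionedMember` that keeps one version counter; committing atomically across several federation members would need an additional protocol that is not shown here.

```python
import threading


class ConflictError(Exception):
    """Raised when a commit's snapshot is no longer current."""


class VersionedMember:
    """Hypothetical federation member using optimistic concurrency control."""

    def __init__(self):
        self._lock = threading.Lock()  # held only for the short validate-and-apply step
        self.version = 0
        self.triples = set()

    def snapshot(self):
        # Reads take no long-lived lock; the caller just remembers the version it saw.
        with self._lock:
            return self.version, set(self.triples)

    def commit(self, read_version, add=(), remove=()):
        with self._lock:
            # Validation: reject the write if anyone committed after our snapshot.
            if self.version != read_version:
                raise ConflictError(f"expected version {read_version}, found {self.version}")
            self.triples.difference_update(remove)
            self.triples.update(add)
            self.version += 1
            return self.version


member = VersionedMember()
v1, _ = member.snapshot()  # transaction T1 starts
v2, _ = member.snapshot()  # transaction T2 starts concurrently
member.commit(v1, add={("ex:a", "ex:p", "ex:b")})  # T1 commits first
try:
    member.commit(v2, add={("ex:a", "ex:p", "ex:c")})  # T2's snapshot is now stale
except ConflictError as err:
    print("T2 must retry:", err)
```

The losing transaction is not blocked while it works; it simply discovers the conflict at commit and retries against a fresh snapshot.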

Take a look at Eight Isolation Levels Every Web Developer Should Know for more information on concurrency controls.

Partitioning your data effectively is a skill, just like designing an effective table schema. Both can have a significant impact on performance and scalability.

A smart, well-informed query optimizer is essential for effective query processing. However, many distribution problems can be solved with minimal remote cross joins.
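One way to keep remote cross joins to a minimum is to choose the partitioning so that common joins stay on one member. A minimal sketch, assuming subject partitioning by a hypothetical CRC32 hash of the subject: a star-shaped query (`?x p1 ?a . ?x p2 ?b`) can then be joined on each member independently, and the federation only needs to union the answers.

```python
import zlib
from collections import defaultdict


def member_for(subject, n_members):
    # Hypothetical placement rule: a stable hash of the subject, so that all
    # triples about one resource land on the same member.
    return zlib.crc32(subject.encode("utf-8")) % n_members


def partition_by_subject(triples, n_members):
    members = defaultdict(list)
    for t in triples:
        members[member_for(t[0], n_members)].append(t)
    return members


def star_query_local(members, p1, p2):
    # Evaluate "?x p1 ?a . ?x p2 ?b" on each member on its own and union the rows.
    # The join needs no remote lookups because every ?x is local to one member.
    rows = []
    for triples in members.values():
        first = defaultdict(list)
        for s, p, o in triples:
            if p == p1:
                first[s].append(o)
        for s, p, o in triples:
            if p == p2:
                rows.extend((s, a, o) for a in first.get(s, []))
    return sorted(rows)


triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", '"Bob"'),
]
members = partition_by_subject(triples, 3)
print(star_query_local(members, "foaf:name", "foaf:mbox"))
# [('ex:alice', '"Alice"', 'mailto:alice@example.org')]
```

A path-shaped query such as `?x p1 ?y . ?y p2 ?z` does not have this property under subject partitioning, since the triples about `?y` may sit on a different member; that is where a remote join becomes unavoidable.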