
Sunday, November 28, 2010

Status Code 200 vs 303

The public LOD mailing list has been dominated by discussions about using a 303 response to a GET request to distinguish between the identifier of the requested resource and the identifier of a document that describes it.

Some resources can be represented completely on the Web. For these resources, any of their URLs can be used to identify them. This blog page, for example, can be identified by the URL in a browser's address bar. However, some resources cannot be completely viewed on the Web - they can only be described on the Web.

The W3C recommends responding with a 200 status code for GET requests of a URL that identifies a resource which can be completely represented on the Web (an information resource). They also recommend responding with a 303 for GET requests of a URL that identifies a resource that cannot be completely represented on the Web.
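
As a minimal sketch of such an exchange, using a hypothetical example.org identifier for a person (a thing that can only be described on the Web, not retrieved):

    GET /id/alice HTTP/1.1
    Host: example.org
    Accept: text/turtle

    HTTP/1.1 303 See Other
    Location: http://example.org/doc/alice

The client then requests http://example.org/doc/alice and receives a 200 with the description document, whose URL stays distinct from the identifier of the person herself.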

Popular Web servers today don't have much support for resources that can't be represented on the Web. This creates a problem for deploying (non-document) resource servers, as it can be very difficult to set up resources for 303 responses. The public LOD mailing list has been discussing an alternative of using the more common 200 response for any resource.

The problem with always responding to a GET request with a 200 is the risk of using the same URL to identify both a resource and a document describing it. This breaks a fundamental Web constraint that says URIs identify a single resource, and causes URI collisions.

It is impossible to be completely free of all ambiguity when it comes to URI allocation. However, any ambiguity can impose a cost in communication due to the effort required to resolve it. Therefore, within reason, we should strive to avoid it. This is particularly true for Web recommendation standards.

URI collision is perhaps the most common ambiguity in URI allocation. Consider a URL that refers to the movie The Sting and also identifies a description document about the movie. This collision creates confusion about what the URL identifies. If one wanted to talk about the creator of the resource identified by the URL, it would be unclear whether this meant "the creator of the movie" or "the editor of the description." Such ambiguity can be avoided using a 303 for a movie URL to redirect to a 200 of the description URL.
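
As a sketch of how the two identifiers keep the statements apart (hypothetical example.org URIs; the editor's name is made up), in Turtle:

    @prefix dc:   <http://purl.org/dc/elements/1.1/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    <http://example.org/id/movie/the-sting>
        dc:creator "George Roy Hill" .             # the director of the movie

    <http://example.org/doc/movie/the-sting>
        dc:creator "Jane Doe" ;                    # the editor of the description
        foaf:primaryTopic <http://example.org/id/movie/the-sting> .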

As Tim Berners-Lee points out in an email, even including a Content-Location in a 200 response (to indicate a description of the requested resource) "leaves the web not working", because such techniques are already used to associate different representations (and different URLs) to the same resource, and not the other way around.

Responding with a 200 status code for representations that merely describe a resource (and don't completely represent it) causes ambiguity, because Web browsers today interpret any 200-series response to a GET request as containing a complete representation of the resource identified by the request URL.

Every day, people bookmark and send links of documents they are viewing in a Web browser. It is essential that any document viewed in a Web browser has a URL identifier in the browser's address bar. Web browsers today don't look at the Content-Location header to get the document URL (nor should they). For Linked Data to work with today's Web, it must keep requests for resources separate from requests for description documents.

The community has voiced understandable concerns about the complexity of URI allocation and of deploying 303s with today's software. The LOD community has jumped in with a few alternatives; however, we must consider how the Web works today and be realistic about what more we can expect of Web clients. The established 303 technique works with today's Web browsers. A 303 redirect may be complicated to set up in a document server, but let's give Linked Data servers a chance to mature.

Monday, August 24, 2009

Dereferencable Identifiers

A document URL is a dereferencable document identifier. We use URLs all over the Web to identify HTML pages and other web resources. When you can't give out a brochure, you can share a URL. Instead of sending a large email attachment, you might just send a URL. Rather than creating long appendices, you can simply link to other resources. It is so much more useful to pass around URLs than to transfer entire documents.

This model has worked well for documents and is now being adopted for other types of resources. With the popularity of XML, using URLs to identify data resources is now commonplace. Rather than passing around a complete record, agents pass around an identifier that can be used to look up the record later. By using a URL as the identifier, these agents aren't tied to any single dataset and are much more reusable.
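
As a sketch (with a hypothetical example.org record), compare embedding a record with referring to it by URL:

    <!-- by value: the whole customer record travels with the order -->
    <order>
      <customer>
        <name>Alice</name>
        <address>1 Main Street</address>
      </customer>
    </order>

    <!-- by reference: only the identifier travels; the record is looked up later -->
    <order>
      <customer href="http://example.org/customers/42"/>
    </order>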

The HTML5 standardization process has given rise to a debate on the usefulness of URLs as model identifiers. Most people agree that a URL is a good way to identify documents, web resources, and data resources. However, the debate continues on the usefulness of a URL as an identifier within a model vocabulary. One side claims that a model vocabulary should be centralized and therefore does not require the flexibility of a URL. The other side claims the model vocabulary should be extensible and requires the universal identifying scheme that URLs provide.

To understand the potential usefulness of a URL as a model identifier, consider the difference in behaviour between a missing DTD and a missing Java class. A DTD is identified using a URL and a Java class is not. When an XML validator encounters a DTD it does not understand, it dereferences the identifier and uses the resulting model to process the XML document. When a JVM encounters a Java class it does not understand, it throws an exception, often terminating the entire process. Now consider how much easier it would be to program if a programming environment used URLs for classes and model versions. Dependency management would become as simple as managing import statements. As the Web becomes the preferred programming environment of the future, we must consider these basic programming concerns.
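
A DTD reference, for instance, is nothing more than a dereferencable URL in the document's prologue (a sketch with a hypothetical system identifier):

    <?xml version="1.0"?>
    <!DOCTYPE catalog SYSTEM "http://example.org/schemas/catalog.dtd">
    <catalog>
      <item id="1"/>
    </catalog>

A validator that has never seen this vocabulary can fetch the model from the URL; a Java import statement offers no such fallback.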

Although I enjoy working in abstractions, I certainly understand how things always get more complicated when you go meta: using URLs to describe other URLs. However, this complexity is essential to maintaining the flexibility and extensibility of the Web.

See Also: HTML5/RDFa Arguments



Thursday, January 15, 2009

Validating RDF

RDFS/OWL is criticized for its weak ability to validate documents, in contrast to XML, which has many mature validation tools.

A common source of confusion in RDF is the rdfs:range/rdfs:domain properties: rather than constraining what values a property may take, they allow a property value to be inferred to have the type given by rdfs:range. This is very different from XML, which only has rules to validate tags and cannot conclude anything new. Many RDF predicates are used for similar inferencing, but they lack any way to validate a statement or reject it. Validation is a critical feature for data interchange, which RDF is otherwise well suited for.
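
For example (a sketch with hypothetical example.org terms), given:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/vocab#> .

    ex:directedBy rdfs:range ex:Person .
    <http://example.org/id/movie/the-sting> ex:directedBy "George Roy Hill" .

an RDFS reasoner does not flag the literal "George Roy Hill" as a range violation; it simply concludes that the value denotes an ex:Person.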

To address this limitation, an RDF graph can be sorted and serialized into RDF/XML. With a little organization of statements, such as grouping by subject, and controlled serialization, common XML validation tools can be applied to a more formal RDF/XML document. Our validation was done with relatively small graphs and we restricted the use of BNodes to specific statements to ensure similarly structured data would produce similar XML documents.
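
A sketch of the kind of subject-grouped RDF/XML this produces (hypothetical example.org vocabulary):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://example.org/vocab#">
      <ex:Movie rdf:about="http://example.org/id/movie/the-sting">
        <ex:title>The Sting</ex:title>
        <ex:directedBy rdf:resource="http://example.org/id/person/george-roy-hill"/>
      </ex:Movie>
    </rdf:RDF>

Because every statement about a subject is nested under a single typed element, ordinary XML tools can check which properties appear and in what form.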

Although TriX, a more formal XML serialization of RDF, could also have been used, we felt the resulting format would not be as easy for validation tools to work with.

With a controlled RDF/XML structure, we were able to apply RELAX NG to provide structure validation before accepting foreign data, and to automate export into more controlled formats using XSLT. (We used a rule engine for state validation.) Although RDF is a great way to interchange data against a changing model, XML is still better over the last mile for restricting the vocabulary of the data accepted.
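
A fragment of the sort of RELAX NG grammar (compact syntax) that can be applied to the serialization above (a sketch, not our actual schema):

    default namespace ex = "http://example.org/vocab#"
    namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

    start = element rdf:RDF { movie+ }
    movie = element ex:Movie {
        attribute rdf:about { xsd:anyURI },
        element ex:title { text },
        element ex:directedBy { attribute rdf:resource { xsd:anyURI } }
    }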
