Sunday, January 15, 2012

Blob Store

In release 2.0-beta14 (I know, this is a late beta release), AliBaba introduced a new BLOB store. The BLOB store integrates with the RDF repository (ObjectRepository) to synchronize transactions. This keeps both the BLOB store and the RDF store isolated and always consistent with one another. It is done using two-phase commit transactions in the BLOB store.

The BLOB store also has a few other advantages over a traditional file system. First, every change is isolated until it is closed/committed. This prevents other readers from seeing an incomplete BLOB and helps prevent inconsistency between the BLOB and RDF stores. In addition, as disk space is generally considered cheap, all past versions of BLOBs are kept on disk by default. This allows any previous version to be retrieved (and restored) using the API.
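To make the idea of isolation concrete, here is a minimal sketch of one common way to hide a change until it is committed: write to a temporary file and move it into place in a single step. This is not AliBaba's implementation; the class name IsolatedWriteSketch and the method writeIsolated are made up for illustration, and the sketch does not keep past versions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; this is not AliBaba's BlobStore code.
class IsolatedWriteSketch {

    // Writes the data to a temporary file next to the target, then moves it
    // into place. Readers of 'target' see the old content or the new content,
    // never a half-written file.
    static void writeIsolated(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "blob", ".part");
        try (OutputStream out = Files.newOutputStream(temp)) {
            out.write(data);
        }
        // "Commit": publish the finished file in one step. Whether an existing
        // target is replaced by an atomic move is platform specific.
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blobstore");
        Path blob = dir.resolve("key1");
        writeIsolated(blob, "first version".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(blob), StandardCharsets.UTF_8));
    }
}
```

A real store that also coordinates with an RDF transaction needs more than this (a prepare step and a way to roll back), which is what the two-phase commit is for.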

The BLOB store API is fairly simple. Here is what some code might look like using the BLOB store.

BlobStoreFactory factory = BlobStoreFactory.newInstance();
BlobStore store = factory.openBlobStore(new File("."));
String key = "http://example.com/store1/key1";
BlobObject blob = store.open(key);
OutputStream out = blob.openOutputStream();
try {
    // write stream to out
} finally {
    out.close();
}
InputStream in = blob.openInputStream();
try {
    // read stream from in
} finally {
    in.close();
}

More API options can be seen in the JavaDocs.

Thursday, June 2, 2011

Web Developer Review of BlackBerry PlayBook

Most reviews of the PlayBook focus on the same issue: very few downloadable apps in App World. As a web developer, I couldn't care less.

First Impression

Websites render fast and, due to the high dpi, look really nice. With its compact form, it fits well in my hands, is easy to type on, and is very portable. With a Flash plugin included, streaming video is smooth and full screen works. Videos look really slick when plugged into an HD TV. Each app can only open one window, so the browser supports tabs and allows you to keep multiple tabs open at once.

Honeymoon Ends

Tabbed browsing works on the desktop, but not on the PlayBook. Only the open tab can be actively loading. Opening a new tab before a page has finished loading can cancel the load. Opening a new tab while watching video pauses the video. This makes watching commercials really frustrating because you can't turn away or the video will pause. Watching videos in the browser is also frustrating, as after five minutes the PlayBook goes into suspend. (There are some tricks to stop this, but not in full screen mode.)

In addition, despite all the fuss about multitasking, the PlayBook can't multitask. Specifically, you can only have one web page active at a time, and this includes webapps.

Surprisingly, the PlayBook is much less web developer friendly than I expected. The script engine is incomplete. There is no offline support for webapps. There is no support for turning a webapp into a chromeless app. WebWorks development requires a series of confusing bat commands that don't work the first time. All of this makes it really hard to develop for the PlayBook.

What's Left

The apps I use include Browser, Wi-Fi Sharing, Word To Go, Slides To Go, Videos, Pictures, aVNC, and ReelPortal. All of them work, but I expected more from almost every one of them.

All that being said, I am going to hold on and put up with the current limitations of the PlayBook. I really like having a portable web browser, and I believe there is still a lot of potential for this device. I am looking forward to seeing what the next software update has to offer.

Monday, February 14, 2011

Five Steps to a More Secure Web App

There are a number of different authentication methods available to choose from when launching (or updating) a Web application. Choosing the wrong method can leave the system (or worse, the users) vulnerable to cyber attacks or identity theft.

Below are five rules that should always be obeyed (regardless of the method). By considering these rules and how your users will use your system, you can better understand the security requirements of your Web application and can choose the right method.

1) Never send clear user passwords over an unencrypted channel.

When passwords are sent over an unencrypted channel, anyone who has access to the network (and a little know-how) can read them. This should never be done with user-supplied passwords (not even for intranet websites). Users often use the same password for multiple systems. Exposing a user's password in one system puts them at risk in another.

Both basic authentication and form-based authentication are vulnerable to this and should never be used when users can choose their own passwords. Digest authentication and encrypted logins do not send clear passwords, and can be used when users can choose their own passwords.

HTTP basic authentication and HTML form-based logins can be used in secure networks to restrict Web access as long as the passwords are pseudo random, unpredictable, and unique across other systems.
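As a rough illustration of what "pseudo random, unpredictable, and unique" can mean in practice, the sketch below has the system generate the password with java.security.SecureRandom instead of letting the user choose one. The class name RandomPasswordSketch, the method newPassword, and the choice of 16 random bytes are assumptions made for this example.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper: generates a password the user did not choose,
// so it cannot be shared with any other system.
class RandomPasswordSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns numBytes of cryptographically strong randomness,
    // encoded with the URL-safe Base64 alphabet and no padding.
    static String newPassword(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // 16 random bytes (128 bits) encode to 22 characters.
        System.out.println(newPassword(16));
    }
}
```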

For systems that allow user-created passwords, care must be taken to ensure the passwords are not readable by others, by using HTTPS or digest authentication during logins.

2) Never send session tokens unencrypted over a shared network.

Unencrypted session tokens are visible to anyone who has access to the network. Although session tokens don't expose the user's password, they do allow an attacker to hijack the account with unlimited access. Unencrypted session tokens should never be used over a public Wi-Fi network (or other shared network) to access private information or make changes.

Cookie based authentication over HTTP is vulnerable to this. Digest authentication and HTTPS sessions are not vulnerable.

Digest authentication uses a unique "salt" (called a nonce in the specification) for every request, and digest systems can prevent the same "salt" from being used more than once (although this check is optional). By never using and never allowing the same authentication token twice, digest authentication prevents account hijacking.
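To show why a captured digest token is of no use for a later request, here is a minimal sketch of how the client's response value is computed under RFC 2617 with qop=auth. The class and method names are my own; the input values in main are the worked example from the RFC.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the RFC 2617 digest response calculation (MD5, qop=auth).
class DigestResponseSketch {

    // Returns the lowercase hex MD5 of the given string.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // The password never leaves the client; only this hash is sent.
    // The server's nonce and the request counter (nc) are hashed in,
    // so the value changes with every request.
    static String response(String user, String realm, String password,
            String method, String uri, String nonce, String nc, String cnonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
    }

    public static void main(String[] args) {
        System.out.println(response("Mufasa", "testrealm@host.com", "Circle Of Life",
                "GET", "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                "00000001", "0a4f113b"));
    }
}
```

A server that rejects any nonce and counter pair it has already seen will refuse a replayed response, because the attacker cannot compute a fresh one without the password.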

HTTPS requests are encrypted and prevent eavesdropping from others on the network, preventing access to any request tokens that might be present.

Only HTTPS using keys from a certificate authority, HTTPS with self-signed keys, HTTPS with mixed content, or digest authentication should be used to exchange private information over shared networks.

For more information about the vulnerabilities of using session tokens see Weaning the Web Off of Session Cookies.

3) Always verify information sent over an insecure network.

Insecure networks may be vulnerable to malicious attacks such as DNS poisoning or a trojan Web proxy. These attacks are often called man-in-the-middle attacks and can manipulate the content from the server before it reaches the client (and vice versa).

Most unencrypted HTTP communication is vulnerable to this. Even mixed content of both HTTPS and HTTP is vulnerable to man-in-the-middle because compromised HTTP content can read and manipulate HTTPS content.

Although digest authentication includes an optional integrity check to prevent this, most browsers either don't check or don't indicate to the user if the content has been verified.

All Web browsers verify HTTPS content (when not mixed) and this should be used for insecure networks. For mobile devices that often connect from potentially insecure networks, HTTPS (self-signed or CA-signed) should be enabled by default for any private information.

4) Never give confidential information without verifying authenticity of the server.

Well disguised URLs and familiar looking pages can trick users into visiting and pseudo-logging into illegitimate websites. If your website asks your users for confidential information, ensure there is a clear way for your users to verify the authenticity of the site before logging in. Otherwise, your users might give confidential information to untrustworthy third parties without even knowing it.

HTTPS using previously distributed keys (such as keys from an established certificate authority) allows the user to verify the organization in their browser (near the address bar). This allows the user to quickly verify the authenticity of the server.

HTTPS with self-signed certificates cannot be used to verify authenticity unless the certificates have been previously distributed through a secure channel.

Although digest authentication can include authentication-info to verify authenticity, most browsers either ignore it or don't indicate to the user when the site is verified. However, most browsers do show the host name and realm to the user for review before logging in, and this does give the user a chance to check the domain name before logging in.

Always use HTTPS for confidential or sensitive information.

5) Never access sensitive information over an unencrypted channel.

HTTP traffic can be viewed by anyone who has access to the network. It is vital that sensitive information is never sent over unencrypted HTTP. Sensitive information should always use HTTPS.

Only HTTPS, used exclusively and with known certificates, should be used to exchange sensitive information with users.

Always use HTTPS for confidential or sensitive information.

In summary

By obeying these five rules you can pick the right authentication method and prevent your system and users from being vulnerable to cyber attacks and identity theft.

Sunday, November 28, 2010

Status Code 200 vs 303

The public LOD mailing list has been dominated by discussions about using a 303 response to a GET request to distinguish between the identifier of the requested resource and the identifier of a document describing it.

Some resources can be represented completely on the Web. For these resources, any of their URLs can be used to identify them. This blog page, for example, can be identified by the URL in a browser's address bar. However, some resources cannot be completely viewed on the Web - they can only be described on the Web.

The W3C recommends responding with a 200 status code for GET requests of a URL that identifies a resource which can be completely represented on the Web (an information resource). They also recommend responding with a 303 for GET requests of a URL that identifies a resource that cannot be completely represented on the Web.

Popular Web servers today don't have much support for resources that can't be represented on the Web. This creates a problem for deploying (non-document) resource servers, as it can be very difficult to set up resources for 303 responses. The public LOD mailing list has been discussing an alternative of using the more common 200 response for any resource.

The problem with always responding to a GET request with a 200 is the risk of using the same URL to identify both a resource and a document describing it. This breaks a fundamental Web constraint that says URIs identify a single resource, and causes URI collisions.

It is impossible to be completely free of all ambiguity when it comes to URI allocation. However, any ambiguity can impose a cost in communication due to the effort required to resolve it. Therefore, within reason, we should strive to avoid it. This is particularly true for Web recommendation standards.

URI collision is perhaps the most common ambiguity in URI allocation. Consider a URL that refers to the movie The Sting and also identifies a description document about the movie. This collision creates confusion about what the URL identifies. If one wanted to talk about the creator of the resource identified by the URL, it would be unclear whether this meant "the creator of the movie" or "the editor of the description." Such ambiguity can be avoided using a 303 for a movie URL to redirect to a 200 of the description URL.
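The 303 technique can be sketched with the small HTTP server that ships with the JDK. The paths /movie/the-sting and /doc/the-sting are hypothetical, as is the class name SeeOtherSketch; this only shows the shape of the exchange under those assumptions, not any particular Linked Data server.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical example: one URL for the movie, another for its description.
class SeeOtherSketch {

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        // The movie itself cannot be sent over HTTP, so point to its description.
        server.createContext("/movie/the-sting", exchange -> {
            exchange.getResponseHeaders().add("Location", "/doc/the-sting");
            exchange.sendResponseHeaders(303, -1); // -1 means no response body
            exchange.close();
        });
        // The description is an ordinary document, so it is served with a 200.
        server.createContext("/doc/the-sting", exchange -> {
            byte[] body = "A description of the movie The Sting."
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(0); // port 0 picks any free port
        try {
            int port = server.getAddress().getPort();
            URI movie = URI.create("http://127.0.0.1:" + port + "/movie/the-sting");
            HttpURLConnection con = (HttpURLConnection) movie.toURL().openConnection();
            con.setInstanceFollowRedirects(false);
            System.out.println(con.getResponseCode() + " " + con.getHeaderField("Location"));
        } finally {
            server.stop(0);
        }
    }
}
```

With this layout, a statement about the creator of /movie/the-sting is about the movie, and a statement about the creator of /doc/the-sting is about the description.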

As Tim Berners-Lee points out in an email, even including a Content-Location in a 200 response (to indicate a description of the requested resource) "leaves the web not working", because such techniques are already used to associate different representations (and different URLs) to the same resource, and not the other way around.

Using any other 2xx status code for representations that merely describe a resource (and don't completely represent it) causes ambiguity because Web browsers today interpret all 200 series responses (from a GET request) as containing a complete representation of the resource identified in the request URL.

Every day, people bookmark and send links of documents they are viewing in a Web browser. It is essential that any document viewed in a Web browser has a URL identifier in the browser's address bar. Web browsers today don't look at the Content-Location header to get the document URL (nor should they). For Linked Data to work with today's Web, it must keep requests for resources separate from requests for description documents.

The community has voiced common concerns about the complexity of URI allocation and the use of 303s with today's software. The LOD community jumped in with a few alternatives. However, we must consider how the Web works today and be realistic about what more we can expect from Web clients. The established 303 technique works today using today's Web browsers. A 303 redirect may be complicated to set up in a document server, but let's give Linked Data servers a chance to mature.

Monday, September 13, 2010

HTML-Oriented Development

The heart of all Web applications is the user interface (UI) design - this is what its users interact with. As any consultant knows: clients are more satisfied with a well designed UI and mediocre business logic than they are with a poorly designed UI with minimal transparency and fully automated business rules.

What is surprising (when you think about it) is that most Web application frameworks orient around the business model and treat the HTML like a second class citizen. The conceptual model may be important, but even more important is the representation of the model in HTML. Good UIs provide the user with full transparency to the state and operations of the underlying model. It doesn't matter how good the model is if the HTML is too confusing or too obscure; users will avoid using it.

The HTML of Web applications is surprisingly rich with domain concepts. Most well designed UIs contain all the classes, relationships, and attributes found in the underlying model and present them to the user in a language everyone involved can understand. There are a lot of emerging standards that can help turn this human-readable data in HTML into machine-readable data using RDFa, microformats, or microdata.

Recently, David Wood and I started the project Callimachus; it has taken a different approach to Web application design/development. Callimachus reads the domain model from your HTML templates! In Callimachus there is no need to maintain multiple models: no SQL schema, no query languages, no object-relation mapping. It's all embedded in HTML using RDFa.

RDFa allows your HTML to include resource identifiers, their relationships, and properties using additional attributes such as: about, rel, and property. Consider the following HTML snippet. Using RDFa the data is readable by both humans and machines alike. It says that James Leigh knows David Wood using the relationship "foaf:knows" and the property "foaf:name".

<div about="james">
  <span property="foaf:name">James Leigh</span>
  knows
  <div rel="foaf:knows" resource="david">
    <span property="foaf:name">David Wood</span>
  </div>
</div>

Written using a Callimachus HTML template it might look like the snippet below. Here is an embedded query asking who knows "david" and what is their name.

<div about="?who">
  <span property="foaf:name" />
  knows
  <div rel="foaf:knows" resource="david">
    <span property="foaf:name" />
  </div>
</div>
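To spell out what the embedded query is asking, the sketch below answers the ?who part of it against a tiny in-memory list of triples taken from the first snippet. It is only an illustration of the matching, written with made-up names (TemplateQuerySketch, namesOfThoseWhoKnow); it is not how Callimachus evaluates templates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the template's query, not Callimachus code.
class TemplateQuerySketch {

    // Each triple is {subject, predicate, object}. Returns the foaf:name of
    // every subject that has a foaf:knows link to 'target'.
    static List<String> namesOfThoseWhoKnow(List<String[]> triples, String target) {
        List<String> names = new ArrayList<>();
        for (String[] knows : triples) {
            if (knows[1].equals("foaf:knows") && knows[2].equals(target)) {
                String who = knows[0]; // a binding for ?who
                for (String[] name : triples) {
                    if (name[0].equals(who) && name[1].equals("foaf:name")) {
                        names.add(name[2]);
                    }
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The data stated by the first HTML snippet.
        List<String[]> triples = Arrays.asList(
                new String[] {"james", "foaf:name", "James Leigh"},
                new String[] {"james", "foaf:knows", "david"},
                new String[] {"david", "foaf:name", "David Wood"});
        System.out.println(namesOfThoseWhoKnow(triples, "david")); // prints [James Leigh]
    }
}
```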

Callimachus provides the framework necessary to create HTML templates to query, view, edit, and delete resources. This technique allows Web developers to save time and maintenance costs by applying the DRY principle (Don't Repeat Yourself) to Web application development.

For more information about Callimachus see http://callimachusproject.org or tune in to my live Webcast on Wednesday at http://www.wilshireconferences.com/semtech2010/email/email-webcast-091510.html

Thursday, May 27, 2010

The Future of RDF

At the end of June, immediately after SemTech, I'll be attending the W3C RDF Next Step Workshop. This workshop has been set up with the goal of gathering feedback from the Semantic Web community to determine if (and how) RDF should evolve in the future. I'll be presenting two papers with David Wood which I hope will generate good discussions. (To review the papers or for more information on the workshop, go to NextStepWorkshop.)

The first paper I'm presenting will show a new RESTful RDF Store API supporting named queries and change isolation. (I blogged about this earlier this year.) This proposed API would combine basic CRUD operations over RDF constructs (graphs, services and queries) and mandate RDF descriptions of services. With the ability to modify an RDF store's state in SPARQL 1.1 comes the challenge of managing store versions and the need to manage them (and their differences) over HTTP.

The other paper is a proposed alternative handling of rdf:List in SPARQL. The way we currently deal with ordered collections in RDF, whether through tools or in SPARQL, is so difficult that it limits adoption of RDF. So much of data retrieval, which is currently dominated on the Web by XML, includes the notion of ordered collections. RDF must align its representation with the conceptual notion of ordered collections if it is to have a chance of making inroads into already established networks.

Where do you think RDF needs to go in the future? Does it need to change if it is going to stay viable?


Monday, March 8, 2010

Reinventing RDF Lists

Last month the SW interest group discussed alternatives to containers and collections as part of a discussion around what the next generation of RDF might look like. Below is my opinion on the matter.

RDF's simplistic approach makes it possible to encode most data structures, both simple and complex. The challenge people have with RDF, coming from other Web formats, is the lack of basic ordered collections (a concept common in XML). In RDF you are forced into a linked list structure just to preserve resource order. The linked list structure known as rdf:List is difficult to work with and highly inefficient within modern RDF stores.

Most RDF formats provide syntactic sugar to make it easier to write rdf:Lists. In Turtle this is done using round brackets (parentheses); in RDF/XML this is done using the parseType="Collection" attribute. However, because rdf:List is not a fundamental concept in RDF, no RDF store implementation preserves them, instead opting to use the fundamental triple form -- a linked list.
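To see what the syntactic sugar hides, here is a minimal sketch that expands an ordered list into the rdf:first/rdf:rest triples a store ends up holding. The class name RdfListSketch, the blank node labels (_:b0, _:b1, ...), and the plain-string triples are simplifications of my own for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of how a list written as ( :a :b :c ) in Turtle
// becomes a linked list of triples.
class RdfListSketch {

    // Produces two triples per item: one holding the item (rdf:first) and one
    // pointing to the next node (rdf:rest). The last node points to rdf:nil.
    static List<String> toTriples(List<String> items) {
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            String node = "_:b" + i;
            String rest = (i + 1 < items.size()) ? "_:b" + (i + 1) : "rdf:nil";
            triples.add(node + " rdf:first " + items.get(i) + " .");
            triples.add(node + " rdf:rest " + rest + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        for (String triple : toTriples(Arrays.asList(":a", ":b", ":c"))) {
            System.out.println(triple);
        }
    }
}
```

Three items become six triples, and reading the third item means following two rdf:rest links first, which is why queries over long lists are awkward to write and slow to run.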

RDF is made of the following fundamental concepts: URI, Literal, and Blank Node. A fundamental list concept should be added to make it easier and more efficient to work with ordered collections. This would not have a significant effect on RDF formats, as their syntax would not change, but would have a significant impact on the mindset of RDF implementers.

With this change RDF implementers would strive to ensure that lists are implemented efficiently and provide convenient operations on them, just as they would other fundamental RDF concepts. The triple (linked list) form should be kept for compatibility with RDF systems that don't preserve lists, but the goal would be that RDF systems would not be obligated to provide a triple linked list form that has proven to be ineffective.

By making lists a fundamental RDF concept, there is no required change for RDF libraries to continue to be compatible with existing standards. Most libraries and systems may already understand the list shorthand and some may also preserve it.
