Monday, November 2, 2009

Why isn't the Web Object-Oriented?

A big part of the Web is web services, but often these services are not modelled using an object oriented paradigm, even though it is well suited for complex behaviours. Web services are often modelled using a simple request/response paradigm or a service oriented paradigm using a RESTful framework, but many of these resource oriented frameworks can be adapted to support some object oriented concepts.

Many people think of classes and methods when they think of Object-Oriented Programming (OOP). However, I like to think of OOP as message passing with class specialization. This is particularly helpful when designing Web services, which also use a message passing model. Even RESTful Web services use forms of message passing between nodes.

Consider the simple URL below. When it is followed, a GET request is sent to a Google server. This can be thought of as sending Google's search object a message with the given search term parameter (using the Google network as the authority). The search object (in this case a proxy) responds with an HTML page containing the search results.

      Object Authority
    _________|_________
   /                   \
http://www.google.com/search?q=Why+isn%27t+the+Web+Object-Oriented%3F
\__________________________/ \_____________________________________/
             |                                  |
      Object Identity                        message


All HTTP requests can be thought of as messages being sent to remote objects. The request method, query parameters, headers, and body make up the message, and the request URI identifies the message's target object. The HTTP response is the message's return value.
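
The mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of any framework: the Message class and the as_message function are names invented here, and only the standard library's urllib.parse is assumed.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, parse_qs


@dataclass
class Message:
    """An HTTP request viewed as a message sent to a remote object."""
    authority: str  # who answers for the object, e.g. www.google.com
    target: str     # object identity: scheme + authority + path
    method: str     # request method, part of the message
    params: dict    # query parameters, part of the message


def as_message(url: str, method: str = "GET") -> Message:
    """Split a URL into the target object and the message sent to it."""
    parts = urlsplit(url)
    target = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return Message(parts.netloc, target, method, parse_qs(parts.query))


msg = as_message(
    "http://www.google.com/search?q=Why+isn%27t+the+Web+Object-Oriented%3F"
)
print(msg.target)          # the object being addressed
print(msg.params["q"][0])  # the decoded search term carried by the message
```

Headers and a request body would be further fields of the message; they are left out to keep the sketch short.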

However, OOP is more than simply message passing. A big part of OOP is the association of behaviour with data. The relationship between behaviour and data gets at the difference between service oriented and object oriented paradigms. A service oriented model is like an object oriented model, but all objects are stateless singletons with their own unique behaviour. Because of this, pure service oriented systems can be more efficient (less data access), but they are more expensive to maintain, as each service must consider all possible variations at once. In contrast, OOP supports behaviour specialization and can more closely reflect the structure of systems 'in the real world'.

While many services are identified by a single request URI (scheme+authority+path), most RESTful frameworks allow data to also be associated with the URI. JAX-RS, for example, allows path parameters that are often populated with a unique entity ID. By incorporating the entity ID in the URI, data is associated with the behaviour in the same way as in an OOP paradigm. However, most RESTful frameworks fail to provide any support for object or resource behaviour specialization -- a feature that is incredibly powerful in class-based OOP.
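
To make the missing piece concrete, here is a rough sketch of what behaviour specialization could look like when the entity ID in the path selects an object and the object's class decides the response. Everything in it is hypothetical: the Account and SavingsAccount classes, the in-memory STORE, and the dispatch function are made up for illustration and do not correspond to JAX-RS or any other framework.

```python
class Account:
    """Base resource: data (entity ID, balance) plus default behaviour."""

    def __init__(self, entity_id, balance):
        self.entity_id = entity_id
        self.balance = balance

    def get(self):
        return f"Account {self.entity_id}: balance {self.balance}"


class SavingsAccount(Account):
    """Specialized resource: same message, different behaviour."""

    def get(self):
        return super().get() + " (earns interest)"


# Hypothetical in-memory store keyed by the entity ID found in the URI path.
STORE = {
    "1": Account("1", 100),
    "2": SavingsAccount("2", 250),
}


def dispatch(method, path):
    """Route e.g. GET /accounts/2 to object 2 and send it the message."""
    _, collection, entity_id = path.split("/")
    if collection != "accounts":
        raise LookupError(path)
    target = STORE[entity_id]
    # The request method names the behaviour; the object's class supplies it.
    return getattr(target, method.lower())()


print(dispatch("GET", "/accounts/1"))
print(dispatch("GET", "/accounts/2"))
```

The dispatcher never checks which kind of account it is talking to. The same GET message produces different behaviour because the target object carries its own class.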

The Web is actually fairly close to seamlessly supporting an object-oriented paradigm. Processing efficiency seems to be the only barrier. However, with the growing costs of maintaining complex Web systems, I'm not sure how long this argument can hold up. When do you think we'll have an object oriented Web framework and what would it look like?


Monday, October 26, 2009

The Complicated Software Stack

To aspiring Web application developers or people looking to put together their own Web application: the road to building a modern working Web application is a long and complicated journey.

Today's Web application developer is nothing short of a jack-of-all-trades, requiring deep knowledge of everything from HTML and CSS to Java and SQL. Everything from common CRUD tasks to sophisticated work-flows requires knowledge of half a dozen computer languages along with their quirks and variations across platforms and applications.

Today's software is built using a mix of programming paradigms and data models. Every level in the software stack requires explicit data mapping between paradigms. Many Web applications include the following levels in their software stack:
• Relational for persistence,
• Object oriented (class-based) in the model,
• Aspects peppered throughout,
• Resource (or activity) oriented Web services,
• Functional template engines,
• Markup using key/value pairs, and
• Prototype based objects for UI behaviour.

The above complication comes at a price. Software takes longer to develop and is more expensive to maintain than it used to be. This is causing a greater divide between small tools and large software systems.

Applications like Microsoft Excel, which combine data processing and persistence using a consistent programming paradigm, have grown in popularity as cheap alternatives to the complexity of modern Web applications.

While the market for Web applications has grown, the scope has decreased, favouring large high volume systems. Smaller Web applications are too often over-architected and over-budget. There is a large (and growing) opportunity for software vendors to fill this divide and create a new platform that combines data processing and persistence, using a single programming paradigm, for Web applications.

Can Web applications be built to use a single programming paradigm?


Tuesday, September 29, 2009

Chrome Frame: Love It Or Hate It


Google has clearly struck a nerve among browser makers with the announcement of Chrome Frame. Microsoft was awfully quick to downplay any thoughts about installing Chrome as a plugin for IE, considering it refers to WebKit's market share as a "rounding error". Mozilla has also recently become vocal about putting down any notion of a browser-in-a-browser solution. This is all quite bizarre, as both of these players are big into browser plugins of some form or another: Microsoft with its alternative Silverlight application engine, and Mozilla, which acquired its market share through extensible plugins of its own.

It is actually quite common to have multiple rendering engines within the same browser: Flash, Silverlight, and Java being the most obvious, but there are more. IE has had a number of browser plugins in the past, including the Mozilla ActiveX Control and Google's SVG plugin. IE8 ships with multiple rendering engines that get triggered based on HTML tags or user actions. Netscape 8, although short lived, shipped with both the Gecko and IE rendering engines. Mozilla has encouraged this type of action in the past, with Google's ExCanvas and Mozilla's now inactive Screaming Monkey initiative. Today Mozilla still makes IE available as a Firefox plugin.

I think it is ridiculous to ask users to only use particular browsers for particular websites. Choosing the best available rendering engine should be the choice of the website authors, and I would welcome a mega-browser that seamlessly switches between Gecko, Trident, WebKit, and Presto based on the preferred engine of the author. More precisely, I trust website authors to choose standards-compliant engines more than I trust users to choose standards-compliant browsers.

I find Mozilla's reaction particularly interesting as it comes at a time when I find myself, an old Gecko fan, looking at WebKit more seriously. Recently in a project, due to an old outstanding Gecko issue, I had to put Firefox support on hold while Trident, Presto and WebKit continued to operate without much trouble.

I know it is true with IE, but perhaps it is true with Mozilla as well, that they view the engine as just something a browser needs and not a feature in and of itself. Perhaps I have been wrong all along and XUL is actually Mozilla's doom.


Sunday, September 20, 2009

Accept Headers: In The Wild

As web agents (including browsers) become more diverse there is an increasing need to distinguish between their types. The User-Agent header can be used for this task, but requires the server to know in advance all the possible agents and what type they are. This is not possible, as both the diversity and quantity of agents are growing too quickly for any single registry to track.

According to the HTTP specification, the Accept header can be used to determine the type of agent. For example:
• HTML browsers should include "text/html" within the Accept header,
• XHTML browsers include "application/xhtml+xml",
• RDF browsers include "application/rdf+xml",
• XSLT agents include "application/xml",
• PDF agents include "application/pdf",
• Office suites include "application/x-ms-application" or "application/vnd.oasis.opendocument", and
• JavaScript libraries include "application/json"

This allows the server to better redirect the agent to an appropriate resource.

Obviously, if a service will only serve HTML browsers, the type of agent is not necessary, as was the case in the Web 1.0 days when everything on the Web was HTML. However, as HTTP is becoming a more popular protocol for non-HTML communication, the need for distinguishing between types of agents is becoming important.

Consider the situation when an abstract information resource (like an order or an account) is identified by a URL. When the server receives a request for an abstract information resource, it needs to know which type of agent is requesting it, so it can better redirect the agent to an appropriate representation. If the agent is an HTML browser, the server should redirect to an HTML page displaying the order or account information; if a JavaScript library, the server should redirect to a JSON dump of the order/account summary; if a PDF agent, the server should redirect to an order/account summary report; if an office suite, the server should redirect to a spreadsheet of the details.
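
A minimal sketch of how a server might make that choice from the Accept header follows. The function names (parse_accept, quality, choose) are invented for this example, and it handles only media ranges and q-values, ignoring other media type parameters, so treat it as an illustration rather than a complete implementation of HTTP content negotiation.

```python
def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header value."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # q defaults to 1 when the agent does not state it
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # malformed q-value: keep the default
        ranges.append((parts[0].lower(), q))
    return ranges


def quality(media_type, ranges):
    """Quality for media_type; the most specific matching range wins."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose(header, available):
    """Pick the representation the agent prefers, or None if none is acceptable.

    Ties go to whichever type appears first in `available`.
    """
    ranges = parse_accept(header) if header else [("*/*", 1.0)]
    best = max(available, key=lambda t: quality(t, ranges))
    return best if quality(best, ranges) > 0 else None


representations = ["application/json", "text/html", "application/pdf"]
print(choose("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
             representations))
print(choose("application/json", representations))
print(choose("*/*", representations))
```

The last call shows the weakness of the approach: an agent that sends only */* says nothing about itself, so the server can do no better than fall back on its own ordering.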

This works very well in theory, but because the Web was built with only HTML browsers in mind, most browsers don't properly implement the HTTP specification (because they don't have to). Even worse is that most non-HTML browser agents either don't include an Accept Header at all or use */* and say nothing about the type of agent. Below are some of the default accept headers from popular user agents on the web.

FF3.5 is an HTML and XHTML browser first, XML/XSLT agent second
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

IE8 is a media viewer (apparently)
image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, */*

IE8+office is a media viewer and office suite
image/gif, image/jpeg, image/pjpeg, application/x-ms-application,
application/vnd.ms-xpsdocument, application/xaml+xml,
application/x-ms-xbap, application/x-shockwave-flash,
application/x-silverlight-2-b2, application/x-silverlight,
application/vnd.ms-excel, application/vnd.ms-powerpoint,
application/msword, */*

Chrome3 is an XHTML and XML/XSLT agent first, HTML browser second, and text viewer third.
application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Safari3 is an XHTML and XML/XSLT agent first, HTML browser second, and text viewer third.
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Opera10 is an HTML and XHTML browser first, XML/XSLT agent second.
text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1

The MSN bot is an HTML browser, text viewer, xml client and application archiver.
text/html, text/plain, text/xml, application/*, Model/vnd.dwf, drawing/x-dwf

Google search bot is a jack of all agents, master of none
*/*

Yahoo search bot is a jack of all agents, master of none
*/*

AppleSyndication is a jack of all agents, master of none
*/*

See Also:
Unacceptable Browser HTTP Accept Headers (Yes, You Safari and Internet Explorer)
WebKit Team Admits Error, Downplays Importance, Re: 'Unacceptable Browser HTTP Accept Headers'


Monday, August 24, 2009

Dereferencable Identifiers

A document URL is a dereferencable document identifier. We use URLs all over the Web to identify HTML pages and other web resources. When you can't give out a brochure you can share a URL. Instead of sending a large email attachment, you might just send a URL. Rather than creating long appendixes, you can simply link to other resources. It is so much more useful to pass around URLs than it is to transfer entire documents.

This model has worked well for documents and is now being adopted for other types of resources. With the popularity of XML, using URLs to identify data resources is now commonplace. Rather than passing around a complete record, agents pass around an identifier that can be used to look up the record later. By using a URL as the identifier these agents don't need to be tied to any single dataset and are much more reusable.

The HTML5 standardization process has given rise to a debate on the usefulness of URLs as model identifiers. Most people agree that a URL is a good way to identify documents, web resources and data resources. However, the debate continues on the usefulness of using a URL as an identifier within a model vocabulary. One side claims that a model vocabulary should be centralized and therefore does not require the flexibility of a URL. The other side claims the model vocabulary should be extensible and requires a universal identifying scheme that URLs provide.

To understand the potential usefulness of using a URL as a model identifier, consider the behaviour difference between a missing DTD and a missing Java class. A DTD is identified using a URL and a Java class is not. When an XML validator encounters a DTD it does not understand, it dereferences the identifier and uses the resulting model to process the XML document. When a JVM encounters a Java class it does not understand, it throws an exception, often terminating the entire process. Now consider how much easier it would be to program if a programming environment used URLs for classes and model versions. Dependency management would become as simple as managing import statements. As the Web becomes the preferred programming environment of the future, we must consider these basic programming concerns.
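
The difference can be sketched as a registry that dereferences an unknown identifier instead of failing. The ModelRegistry class, the fake_fetch stand-in, and the example.org URL below are all hypothetical; a real environment would fetch over HTTP and would need to consider caching, versioning, and trust.

```python
class ModelRegistry:
    """Resolves model identifiers, dereferencing unknown URLs on demand."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a URL and returning the model
        self._known = {}

    def resolve(self, url):
        if url not in self._known:
            # Unknown identifier: dereference it instead of raising an error.
            self._known[url] = self._fetch(url)
        return self._known[url]


# Stand-in for a network fetch so the sketch runs offline; the URL is made up.
FAKE_WEB = {"http://example.org/models/order.dtd": "<!ELEMENT order (item+)>"}
calls = []


def fake_fetch(url):
    calls.append(url)
    return FAKE_WEB[url]


registry = ModelRegistry(fake_fetch)
model = registry.resolve("http://example.org/models/order.dtd")
registry.resolve("http://example.org/models/order.dtd")  # served from cache
print(model)
print(len(calls))  # number of times the identifier was actually dereferenced
```

A plain class name gives the runtime nowhere to look when the class is missing, which is why the JVM can only throw.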

Although I enjoy working in abstractions, I certainly understand how things always get more complicated when you go meta: using URLs to describe other URLs. However, this complexity is essential to maintaining the flexibility and extensibility of the Web.

See Also: HTML5/RDFa Arguments



Sunday, August 23, 2009

97 Things Every Project Manager Should Know

If the projects you manage don't go as smoothly as you'd like, 97 Things Every Project Manager Should Know offers knowledge that's priceless, gained through years of trial and error. This illuminating book contains 97 short and extremely practical tips -- whether you're dealing with software or non-IT projects -- from some of the world's most experienced project managers and software developers. You'll learn how they've dealt with everything from managing teams to handling project stakeholders to runaway meetings and more.

This is O'Reilly's second book in its 97 Things series. My contributions include the tips "Provide Regular Time to Focus" and "Work in Cycles".




Friday, July 31, 2009

SPARQL Federation and Quints

There are currently a few popular ways to federate SPARQL endpoints together:

1) In Jena the service must be explicitly part of the query, and therefore the model,

2) In Sesame the basic query patterns must be associated with one or more endpoints before evaluating the query, or

3) Hack the remote query into a graph URI: http://gearon.blogspot.com/2009/05/federated-queries-long-time-ago-tks.html

Although Jena's and Sesame's approaches can be used to achieve the same results, Jena's solution puts more responsibility in the data model, and Sesame's puts more responsibility in the deployment. Both have their trade-offs, but I believe the query is supposed to be abstracted away from underlying services. The domain model (and therefore the queries) should not be aware of how the data is distributed (or stored) across a network. Therefore, I prefer to describe which graph patterns and relationships are available at each endpoint during deployment and make the application model independent of available service endpoints.

Furthermore, I think it is a bit silly to add yet another level of complexity to the basic query pattern. Adding the service level turns the basic query pattern from a quad to a quint.

To fully index a quint (with support for a service variable, which Jena does not support) would take 13 indexes (nearly double what a quad requires). Below is a table of some complexity levels and how many indexes they require to be fully indexed (variables could appear in any position within the pattern). I have included a theoretical sext that would allow you to group services in a network (just as graphs can be grouped in a service).

Level   # of Idx  Term       Data Structure
double  2         subject    directed graph
triple  3         predicate  labelled directed graph
quad    7         graph      multiple labelled directed graphs
quint   13        service    replicated multiple labelled directed graphs
sext    25        network    trusted replicated multiple labelled directed graphs

Switching from triples to quads provides a big functionality leap (the ability to refer to an entire graph as a single resource). However, I question how much functionality a quint (or a sext) has over a quad. Couldn't the same functionality be put into a property of the graph (or embedded in the graph's URI authority)? An inferencing engine/query could also conclude graph relationships (like subGraphOf), which would still allow a large, but precise, collection of graphs to be queried more effectively.

Hopefully, this topic will have more time to mature before the SPARQL working group makes any official decisions on the matter.