Monday, June 22, 2009

Hadoop redundancy mgmt for Web Services

Today I was stumped by a question: why is there such a focus on developing no-FAIL web services when, by definition, any highly distributed computing model is inherently unreliable?

Examples of talks on reliability: in the Ruby camp, "How Not To FAIL At Web Services"; in the Microsoft camp, how to debug when services do FAIL.

The issue of reliability seems likely to grow. StrikeIron now has 50 web services (see the StrikeIron Services List), and we can expect many more services to emerge. How will we handle their failures in a simple, scalable way, as Hadoop does for storage?



Consider a conceptual analogy. Hadoop provides redundancy for data storage through software that spreads and replicates data across multiple commodity data servers, an approach often described as striping or sharding. The expensive forerunner was a single beefy server equipped with a hardware-based RAID array.

Now consider the options for redundant web services in a Microsoft-architected solution. The recommended approach is to use a single instance of SQL Server, with SQL Server Service Broker managing the queues and the re-queuing of calls to failed web services.

But the characteristics of a single instance of SQL Server match those of hardware RAID: unplug the power supply and the system FAILS. Hadoop, by contrast, is designed for server failures. If a PSU, a disk drive, or a complete server goes out, Hadoop's software falls back to one of the two surviving copies of the data and automatically replicates it again until it is back on three servers (the default replication factor).
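
To make the contrast concrete, here is a minimal toy sketch of software-managed replication. It is not Hadoop's actual implementation; the class ToyCluster, the server names, and the placement rule (least-loaded server first) are all assumptions made for illustration.

```python
class ToyCluster:
    """Toy model of block replication; NOT Hadoop's actual implementation."""

    def __init__(self, servers, replication=3):
        self.replication = replication
        # Maps each server name to the set of block ids it currently holds.
        self.servers = {name: set() for name in servers}

    def holders(self, block):
        # Sorted names of the servers that currently hold this block.
        return sorted(s for s, blocks in self.servers.items() if block in blocks)

    def store(self, block):
        # Top the block up to the target number of copies, choosing the
        # least-loaded servers that do not hold it yet (assumed placement rule).
        missing = self.replication - len(self.holders(block))
        candidates = sorted(
            (s for s in self.servers if block not in self.servers[s]),
            key=lambda s: (len(self.servers[s]), s),
        )
        for server in candidates[:max(missing, 0)]:
            self.servers[server].add(block)

    def fail(self, server):
        # Drop the failed server, then re-replicate every block it held.
        lost = self.servers.pop(server)
        for block in sorted(lost):
            self.store(block)
```

With four servers and one stored block, calling fail() on a server that holds the block leaves the block on three servers again, with no hardware involved in the recovery.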

What would a software-based solution for managing web service failures look like? Let's assume redundancy features similar to Hadoop's. Three web service message queues are maintained. If queue 1 blows, switch to queue 2 and replicate the lost queue. The solution can also learn the pattern of availability through statistical trial and error, and react by calling web services at reliable times of day. The management layer can simply retry, skip, queue, re-queue, or email the administrator, like Hadoop does.
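
The idea above can be sketched in a few lines of Python. This is only a rough, in-memory illustration under stated assumptions: ReplicatedQueue, drain, AvailabilityTracker, and the notify callback (standing in for "email the administrator") are hypothetical names, messages are assumed to be hashable, and a real system would keep each queue replica on a separate server.

```python
from collections import deque


class ReplicatedQueue:
    """Keeps every message in several in-memory replicas (hypothetical design)."""

    def __init__(self, replicas=3):
        self.target = replicas
        self.queues = [deque() for _ in range(replicas)]

    def put(self, message):
        # Write the message to every live replica.
        for q in self.queues:
            q.append(message)

    def get(self):
        # Read from the current primary (replica 0) and keep the others in step.
        message = self.queues[0].popleft()
        for q in self.queues[1:]:
            q.popleft()
        return message

    def fail_replica(self, index):
        # Simulate losing one replica; the next one becomes primary, and the
        # lost replica is rebuilt by copying a surviving one.
        del self.queues[index]
        while len(self.queues) < self.target:
            self.queues.append(deque(self.queues[0]))

    def __len__(self):
        return len(self.queues[0])


def drain(queue, call_service, max_attempts=3, notify=print):
    """Call the service per message; re-queue on failure, escalate after max_attempts."""
    delivered, abandoned, attempts = [], [], {}
    while len(queue):
        message = queue.get()
        attempts[message] = attempts.get(message, 0) + 1
        try:
            call_service(message)
            delivered.append(message)
        except Exception as error:
            if attempts[message] < max_attempts:
                queue.put(message)  # re-queue for a later retry
            else:
                abandoned.append(message)
                notify(f"giving up on {message!r}: {error}")
    return delivered, abandoned


class AvailabilityTracker:
    """Counts successes per hour of day to find reliable calling times."""

    def __init__(self):
        self.calls = [0] * 24
        self.successes = [0] * 24

    def record(self, hour, ok):
        self.calls[hour] += 1
        if ok:
            self.successes[hour] += 1

    def success_rate(self, hour):
        # Hours with no observations are treated as rate 0.0 (unknown).
        return self.successes[hour] / self.calls[hour] if self.calls[hour] else 0.0

    def best_hour(self):
        return max(range(24), key=self.success_rate)
```

Retrying immediately, as drain does here, is the simplest policy; a scheduler could instead hold a re-queued message until the hour that AvailabilityTracker reports as most reliable.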

Microsoft is no sleeper. The .NET Service Bus, part of Azure, includes message handling, and over time mechanisms for handling failed web services via Azure will surely emerge. But what about business architects who wish to implement independent redundancy of web services now?

Do you know of a Hadoop-like management infrastructure project or product for managing queues to external web services?

Some brainstorm ideas
The .NET Service Bus provides frictionless connectivity across applications via Azure. “Web services are redundant as we know them” (Juval Lowy).

FriendFeed. Adding indexes to a database with more than 10 to 20 million rows completely locked the database for hours at a time. After some deliberation, FriendFeed decided to implement a "schema-less" storage system on top of MySQL rather than switch to a completely new storage system (see "How FriendFeed uses MySQL to store schema-less data"). FriendFeed also deploys memcached.