Saturday, December 17, 2005

Wherefore art thou Operations?

The blogosphere has been alive (and dead) with the recent failure of Six Apart's TypePad blog service. Many users are very frustrated, some, such as Jeff Nolan, to the point of moving to another service. Om Malik is riffing on his refrain about the scalability of web 2.0 companies. David at 37signals has a different take on the issue. They have even debated the subject on a podcast.

But their debate is a red herring. Or put another way, they are wrong.

Web 2.0 companies are service providers that rely on engineering operations to provide their service. Instead of their service being, say, satellite broadcasting, it is publishing blogs (in the case of Six Apart). Consequently, web 2.0 and web service companies face the same issues as any other company that relies on engineering operations. They are not special or unique or somehow able to defy the laws of physics.

The problems that web 2.0 companies have suffered recently (and this applies to any web service company, e.g. eBay, Google, Yahoo!) are not an indication of an inherent problem in the company concept or business plan. What the problems do indicate is a startling lack of engineering operations expertise. Where is the maintenance scheduling, the backout plans, the risk analyses? Where are the very basics of engineering operations?

Over the last 50 years, engineering operations has developed methodologies, tools and a knowledge base that ensure smooth provision of service and dramatically reduce the risk from unforeseen events. You know, those things that lead to frustrated clients and lost revenue. And bankruptcy. The methods and tools are used because they work.

Nothing that the web 2.0 companies are doing indicates that they are using the methods and tools of engineering operations, and I have to ask why. It is not hard to use the simple methods and tools. They don't need to use the more complex methods or tools, so again I ask why. Is it lack of knowledge about these methods and tools, a delusion that web 2.0 companies are somehow special, or disdain for tools and methods from outside the web world? I expect it is a combination of these and others.

The solution is not hard to implement. Hire someone with experience in engineering operations, someone who can use the methods and tools to address the risks and processes of the company's operations. This person will have a lot to do. The operations will require a lot of housekeeping as the current processes are brought up to scratch. Failing that, go down to the local bookstore and purchase three reference books: one on engineering operations, one on engineering risk analysis and one on engineering quality assurance. Then use them. While not as good as hiring someone with experience in engineering operations, at least it will be better than the status quo.

The issue isn't scalability or lack thereof, or even reliability or lack thereof. It is a lack of engineering operations expertise. The web service companies are finding that even they cannot escape Murphy's Law:

"What can go wrong, will go wrong in the worst possible way at the worst possible time."

Not using engineering operations methods and tools means that a small failure quickly spirals into a catastrophe for web service companies. They have nothing with which to manage or mitigate the risks. Until web 2.0 companies, and in fact any web service company, effectively address the engineering operations side of their businesses, they are not ready to be mission-critical systems.

2 comments:

Zoli Erdos said...

For some of these Web 2.0 companies the magic word may very well be: Outsourcing. Focus on what they are good at, which is not data center operations...

Simon Cast said...

Definitely. I'm actually surprised that many of the smaller companies don't already seem to do this. If they do, then that is a sad indictment of the data center operators.

The web service companies still need to internalise engineering operations, in particular engineering risk management. The best way is to hire someone into a position of authority and responsibility for this.

Even with outsourced operations, someone in the company needs to understand operations in order to liaise with the outsourcers and act as a sanity check.