Introducing YippieMove '09. Easy email transfers. Now open for all destinations.

At least in my personal opinion, one of the strongest trends seen at the LinuxWorld expo in San Francisco over the last years has been virtualization. This year many exhibitors had taken the next step and were actually using VMware products on their exhibit computers to simulate a number of servers in a network. For instance, Hyperic demoed their systems management software with a set of virtual servers.

Whenever virtualization comes up, the idea of grid computing isn’t far away as enterprises wish to maximize server utilization by turning their data centers into grids that each deliver the ‘services’ of CPU, memory, ports and so on. But in this brave new world of virtual machines and grid processing there is an element missing. If you’re moving your computing over to a grid computing model, why is there no corresponding grid storage model?

The commercial open source startup Cleversafe has that corresponding model. By employing a mathematical algorithm known as an Information Dispersal Algorithm, found in the cryptographic field of research, Cleversafe separates data into slices that can be distributed to different servers, even across the world. But it’s much more than just slicing and dicing: the algorithm adds redundancy and security as it goes about its task. When the algorithm is done, each individual slice is useless in isolation, and yet not all slices are needed to reconstruct the original data. In other words, your data is safer both in terms of security and in terms of reliability.

Cleversafe is not the first entity to come up with such a scheme. The idea of an Information Dispersal Algorithm is known from Adi Shamir’s paper ‘How to Share a Secret’ and other publications. When we met up with Cleversafe’s Chairman and CTO Chris Gladwin at LinuxWorld, he mentioned that the Information Dispersal Algorithm had been used in many applications before – even to store launch codes for nuclear weapons securely.

The scheme is different from a simple parity scheme in that you can configure how many redundant pieces you want. With parity as found in common RAID setups, you can lose any one storage unit in the set. With an Information Dispersal Algorithm, you can make your system resistant to failure or corruption of any one, two or indeed any number of units in the set. If there’s a strike in your data center in Texas, and your German data center is on fire, your data will still be fully accessible through the remaining servers provided you began with a sufficient number of servers. And as opposed to the brute force solution of multiple mirrors of the data, the dispersal algorithm has a much smaller overhead.

Google is a well known proponent of the brute force solution: the Google File System implementation suggests that the best method to keep your data continuously available is to keep three copies of it at all times. Cleversafe is a smarter system. If you have 16 slice servers (known as pillars in Cleversafe terminology) with a redundancy of 4 slices (known as the threshold) you can lose up to four servers simultaneously and still retain your data. At the same time the total overhead in storage space is only 4/12 – 33% of the space. The advantage as compared to Google’s three copies method is clear: with three copies you only protect yourself against the failure of any two servers and yet you pay a much greater price with a total of 200% storage and bandwidth overhead. And that’s not all. While you’re storing two additional copies of your data, you have effectively tripled the risk of that data being stolen. When a careless system administrator forgets the backup tapes in his car over night and the car gets stolen, all those credit card numbers or what have you will be out in the wild, even that only one out of three locations was compromised. In our Cleversafe example, 12 separate servers would have to simultaneously be compromised – quite unlikely by comparison.

Cleversafe is not alone and there are other actors on the software market such as the PASIS system. PASIS’ home page describes functionality very similar to Cleversafe’s: “PASIS is a survivable storage system. Survivable storage systems can guarantee the confidentiality, integrity, and availability of stored data even when some storage nodes fail or are compromised by an intruder.” None the less, Cleversafe appears to be a step ahead of its competitors at this time and is poised to be the first to deliver grid storage to a wider market.

While the Cleversafe software is developed as open source through the Cleversafe Open Source Community at cleversafe.org, there is a commercial company behind Cleversafe: Cleversafe, Inc. Cleversafe, Inc. plans to generate revenue by offering a storage grid for rent based on the Cleversafe technology. “The market for a more secure, more cost effective storage solution is enormous,” says Jon Zakin, CEO of Cleversafe in a press release issued in May.

The Cleversafe project is available as Open Source under the GPL 2.0 License. The version online is apparently an early alpha version and is not ready for production use. According to Mr. Gladwin, there will most likely be a new version within a month, and sometime in the beginning of the next year Cleversafe may be ready for production use. In the meantime, you can download the current alpha version of the software at the Cleversafe Open Source Website. You can read more about the algorithm at Cleversafe.org’s wiki, and there’s also a flash video describing the idea available.

Author: Tags: , , , ,

Comments are closed.


© 2006-2009 WireLoad, LLC.
Logo photo by William Picard. Theme based on BlueMod © 2005 - 2009 FrederikM.de, based on blueblog_DE by Oliver Wunder.
Sitemap