Tuesday, 10 August 2010

Cassandra from Facebook: not just a pretty name!

Everyone has heard of MySQL and others of its ilk – chances are you talk about nothing else at family get-togethers. Like most database solutions, it is typically used in a read-heavy way, the idea being that you write your data infrequently but read it often (the classic content-managed website model). This model can obviously still be used to build the plethora of write-heavy social websites, giving as it does a nice balance of performance and ease of use, but it isn’t necessarily the ideal solution.

One emerging alternative is an open source distributed database management system called Cassandra. Originally developed at Facebook and open sourced in 2008, Cassandra is now a top-level Apache Software Foundation project and is often described as a second-generation NoSQL system. The name NoSQL originally belonged to a relational database management system accessed via a Unix shell, but it is becoming an umbrella term for data storage solutions that don’t follow the traditional relational model. Because the term is so broad, the various solutions differ considerably, so what does Cassandra provide?

It is a clustered database providing a structured key-value store with eventual consistency and no single point of failure (it has no central master). This makes it extremely fault tolerant – if any one of the identical nodes fails, the other nodes holding copies of its data carry on serving requests. As well as this high availability, it also provides read and write scaling – as more nodes are added, read and write throughput increases roughly linearly, without interrupting the running cluster.
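To make the ideas above a little more concrete, here is a rough Python sketch of a masterless, replicated key-value store. It is a toy illustration only, not Cassandra's actual code or API: the class name ToyRing, its methods, the replicas parameter and the hand-supplied timestamps are all invented for this example. It assumes each key is copied to a couple of neighbouring nodes on a hash ring, and that conflicting copies are settled by keeping the one with the newest timestamp.

```python
import hashlib
from bisect import bisect_right


class ToyRing:
    """Toy masterless key-value cluster (hypothetical, for illustration only)."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        # Every node gets a position on a hash ring; no node is special.
        self.ring = sorted((self._hash(name), name) for name in node_names)
        self.stores = {name: {} for name in node_names}  # each node's local data
        self.down = set()  # nodes we are pretending have failed

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def owners(self, key):
        """Return the nodes holding copies of a key, walking clockwise round the ring."""
        tokens = [token for token, _ in self.ring]
        start = bisect_right(tokens, self._hash(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replicas)]

    def put(self, key, value, timestamp):
        """Write to every live replica and return how many accepted the write."""
        written = 0
        for node in self.owners(key):
            if node not in self.down:
                self.stores[node][key] = (timestamp, value)
                written += 1
        return written

    def get(self, key):
        """Ask the live replicas and return the value with the newest timestamp."""
        answers = [self.stores[node][key] for node in self.owners(key)
                   if node not in self.down and key in self.stores[node]]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]


cluster = ToyRing(["node-a", "node-b", "node-c"], replicas=2)
cluster.put("user:42", "Nick", timestamp=1)

first, second = cluster.owners("user:42")
cluster.down.add(first)                          # one replica fails...
print(cluster.get("user:42"))                    # ...the other still answers

cluster.put("user:42", "Nicholas", timestamp=2)  # only the live replica sees this
cluster.down.discard(first)                      # failed node returns with stale data
print(cluster.get("user:42"))                    # newest timestamp wins
```

When the failed node comes back it still holds the old value, so for a while the two copies disagree; that window is the "eventual" in eventual consistency. The sketch only papers over the difference at read time, whereas a real cluster also has to bring the stale copy back up to date.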

This all sounds very good, but as with all web technologies, it is best employed carefully – these benefits come at the cost of many of the tools often associated with a traditional database. Nevertheless, when used in the right situation, it can offer a reliable and proven solution as demonstrated by Facebook, Digg, Twitter and others. A discussion of when to use Cassandra is available here.

Outside the project page (linked above), more information can be found here, with a discussion of various NoSQL solutions (including Cassandra) available here.

Nick Nawrattel, Lead Multimedia Developer (wearing his programming cap)
