This post is a little more formal than usual as I wrote this for a tutorial on how to run hadoop in the clouds, but I thought this was very useful so I am posting it here for everyone's benefit (hopefully).
When CloudStack graduated from the Apache Incubator in March 2013 it joined Hadoop as a Top-Level Project (TLP) within the Apache Software Foundation (ASF). This made the ASF the only Open Source Foundation which contains a cloud platform and a big data solution. Moreover a closer look at the projects making the entire ASF shows that approximately 30% of the Apache Incubator and 10% of the TLPs is "Big Data" related. Projects such as Hbase, Hive, Pig and Mahout are sub-projects of the Hadoop TLP. Ambari, Kafka, Falcon and Mesos are part of the incubator and all based on Hadoop.
To Complement CloudStack, API wrappers such as Libcloud, deltacloud and jclouds are also part of the ASF. To connect CloudStack and Hadoop two interesting projects are also in the ASF: Apache Whirr a TLP, and Provisionr currently in incubation. Both Whirr and Provisionr aimed at providing an abstraction layer to define big data infrastructure based on Hadoop and instantiate those infrastructure on Clouds, including Apache CloudStack based clouds. This co-existence of CloudStack and the entire Hadoop ecosystem under the same Open Source Foundation means that the same governance, processes and development principles apply to both project bringing great synergy that promises an even better complementarity.
In this tutorial we introduce Apache Whirr, an application that can be used to define, provision and configure big data solutions on CloudStack based clouds. Whirr automatically starts instances in the cloud and boostrapps hadoop on them. It can also add packages such as Hive, Hbase and Yarn for map-reduce jobs.
Whirr  is a "set of libraries for running cloud services" and specifically big data services. Whirr is based on jclouds . Jclouds is a java based abstraction layer that provides a common interface to a large set of Cloud Services and providers such as Amazon EC2, Rackspace servers and CloudStack. As such all Cloud providers supported in Jclouds are supported in Whirr. The core contributors of Whirr include four developers from Cloudera the well-known Hadoop distribution. Whirr can also be used as a command line tool, making it straightforward for users to define and provision Hadoop clusters in the Cloud....