

Discussion on the state of cloud computing and open source software that helps build, manage, and deliver everything-as-a-service.

Posted in Product News

CloudStack and Hadoop: a Match Made in the Cloud

Today Citrix announced that CloudStack would become the cloud platform project at the Apache Software Foundation. I’m excited not only because CloudStack will be an incredibly vibrant and successful project in its own right, but also because I believe there is a tremendous amount of synergy between CloudStack and other cloud-related projects at the Apache Software Foundation. I look forward to continuing to work with, for example, the Apache Libcloud and Deltacloud projects.

I am most excited, however, about the prospect of integrating with the Apache Hadoop project. Known primarily as the technology for Big Data applications, Hadoop has gained widespread adoption in the industry. Just as CloudStack is inspired by Amazon’s EC2 service, Hadoop is modeled after Google’s MapReduce and Google File System technologies. And just like CloudStack, Hadoop is implemented in Java.

At the lowest level, the Hadoop Distributed File System (HDFS) is a distributed, scalable file system. HDFS is designed to run on a large number of hosts and achieves reliability by automatically replicating data across multiple hosts. The Hadoop project also includes a MapReduce engine and the HBase distributed database (modeled after Google’s Bigtable). MapReduce and HBase run on top of HDFS. Highly reliable and highly efficient, Hadoop is used by some of the largest cloud companies, including eBay, Yahoo!, and Facebook.
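To make the replication idea concrete, here is a minimal Python sketch. It is a toy model, not HDFS code: the function names `place_replicas` and `readable_blocks` are hypothetical, and the round-robin placement is an assumption chosen for simplicity. Real HDFS uses a rack-aware placement policy, with a default replication factor of 3.

```python
def place_replicas(num_blocks, hosts, replication=3):
    """Assign each block to `replication` distinct hosts, round-robin.

    Toy placement only; real HDFS placement is rack-aware.
    """
    if replication > len(hosts):
        raise ValueError("replication factor exceeds number of hosts")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(hosts) are distinct because
        # replication <= len(hosts).
        placement[block] = [
            hosts[(block + i) % len(hosts)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_hosts):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_hosts)
    return [
        block
        for block, replicas in placement.items()
        if any(host not in failed for host in replicas)
    ]


hosts = ["h1", "h2", "h3", "h4"]
placement = place_replicas(4, hosts, replication=3)
# With three replicas per block, losing any two hosts loses no data.
survivors = readable_blocks(placement, ["h1", "h2"])
```

In this model a block becomes unreadable only when every host holding one of its replicas has failed, which is why adding replicas trades storage for reliability.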

Today, CloudStack users already run Hadoop on CloudStack. They implement a service very similar to Amazon’s Elastic MapReduce (EMR). For cloud service providers, Hadoop represents a significant amount of workload that can be readily moved to the cloud. Enterprise deployments can achieve tremendous savings by leveraging the same CloudStack infrastructure to host Big Data workloads. Users also leverage CloudStack’s bare-metal provisioning capabilities to build high-performance Hadoop clusters.

Working closely with the Hadoop development community, we have started to explore other ways to integrate CloudStack and Hadoop. Because of its scalability, reliability, performance, and maturity, HDFS is a great object store solution for an IaaS cloud. We have started the development of an S3 API front-end for HDFS. Once that work is complete, the combination of CloudStack and Hadoop will provide features equivalent to Amazon’s EC2 and S3 services.
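One small piece of such a front-end is deciding where an S3 object lives in the file system. The Python sketch below shows one possible mapping from a bucket and key to an HDFS-style path. It is purely illustrative and does not describe the actual CloudStack design: the function name `s3_to_hdfs_path` and the `/s3` root directory are assumptions made for this example.

```python
import posixpath


def s3_to_hdfs_path(bucket, key, root="/s3"):
    """Map an S3 bucket/key pair to a path under a hypothetical HDFS root.

    Illustrative only; the layout (/s3/<bucket>/<key>) is an assumption.
    """
    if not bucket or "/" in bucket:
        raise ValueError("invalid bucket name: %r" % bucket)
    # S3 keys are opaque strings, but "/" is conventionally a separator,
    # so each segment becomes a directory level.
    parts = [part for part in key.split("/") if part]
    if not parts:
        raise ValueError("empty object key")
    # Reject "." and ".." so a key cannot escape its bucket directory.
    if any(part in (".", "..") for part in parts):
        raise ValueError("key contains a relative path segment: %r" % key)
    return posixpath.join(root, bucket, *parts)


path = s3_to_hdfs_path("photos", "2012/04/cat.jpg")
```

Note one limitation of this simple mapping: S3 treats `a//b` and `a/b` as different keys, while this sketch collapses them to the same path. A real front-end would need an escaping scheme or a separate metadata index to stay faithful to S3 semantics.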

Of course, HDFS will be just one of many technologies CloudStack integrates with to implement an S3-compatible object store. CloudStack will continue to work with other scalable storage solutions such as SwiftStack, NexentaStor, and Gluster, as well as commercial solutions from NetApp, EMC, Scale Computing, and Caringo. Many of these vendors have incorporated technologies similar to Apache Hadoop in their products. By deploying CloudStack with one of these object store technologies, we can all benefit from the best of both Amazon-style and Google-style clouds!

Sheng is the CEO and founder of, where he drives the vision and overall direction for the company as it transforms the way businesses can harness the power of their own cloud. Sheng is a recognized expert in virtualization technologies as the lead developer on the original Java Virtual Machine team at Sun Microsystems. Sheng was co-founder and CTO of Teros (acquired by Citrix), a leader in perimeter and network security solutions for enterprises and service providers. Sheng has also held technology leadership roles at SEVEN Networks and Openwave Systems, where he developed software products for leading service providers and operators around the globe.
  • Sandeep Jain, Tuesday, 15 May 2012

    Status?

    Sorry to ask this, but there has been no update on this project since the initial announcement over a month ago. How can we find out the status of this project (the S3 front-end for HDFS)? Thanks.

  • Nguyen Anh Tu, Tuesday, 24 July 2012

    Very exciting, and I'm looking forward to it.



Citrix supports the open source community through developer support and evangelism. We have a number of developers and evangelists who participate actively in the open source community in Apache CloudStack, OpenDaylight, Xen Project, and XenServer. We also conduct educational activities through the Build A Cloud events held all over the world.