

Discussion on the state of cloud computing and open source software that helps build, manage, and deliver everything-as-a-service.


Today Citrix announced that CloudStack would become the cloud platform project in the Apache Software Foundation. I’m excited not only because CloudStack will be an incredibly vibrant and successful project in its own right, but also because I believe there is a tremendous amount of synergy between CloudStack and other cloud-related projects in the Apache Software Foundation. I look forward to continuing to work with, for example, the Apache Libcloud and Deltacloud projects.

I am most excited, however, about the prospect of integrating with the Apache Hadoop project. Known primarily as the technology for Big Data applications, Hadoop has gained widespread adoption in the industry. Just as CloudStack is inspired by Amazon’s EC2 service, Hadoop is modeled after Google’s MapReduce and Google File System technologies. And just like CloudStack, Hadoop is implemented in Java.

At the lowest level, the Hadoop Distributed File System (HDFS) is a distributed and scalable file system. HDFS is designed to run on a large number of hosts and achieves reliability by automatically replicating data across multiple hosts. The Hadoop project also includes a MapReduce engine and the HBase distributed database (modeled after Google’s BigTable). MapReduce and HBase run on top of HDFS. Highly reliable and highly efficient, Hadoop technology is used by some of the largest cloud companies, including eBay, Yahoo! and Facebook.
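To make the replication idea concrete, here is a minimal toy sketch in Python. It is not HDFS code and does not use any Hadoop API: the function names, the round-robin placement, and the replication factor of 3 are all assumptions chosen for illustration (real HDFS uses its own, rack-aware placement policy). It only shows why keeping several copies of each block on distinct hosts lets a file survive host failures.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, hosts: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct hosts.

    Hypothetical round-robin placement, used here only for illustration.
    """
    if replication > len(hosts):
        raise ValueError("need at least as many hosts as replicas")
    return {
        block_id: [hosts[(block_id + r) % len(hosts)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]],
                           failed_hosts: Set[str]) -> bool:
    """A file stays readable if every block has at least one surviving replica."""
    return all(
        any(host not in failed_hosts for host in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    hosts = ["h1", "h2", "h3", "h4"]
    blocks = split_into_blocks(b"x" * 10, block_size=4)
    placement = place_replicas(len(blocks), hosts)
    print(placement)
    print(readable_after_failure(placement, {"h1", "h2"}))
```

With three replicas per block, any two hosts can fail in this toy model without losing data; losing all three hosts that hold the same block makes that block unreadable.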

Today, CloudStack users already run Hadoop on CloudStack, implementing a service very similar to Amazon’s Elastic MapReduce (EMR). For cloud service providers, Hadoop represents a significant amount of workload that can be readily moved to the cloud. Enterprise deployments can achieve tremendous savings by using the same CloudStack infrastructure to host Big Data workloads. Users also take advantage of CloudStack’s bare metal provisioning capabilities to build high-performance Hadoop clusters.

Working closely with the Hadoop development community, we have started to explore other ways to integrate CloudStack and Hadoop. Because of its scalability, reliability, performance, and maturity, HDFS is a great object store solution for an IaaS cloud. We have started the development of an S3 API front-end for HDFS. Once that work is complete, the combination of CloudStack and Hadoop will provide features equivalent to Amazon EC2 and S3 services.
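One piece of any S3-style front-end over a file system is translating a bucket and object key into a file path. The post does not describe how the actual front-end does this, so the sketch below is purely hypothetical: the function name, the `/s3` root directory, and the validation rules are assumptions made for illustration, and no Hadoop or S3 library is used.

```python
import posixpath


def s3_key_to_hdfs_path(bucket: str, key: str, root: str = "/s3") -> str:
    """Map an S3-style (bucket, key) pair to a path under a root directory.

    Hypothetical layout: <root>/<bucket>/<key>. The "/s3" root is an
    assumption for this sketch, not a documented CloudStack or Hadoop setting.
    """
    if not bucket or "/" in bucket:
        raise ValueError("bucket name must be non-empty and contain no '/'")
    if not key:
        raise ValueError("object key must be non-empty")

    bucket_dir = posixpath.join(root, bucket)
    path = posixpath.normpath(posixpath.join(bucket_dir, key.lstrip("/")))

    # Refuse keys such as "../other/x" that would escape the bucket directory.
    if not path.startswith(bucket_dir + "/"):
        raise ValueError("object key escapes its bucket")
    return path


if __name__ == "__main__":
    print(s3_key_to_hdfs_path("logs", "2012/04/app.log"))
```

Because S3 keys are flat strings while HDFS is hierarchical, a real front-end would also need to decide how to handle listing by prefix, metadata, and keys that are not valid file names; this sketch ignores those questions.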



Citrix supports the open source community via developer support and evangelism. We have a number of developers and evangelists who participate actively in the open source community in Apache CloudStack, OpenDaylight, Xen Project and XenServer. We also conduct educational activities via the Build A Cloud events held all over the world.