Discussion on the state of cloud computing and open source software that helps build, manage, and deliver everything-as-a-service.
SDN in CloudStack
Software Defined Networking (SDN) has seen a lot of uptick in momentum since VMware acquired Nicira last summer, three months after Google announced that they were using OpenFlow to optimize their internal backbone. In this post we look at "SDN" support in CloudStack.
First, let's try to define SDN in a short paragraph. It is, was it spells out to be (like the french adage: "c'est comme le Port-Salut c'est écrit dessus" :)) a way to configure the network using software. This means that the network definition (routing, switching), optimization (load-balancing, firewall, etc) becomes a software problem with SDN. If you have seen the wikipedia page and read other articles, you most likely have read that SDN decouples the control plan and the data plane. This is short for saying that the forwarding tables used in switches/routers will be controlled by a software/applications that can be remote. Part of the SDN landscape is OpenFlow. OpenFlow is a standard defined by the ONF that allows you to implement a SDN solution. It defines the protocol used by the control plane to send forwarding information (a.k.a flow rules) to network devices. However while some early SDN companies embraced and led the development of openflow (e.g BigSwitch), an SDN solution may not use the OpenFlow protocol. This leads to a key point of todays SDN solutions: SDN != OpenFlow.
The academic in me can't help but point out the GENI initiative, which aims to re-design the internet and start from a clean slate. SDN research happens on GENI but it is also seen as a way to instrument the network, isolate experiments and dynamically reconfigure the network. Something impossible (especially over wide are networks) before SDN. With FutureGrid and Grid5000 being testbeds for IaaS solutions, GENI is a real-life testbed for future networking solutions. I had a chance to work a little bit on GENI while at Clemson. We modified the NOX OpenFlow controller to integrate it with OpenNebula and provide Security Groups as well as Elastic IP functionality.
While virtualization was a key enabler of IaaS, virtual switches are key enablers of SDN based networks. Open Virtual Switch (OVS) is the leading virtual switch. OVS is now used in most private and public clouds, it replaces the standard linux bridge and is used to connect virtual machine network interfaces to the physical network. An OVS can be connected with an OpenFlow controller and receive flow rules from the controller. However it does not need to, an SDN solution could talk directly to OVS. The main issue being that a single openflow controller may not be fast enough to process all "control" decisions on a large networks. We would have to see a distributed OpenFlow controller to be able to reach extremely large scale (It might exist, I just have not found references for it). OVS can do many things among them: VLAN tagging, QoS, Generic Routing Encapsulation(GRE) and Stateless Transport Tunneling (STT) tunnels. To know more about the difference between GRE and STT see this blog by Bruce Davie and Andrew Lambeth.
So where does SDN help IaaS ? Anyone who has worked on networking of virtual machines (VM) knows how complex this can get. VMs from multiple tenants need to be isolated from each other, they have private IP addresses but may need to be accessed form the public internet, VMs can be migrated within a single broadcast domain but one may want to migrate across domains. VMs from multiple data centers may need to be in the same subnet and broadcast domain (Layer2) etc. Networking is really complex issue in IaaS. Even more so that it is hard to understand conceptually. One really needs to think in terms of logical networks and forget about the physical network (at least at a high level). To enable all these things and reach large scale we need to be able to control the network devices from the application layer. This is where a significant shift is happening. The application developers are now going to describe the network they need and provision it on-demand. Yet again SDN is Cloud. On-demand and Elasticity in the network thanks to SDN.
So where does Apache CloudStack stands with SDN ? One of the main design decisions in CloudStack was to provide multi-tenancy and isolate guest networks. Pre-SDN, the way to achieve this was to use a VLAN per guest network. Creating an isolated Layer-2 broadcast domain for each tenant (and even multiple VLAN per-tenants if need be). Advanced networking in CloudStack was all about VLANs. VLAN ids however are 12 bit, that means that the grand maximum of VLANs is 4096. While it can seem big, you could very well run out of VLANs quickly. Here comes SDN. SDN allows you to build a new type of isolation for your tenants. The main tenet (pun intended) is to build a mesh of tunnels between all the virtual switches residing on all the hosts/hypervisors in your data center. OVS can do that. Creating those meshes you can create network overlays to build Layer 2 broadcast domains within zones, across zones and over WAN while ensuring isolation of tenants.
The previous snapshot on the left shows the CloudStack GUI when you are creating an advanced zone. You need to specify the type of isolation. Traditional isolation would be using VLANs, but you see two other types of isolations: GRE and STT. These are protocols used to create tunnels between OVS bridges (logical switches). The GRE isolation type will be used with what I call the "native SDN solution" in CloudStack. It is a SDN controller built-in the CloudStack code that creates GRE tunnels using OVS (Currently only supported with Xen, but KVM support should be in 4.1 if not 4.2 this summer). The wiki has an extensive functional specification titled OVS tunnel manager. The slides below are also a great presentation of this solution:
Choosing the STT isolation type will you guessed it use STT tunnels between all the virtual switches. This is currently only being used by the Nicira NVP plugin described in our documentation. Hugo Trippaers (@Spark404) from Schuberg Phillis is the author of the plugin, he recently presented about the integration at a Build a Cloud Day workshop and talked about the upcoming features in the CloudStack 4.1 release (KVM support and Layer 3 routing). See his slides below:
We are seeing two more SDN "solutions" being integrated in CloudStack. First is Big Virtual Switch from BigSwitch. Development is happening right now, and the commits made the 4.1 code freeze. So expect to see it in the 4.1 release at the end of March. Expect to see open source OpenFlow controllers being used with CloudStack this summer. The last one is Midonet from Midokura. While documentation has been posted on slideshare (see below and skip the first page if you don't read japanese), the commits have not yet been made. So look at 4.2 release for Midonet support in CloudStack.
This is only the beginning. While these solutions are used to provide multi-tenant isolation, we can bet that SDN will be used to provide load-balancing, elastic IPs, security groups, migration support, dynamic leasing and optimization of network. SDN brings network intelligence to your IaaS.
Send me comments on twitter @sebgoa