Traditional Protocols Matter with SDN

First of all, I have been saying this for over a year: We have been doing SDN for a long time. We, as an industry, have been programming changes into our networks, with our own vision of what needs to happen, when it needs to happen, and doing things that protocols do not normally offer.

Software Defined Networking is useful when your protocols don’t cut it.

Let’s face it. Almost every protocol (OSPF, IS-IS, RIP, MP-BGP, etc.) does one thing: it advertises vectors, or reachability of a destination. Some protocols do it better than others, and some allow more data to come along with those vectors, such as tags, communities, targets, etc. Then you strap some logic onto your routing nodes to interpret the received routing data, and you make decisions. Remember Policy Based Routing (PBR)? How is this any different from SDN? I would venture to say SDN offers a possibility to achieve many of the ideas discussed 8-10 years ago during PBR discussions.

The Method Matters

I have spent time immersed in SDN implementations, presentations, vendor meetings, and even knee-deep in the code. When you want to make changes to your network there are some basic questions that need to be asked:

  • Does this network change need to occur dynamically?
  • How do we quantify “dynamic”? Is this every day, every week, or changes occurring in a totally flexible, non-deterministic manner (i.e. people clicking things whenever they want will make changes to the network)?
  • Do we have the tools to prevent bad stuff from happening? (Imagine that you allowed the dynamic creation of MPLS LSPs across your network. What would happen if you did not implement safe upper limits on RSVP-TE objects, or any limit on the number of paths? See the sketch after this list.)
  • How complex will this “simple program” be?
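
To make the third question concrete, here is a minimal sketch of the kind of guard rails a dynamic LSP-provisioning service might enforce before touching the network. The limits, request format, and function names are invented for illustration, not taken from any real system:

```python
# Hypothetical guard rails for a dynamic LSP-provisioning API.
# Names, limits, and the request format are illustrative only.

MAX_LSPS_PER_INGRESS = 100            # cap on dynamically created LSPs per ingress router
MAX_RESERVED_BW_BPS = 2_000_000_000   # cap on RSVP-TE bandwidth per LSP (2 Gbps)
MAX_ERO_HOPS = 16                     # cap on explicit-route hops per path

def validate_lsp_request(request, existing_lsp_count):
    """Reject LSP requests that exceed the safety limits before they touch the network."""
    errors = []
    if existing_lsp_count >= MAX_LSPS_PER_INGRESS:
        errors.append("too many LSPs already provisioned on this ingress")
    if request.get("bandwidth_bps", 0) > MAX_RESERVED_BW_BPS:
        errors.append("requested RSVP-TE bandwidth exceeds the per-LSP cap")
    if len(request.get("explicit_route", [])) > MAX_ERO_HOPS:
        errors.append("explicit route is longer than the allowed hop count")
    return errors

request = {"name": "tenant-42-lsp", "bandwidth_bps": 500_000_000,
           "explicit_route": ["10.0.0.1", "10.0.0.5"]}
problems = validate_lsp_request(request, existing_lsp_count=12)
if problems:
    print("refusing to provision:", "; ".join(problems))
```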

Protocols are Software

Protocols have state machines; they learn and distribute information and they have well defined routines with clearly articulated and implemented behavior. This can be said for the majority of the well used routing protocols that are defined by the IETF. Furthermore, people have learned how to work with these protocols, and for the protocols that allow additional extensions to their functionality, there is almost anything you can do with the flexible framework. Think: IS-IS TLVs or new BGP SAFI/NLRI/GENAPP.

There is already so much that the networking software industry has made available to developers and operators. As we want to programmatically make configuration changes and drive ‘show commands’ on nodes, we could use Expect libraries in any number of modern programming languages (Perl, Python, Ruby, etc.), or drive alternative data models with NETCONF, REST APIs, YANG, the OpenFlow protocol, or SNMP.
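
As a concrete example of the Expect-style approach, here is a minimal sketch using Python’s pexpect library to drive a ‘show command’ over SSH; the address, credentials, prompt pattern, and command are placeholders:

```python
# Minimal Expect-style screen scraping of a 'show' command; host, credentials,
# prompt pattern, and command are placeholders.
import pexpect

session = pexpect.spawn("ssh admin@192.0.2.1", timeout=30)
session.expect("assword:")               # match the password prompt
session.sendline("example-password")
session.expect(r"[>#]")                  # wait for a CLI prompt
session.sendline("show bgp summary")
session.expect(r"[>#]")
print(session.before.decode())           # output captured before the prompt returned
session.sendline("exit")
session.close()
```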

So with all this choice comes the paralysis of complexity. Let’s take CLOUD for example, and think about the need for some flexibility in network topology and automation. In a very simple scenario, each tenant in a cloud environment will need some basic network isolation and routing. In enterprise clouds, this looks like a VLAN with some DHCP assignment. The cloud management system would work well if the association of a Virtual Machine (VM) to a network was flexible; it would therefore be useful to automate the creation of the VLAN.

In OpenStack this could be achieved by nova-network, which, if you are using Linux bridging, will gladly create a new VLAN, IP address, and bridge instance in the kernel. The challenge with automation is that there is usually more to the picture: the on-ramp and off-ramp of traffic beyond the hypervisor. This could be the VRF interface binding, VLAN creation on the ToR, or security policies on the ToR or other Layer 2/3 devices that will process frames/packets for this newly instantiated network.
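
Roughly, the kernel-level plumbing looks like the sketch below, expressed as the equivalent Linux commands driven from Python; the interface names, VLAN ID, and subnet are placeholders rather than what nova-network actually generates:

```python
# A rough sketch of per-tenant plumbing on the hypervisor: a tagged subinterface,
# a Linux bridge, and an IP address. Interface names, VLAN ID, and subnet are
# placeholders for illustration.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

vlan_id = 100
parent = "eth0"
vlan_if = f"{parent}.{vlan_id}"
bridge = f"br{vlan_id}"

run(["ip", "link", "add", "link", parent, "name", vlan_if, "type", "vlan", "id", str(vlan_id)])
run(["ip", "link", "set", vlan_if, "up"])
run(["brctl", "addbr", bridge])            # create the Linux bridge for this tenant network
run(["brctl", "addif", bridge, vlan_if])   # attach the tagged subinterface to it
run(["ip", "addr", "add", "10.100.0.1/24", "dev", bridge])
run(["ip", "link", "set", bridge, "up"])
```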

Sure, Quantum would be a useful element to create things dynamically. We could even find a way to have Quantum drive REST APIs over to an OpenFlow controller, or have Quantum directly make changes to upstream devices or ToRs. Then things get very complex, because we have decided that there are different ways to propagate change events into our network.
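
A hedged sketch of that first option might look like the following: a Quantum-style hook pushing a network-creation event to an OpenFlow controller’s northbound REST interface. The controller URL and resource schema are invented here; every controller defines its own API:

```python
# Hypothetical Quantum-style hook pushing a network-creation event to an
# OpenFlow controller over REST. The URL and payload schema are invented
# for illustration.
import json
import urllib.request

CONTROLLER = "http://controller.example.net:8080"

def notify_controller(network_id, vlan_id):
    payload = json.dumps({"network": network_id, "vlan": vlan_id}).encode()
    req = urllib.request.Request(
        f"{CONTROLLER}/networks",          # hypothetical northbound resource
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

print(notify_controller("tenant-42-net", 100))
```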

When it comes to networking, there is a science. There is an agreed-upon method for distributing information, and there are agreed-upon data formats. Plenty of protocols have been vetted in the IETF only to have vendors take different approaches to implementation; the vendors then either duke it out in the IETF or the consumer market demands that they fix their code and make things interoperate. Interoperating is good for the consumer, and in the long run it is probably even good for the vendor, if they wish to gain customers through goodwill and upgrade strategies.

Today, as a technologist in the field of networking, I have a lot of ways to build cloud networking. I could fully embrace the overlay networking camp and do something like this:

  • Build a highly scalable Layer 3 Data Center Fabric
  • Install some cutting-edge bridging and tunneling software on each hypervisor (OVS or similar; see the sketch after this list)
  • Pre-build all my public-facing network constructs
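
For the second step, a minimal sketch of wiring a hypervisor into the overlay with Open vSwitch might look like this; the bridge name and the remote hypervisor address are placeholders:

```python
# Minimal Open vSwitch wiring for an overlay: an integration bridge plus a GRE
# tunnel to a peer hypervisor. Bridge name and remote IP are placeholders.
import subprocess

def ovs(*args):
    cmd = ["ovs-vsctl"] + list(args)
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

ovs("add-br", "br-int")                                   # integration bridge for VM ports
ovs("add-port", "br-int", "gre0",
    "--", "set", "interface", "gre0",
    "type=gre", "options:remote_ip=203.0.113.20")         # tunnel to the other hypervisor
```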

Now, all of this web of software might just work; but imagine when it doesn’t. In a scenario of network failure, we need to start from some layer. Which layer first? Is the issue related to transport between hypervisors? If so, is it a true transport problem, or an issue with tunnel encapsulation or transmission on the hypervisor? Are VLANs necessary on the ToR? Is the ToR routing correctly between other ToRs?

The argument that I believe needs to be brought to the table is that network control in software needs to behave very much like existing, well-understood mechanisms, or it needs to be extremely open about its intended behavior. This is what protocols bring to the table: they are not black boxes; you can read up on how they operate and pretty quickly you will understand why something is or isn’t working.

BGP is the Truth

I had this discussion the other day with one of BGP’s creators; I told him, “I can get anyone off the street and ask them to show RIB-OUT and RIB-IN between two systems and get the truth; for me BGP is the truth.” Yes, I trust some well-defined XML-RPC exchanges or REST interfaces, but what I really like about BGP and other well-defined protocols is that they have clear definitions and behavior. Of course there is enough rope provided to hang yourself when you get into complex VPN topologies with leaking Route Targets between a series of tables. But this is the fun of having a tool like this in your tool belt. You are free to have seriously complex or easy-to-understand network constructs.
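
As a toy illustration of that “truth”, the sketch below compares the prefixes one router claims to have advertised (RIB-OUT) with the prefixes its peer claims to have received (RIB-IN); the prefix sets here stand in for parsed output of the relevant ‘show’ commands (for example, Junos ‘show route advertising-protocol bgp <peer>’ and ‘show route receive-protocol bgp <peer>’):

```python
# Toy comparison of RIB-OUT on one router against RIB-IN on its peer.
# The prefix sets are stand-ins for parsed 'show' command output.
rib_out_on_r1 = {"10.1.0.0/16", "10.2.0.0/16", "192.0.2.0/24"}
rib_in_on_r2 = {"10.1.0.0/16", "192.0.2.0/24"}

missing = rib_out_on_r1 - rib_in_on_r2      # advertised but never received: filtered or dropped?
unexpected = rib_in_on_r2 - rib_out_on_r1   # received but never advertised: stale or leaked?

print("advertised but not received:", sorted(missing))
print("received but not advertised:", sorted(unexpected))
```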

Back to a thought I started above: in CLOUD networking we usually need to create a VLAN per tenant, but if this is too complex to automate outside of the hypervisor, we might just pre-build all the VLANs on the ToRs and Layer 2 infrastructure. The downside to this approach is inefficient Layer 2 domains: every switch ends up carrying every VLAN whether or not a tenant actually lives behind it. A much better approach would have MVRP running between the virtual switch on the hypervisor and the ToR. With MVRP, OpenStack’s Quantum or OVS could simply advertise the presence of the VLAN to the switch, and the Layer 2 domain would be dynamically programmed on the directly connected ToR.

All in all, my thoughts are that it’s exciting to create new models for connectivity between hosts and nodes in networks; but we need to ensure that enough simplicity or “nodes of truth” exist. There should be authoritative sources of truth on reachability, and for the last 30 years this has been visible in two places: the RIB and the FIB. When we move into new domains that have their own concepts of state, control, and abstraction, we need a way to monitor this new behavior. It’s not entirely a new concept; even today in the networking world we rarely see a true mix of optical transport layers (DWDM, SDH/SONET, etc.) and Layer 2/3, but it remains important to remember there is still a lot to be learned about scale from the Internet. Tools like MP-BGP, Route Reflectors, Route Targets, NETCONF, etc. are just as useful today in a world of OpenFlow, Quantum plugins, and new Linux kernel modules. The future is now, and the past is important.