Ships in the Night Cause Me Fright

Ships in the Night
There is a common trend in networking and virtualization today: make things automated. This is a positive step. For the longest time, we have rewarded complexity. Take, for example, the focus of most professional networking certifications on corner-case scenarios, with arbitrary routing protocols being redistributed into other protocols with heaps of complex policy.

Now there seems to be a movement to “hop over” the complexities that exist in the network by building overlays. It’s not an entirely new concept: MPLS over GRE, IPsec mesh VPNs, and even PPP/L2TP have long been our precedents for constructing connectivity over existing transports.

It’s fantastic that we are building with layers, leveraging existing frameworks, and taking the new moving parts up to another layer. Take, for example, the trend in hypervisor networking toward overlays: the Data Center network remains simple, perhaps with a limited number of VLANs for all the hypervisors, while tunnels between hypervisors allow for very flexible network constructs.

But when we create abstraction layers in networking, the benefits of moving things up one layer can also come with complexity and a lack of event propagation between layers.

In the heyday of MPLS adoption in Service Provider networks, I saw various carriers consider, and sometimes deploy, large-scale VPLS backbones for their core (composed of P/PE nodes), which provided MPLS Ethernet transport over either 10GE or SDH/SONET; they then fully meshed their Multi-Service Edge MPLS nodes (i.e., purely PE) onto this new VPLS domain. What they were trying to achieve was simplicity on their high-touch PEs: they wanted a service to be available anywhere, and they wanted a topology simple enough that every PE could connect directly to every other PE. They were able to realize this goal, but in doing so they created other challenges. In this isolation of layers there are issues, because we clearly have protocols and behavior that work independently, yet we care about the end-to-end communication.

In Data Center networks there will typically be less transport-layer flapping and fewer circuit issues than in Wide Area Networks, yet we should still consider that event propagation is important because it allows protocols and algorithms to learn and do things better. For example, in large-scale DCI there will likely be a need for Traffic Engineering across unequal paths. In order to properly build overlays, it is important to understand the underlay topology and behavior. In MPLS networks (which can be both an underlay and an overlay) we already have features such as Fast Reroute, bypass tunnels, BFD, and OAM. These protocols might not be required in DC networks, but some of their intrinsic benefits should still be realized. If we use stateless tunnels in overlay networks, then how do we re-route when issues occur in the transport? What if a scheduled upgrade of a switch or router will disrupt service availability or speed? We should have a way to link knowledge of the underlay with the overlay. Without this, networking is taking a large step backwards.
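To make the idea concrete, here is a minimal sketch of what “linking knowledge of the underlay with the overlay” could look like: overlay tunnels are only considered usable when every underlay link they traverse is reported healthy (fed, for instance, by BFD session state or a maintenance calendar). All class names, link names, and the health-feed mechanism are hypothetical illustrations, not any vendor’s implementation.

```python
# Hypothetical sketch: propagating underlay liveness into overlay path choice.
# Tunnel/link names and the link_state feed are illustrative only.

class Tunnel:
    def __init__(self, name, underlay_links):
        self.name = name
        self.underlay_links = underlay_links  # underlay links this tunnel traverses

def usable_tunnels(tunnels, link_state):
    """Return tunnels whose every underlay link is currently up.

    link_state maps link name -> True/False, e.g. fed by BFD session status
    or a scheduled-maintenance system marking a link administratively down.
    """
    return [t for t in tunnels
            if all(link_state.get(link, False) for link in t.underlay_links)]

tunnels = [
    Tunnel("vxlan-a", ["leaf1-spine1", "spine1-leaf2"]),
    Tunnel("vxlan-b", ["leaf1-spine2", "spine2-leaf2"]),
]
# Suppose BFD (or a planned upgrade) reports spine1-leaf2 down:
link_state = {"leaf1-spine1": True, "spine1-leaf2": False,
              "leaf1-spine2": True, "spine2-leaf2": True}
print([t.name for t in usable_tunnels(tunnels, link_state)])  # ['vxlan-b']
```

A stateless tunnel by itself has no way to reach this conclusion; the point is that some channel, however simple, has to carry underlay events up to whatever chooses overlay paths.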

Overlay networking using VXLAN, NVGRE, STT, CAPWAP, etc., is highly useful for keeping portions of the complexity away from the transport, in the same way that MPLS labels have no relation to the MAC forwarding tables in underlay Ethernet switches. But we have always learned in networking that the more knowledge we have about vectors, links, and load, the more solid the decisions and actions we can take.

Traditional Protocols Matter with SDN

First of all, I have been saying this for over a year: we have been doing SDN for a long time. We, as an industry, have been programming changes into our networks with our own vision of what needs to happen and when it needs to happen, doing things that protocols do not normally offer.

Software Defined Networking is useful when your protocols don’t cut it.

Let’s face it: almost every protocol (OSPF, IS-IS, RIP, MP-BGP, etc.) does one thing: it advertises vectors, the reachability of a destination. Some protocols do it better than others; some allow more data to come along with those vectors, such as tags, communities, and targets. Then you strap some logic onto your routing nodes to interpret the received routing data, and you make decisions. Remember Policy Based Routing (PBR)? How is this any different from SDN? I would venture to say SDN offers the possibility of achieving many of the ideas discussed 8-10 years ago in PBR discussions.
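The pattern described here, a protocol hands you a vector plus attributes, and local logic interprets it, can be sketched in a few lines. The route structure, community value, and scrubbing next-hop below are all made-up illustrations of the idea, not any real protocol implementation.

```python
# Illustrative sketch: a received vector (prefix + attributes) and the
# local policy logic strapped on top of it. Values are invented.

route = {"prefix": "203.0.113.0/24", "next_hop": "10.0.0.1",
         "communities": {"65000:666"}}  # e.g. a blackhole/steering-style tag

def apply_policy(route):
    """Interpret received routing data and make a local decision."""
    if "65000:666" in route["communities"]:
        # Community matched: override the next hop, much like PBR would.
        return {**route, "next_hop": "192.0.2.1"}
    return route

print(apply_policy(route)["next_hop"])  # 192.0.2.1
```

Whether the trigger is a BGP community, a PBR match clause, or an SDN controller rule, the shape of the decision is the same: data about a destination arrives, and local logic decides what to do with it.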

The Method Matters

I have spent time immersed in SDN implementations, presentations, vendor meetings, and even knee-deep in the code. When you want to make changes to your network there are some basic questions that need to be asked:

  • Does this network change need to occur dynamically?
  • How do we quantify “dynamic”? Is this every day, every week, or changes occurring in a totally flexible, non-deterministic manner (i.e., people clicking things whenever they want, making changes to the network)?
  • Do we have the tools to prevent bad things from happening? (Imagine that you allowed the dynamic creation of MPLS LSPs across your network. What would happen if you did not implement safe upper limits on RSVP-TE objects, or placed no limit on the number of paths?)
  • How complex will this “simple program” be?
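The third question above, guardrails against bad requests, is the easiest to sketch. Something as small as the following check in front of a provisioning system goes a long way; the limit names and values are entirely hypothetical, chosen only to illustrate the shape of the check.

```python
# Hypothetical guardrail check applied before honoring a dynamic change
# request (e.g. LSP creation). Limits and field names are illustrative.

LIMITS = {"max_lsps_per_node": 500, "max_paths_per_lsp": 8}

def validate_request(request, current_lsp_count):
    """Reject dynamic LSP creation that would exceed safe upper bounds."""
    errors = []
    if current_lsp_count + request.get("new_lsps", 0) > LIMITS["max_lsps_per_node"]:
        errors.append("would exceed per-node LSP limit")
    if request.get("paths_per_lsp", 1) > LIMITS["max_paths_per_lsp"]:
        errors.append("too many paths per LSP")
    return errors

print(validate_request({"new_lsps": 10, "paths_per_lsp": 4}, 120))  # []
print(validate_request({"new_lsps": 600, "paths_per_lsp": 16}, 0))
```

The point is not the specific numbers; it is that a dynamic system without explicit upper bounds will eventually be asked for something it should refuse.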

Protocols are Software

Protocols have state machines; they learn and distribute information, and they have well-defined routines with clearly articulated and implemented behavior. This can be said of the majority of the well-used routing protocols defined by the IETF. Furthermore, people have learned how to work with these protocols, and for the protocols that allow additional extensions to their functionality, the flexible framework lets you do almost anything. Think: IS-IS TLVs or new BGP SAFI/NLRI/GENAPP.

There is already so much that the networking software industry has made available to developers and operators. To programmatically make configuration changes and drive ‘show commands’ on nodes, we could use Expect libraries in any number of modern programming languages (Perl, Python, Ruby, etc.), or drive alternative data models with NETCONF, REST APIs, YANG, the OpenFlow protocol, or SNMP.
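With the Expect-style approach, the session handling is the easy half; the real work is turning raw ‘show command’ text back into structured data. A small sketch of that half, using an invented output format loosely modeled on a BGP summary table (real device output will differ, so treat the parsing rules as assumptions):

```python
# Hedged sketch: parsing raw 'show' output (as an Expect session would
# return it) into structured data. SAMPLE is an invented, simplified
# format, not any real vendor's CLI output.

SAMPLE = """\
Neighbor        AS    State/PfxRcd
10.0.0.1        65001 142
10.0.0.2        65002 Idle
"""

def parse_bgp_summary(text):
    """Convention borrowed from common CLIs: a numeric State/PfxRcd field
    means the session is established and shows received prefixes."""
    peers = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 3:
            continue
        neighbor, asn, state = fields
        peers[neighbor] = {"as": int(asn),
                           "established": state.isdigit(),
                           "prefixes": int(state) if state.isdigit() else 0}
    return peers

print(parse_bgp_summary(SAMPLE))
```

Screen-scraping like this is exactly why structured interfaces such as NETCONF/YANG are attractive: they return data models instead of text that each operator must re-parse.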

So with all this choice comes the paralysis of complexity. Let’s take CLOUD as an example, and think about the need for some flexibility in network topology and automation. In a very simple scenario, each tenant in a cloud environment will need some basic network isolation and routing. In enterprise clouds, this looks like a VLAN with some DHCP assignment. The cloud management system would work well if the association of a Virtual Machine (VM) to a network were flexible; thus it would be useful to automate the creation of the VLAN.

In OpenStack this could be achieved with nova-network, which, if you are using Linux bridging, will gladly create a new VLAN, IP address, and bridge instance in the kernel. The challenge with automation is that there is usually more to the picture: the on-ramp and off-ramp of traffic beyond the hypervisor. This could be the VRF interface binding, VLAN creation on the ToR, or security policies on the ToR or other Layer 2/3 devices that will process frames/packets for this newly instantiated network.
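The hypervisor-local half of this is conceptually simple; a sketch of tenant-to-VLAN allocation in the spirit of what nova-network automates is below. The class, pool range, and tenant names are hypothetical, and the sketch deliberately stops where the hard part begins: nothing here touches the ToR or any upstream Layer 2/3 device.

```python
# Illustrative tenant-to-VLAN allocation, similar in spirit to what
# nova-network does with Linux bridging. Names and ranges are invented.

class VlanAllocator:
    def __init__(self, first=100, last=199):
        self.pool = list(range(first, last + 1))  # free VLAN IDs
        self.by_tenant = {}                       # tenant -> assigned VLAN

    def allocate(self, tenant):
        """Return the tenant's VLAN, taking one from the pool if new."""
        if tenant not in self.by_tenant:
            if not self.pool:
                raise RuntimeError("VLAN pool exhausted")
            self.by_tenant[tenant] = self.pool.pop(0)
        return self.by_tenant[tenant]

alloc = VlanAllocator()
print(alloc.allocate("tenant-a"))  # 100
print(alloc.allocate("tenant-b"))  # 101
print(alloc.allocate("tenant-a"))  # 100 (idempotent per tenant)
```

Allocating the VLAN in the kernel is only half the job; the same VLAN still has to appear on the ToR and whatever else carries the tenant’s on-ramp and off-ramp traffic, which is precisely where the automation picture gets complicated.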

Sure, Quantum would be a useful element for creating things dynamically. We could even have Quantum drive REST APIs over to an OpenFlow controller, or have Quantum directly make changes to upstream devices or ToRs. But then things get very complex, because we have decided that there are different ways to propagate change events into our network.

When it comes to networking, there is a science: agreed-upon methods for distributing information, and agreed-upon data formats. Plenty of protocols have been vetted in the IETF only to have vendors take different approaches to implementation; then the vendors either duke it out in the IETF, or the consumer market demands that they fix their code and make things interoperate. Interoperability is good for the consumer and, in the long run, probably good for the vendor too, if they wish to gain customers through goodwill and upgrade strategies.

Today, as a technologist in the field of networking, I have a lot of ways to build cloud networking. I could fully embrace the overlay networking camp and do something like this:

  • Build a highly scalable Layer 3 Data Center Fabric
  • Install some cutting-edge bridging and tunneling software on each hypervisor (OVS or similar)
  • Pre-build all my public-facing network constructs

Now, all of this web of software might just work; but imagine when it doesn’t. In a scenario of network failure, we need to start from some layer. Which layer first? Is the issue related to transport between hypervisors? If so, is it a true transport problem, or an issue with tunnel encapsulation or transmission on the hypervisor? Are VLANs necessary on the ToR? Is the ToR routing correctly to the other ToRs?

The argument that I believe needs to be brought to the table is that the complexity of network control in software needs to be very similar to existing behavior, or it needs to come with extremely open details about its intended behavior. This is what protocols bring to the table: they are not black boxes; you can read up on how they operate, and pretty quickly you will understand why something is or is not working.

BGP is the Truth

I had this discussion the other day with one of BGP’s creators; I told him, “I can get anyone off the street, ask them to show RIB-OUT and RIB-IN between two systems, and get the truth; for me, BGP is the truth.” Yes, I trust some well-defined XML-RPC exchanges or REST interfaces; but what I really like about BGP and other well-defined protocols is that they have clear definitions and behavior. Of course, there is enough rope provided to hang yourself when you get into complex VPN topologies with Route Targets leaked between a series of tables. But this is the fun of having a tool like this in your tool belt: you are free to build seriously complex or easy-to-understand network constructs.
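The RIB-OUT/RIB-IN check is, at its core, a set comparison, which is why anyone can perform it. A toy sketch with invented prefixes shows the shape of the truth-finding: anything advertised but not received points at policy, filtering, or a bug in between.

```python
# Sketch of the "BGP is the truth" check: compare what one node advertised
# (RIB-OUT) against what its peer received (RIB-IN). Prefixes are made up.

rib_out_r1 = {"10.1.0.0/16", "10.2.0.0/16", "192.0.2.0/24"}
rib_in_r2  = {"10.1.0.0/16", "192.0.2.0/24"}

missing = rib_out_r1 - rib_in_r2   # advertised but not received: policy? bug?
extra   = rib_in_r2 - rib_out_r1   # received but never advertised: who sent it?

print(sorted(missing))  # ['10.2.0.0/16']
print(sorted(extra))    # []
```

With a black-box control system there is often no equivalent pair of vantage points to diff, and that is exactly the property worth preserving.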

Back to a thought I started above: in CLOUD networking, we usually need to create a VLAN per tenant; but if this is too complex to automate outside of the hypervisor, we might just pre-build all the VLANs on the ToRs and L2 infrastructure. The downside to this approach is that it can leave inefficient Layer 2 domains. A much better approach would have MVRP running between the virtual switch on the hypervisor and the ToR. With MVRP, OpenStack’s Quantum or OVS could simply advertise the presence of the VLAN to the switch, and the Layer 2 domain would be dynamically programmed on the directly connected ToR.
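The payoff of that registration model can be shown with a toy reduction of it: the ToR carries only the VLANs its attached hypervisors currently declare, and pruning happens for free when an advertisement is withdrawn. This is not an MVRP implementation (the real protocol has per-port applicant/registrar state machines and timers); it is only the declare/withdraw idea, with invented port names.

```python
# Hedged sketch of MVRP-style dynamic VLAN registration: the ToR keeps
# only VLANs declared by attached hypervisors. Not a real MVRP state machine.

def tor_vlans(advertisements):
    """advertisements: port -> set of VLAN IDs declared by the hypervisor
    on that port. Returns the VLANs the ToR should program."""
    active = set()
    for vlans in advertisements.values():
        active |= vlans
    return active

ads = {"ge-0/0/1": {100, 101}, "ge-0/0/2": {101, 200}}
print(sorted(tor_vlans(ads)))  # [100, 101, 200]

# When the hypervisor on ge-0/0/2 withdraws VLAN 200, the ToR prunes it:
ads["ge-0/0/2"] = {101}
print(sorted(tor_vlans(ads)))  # [100, 101]
```

Compare this with pre-building every VLAN everywhere: the broadcast domain shrinks to exactly where tenants actually live.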

All in all, my thoughts are that it’s exciting to create new models for connectivity between hosts and nodes in networks; but we need to ensure that enough simplicity, or “nodes of truth,” exists. There should be authoritative sources of truth on reachability, and for the last 30 years these have been visible in two places: the RIB and the FIB. When we move into new domains that have their own concepts of state, control, and abstraction, we need a way to monitor this new behavior. It’s not an entirely new concept; even today in the networking world we rarely see a true mix of optical transport layers (DWDM, SDH/SONET, etc.) and Layer 2/3. But it remains important to remember that there is still a lot to be learned about scale from the Internet. Tools like MP-BGP, Route Reflectors, Route Targets, and NETCONF are just as useful today in a world of OpenFlow, Quantum plugins, and new Linux kernel modules. The future is now, and the past is important.