Ships in the Night Cause Me Fright

Ships in the Night
There is a common trend in networking and virtualization today: make things automated. This is a positive step. For the longest time, we have rewarded complexity. Take, for example, the focus of most professional networking certifications on corner-case scenarios, with arbitrary routing protocols being redistributed into other protocols under heaps of complex policy.

Now there seems to be a movement to “hop over” the complexities that exist in the network by building overlays. It’s not an entirely new concept: MPLS over GRE, IPsec mesh VPNs, and even PPP/L2TP have long been our precedents for constructing connectivity over existing transports.

It’s fantastic that we are building in layers, leveraging existing frameworks, and taking the new moving part up to another layer. Take, for example, the trend toward overlays in hypervisor networking: the data center network remains simple, perhaps with a limited number of VLANs for all the hypervisors, while tunnels between hypervisors allow for very flexible network constructs.

But when we create abstraction layers in networking, moving things up one layer can also introduce complexity and a lack of event propagation between layers.

In the heyday of MPLS adoption in service provider networks, I saw various carriers consider, and sometimes deploy, large-scale VPLS backbones for their core (composed of P/PE nodes), which provided MPLS Ethernet transport over either 10GE or SDH/SONET. They then fully meshed their Multi-Service Edge MPLS nodes (i.e., purely PE) to this new VPLS domain. What they were trying to achieve was simplicity on their high-touch PEs: they wanted a service to be available anywhere, and they wanted a simple topology in which every PE could connect directly to every other PE. They were able to realize this goal, but in doing so they created other challenges. When layers are isolated like this, problems arise because the protocols and behaviour in each layer work independently, yet what we care about is the end-to-end communication.

In data center networks there will typically be fewer transport-layer flaps or circuit issues than in wide area networks, yet event propagation still matters because it allows protocols and algorithms to learn and do things better. For example, large-scale DCI will likely need traffic engineering across unequal paths. To build overlays properly, it is important to understand the underlay topology and behavior. In MPLS networks (which can be both an underlay and an overlay) we already have features such as Fast Reroute, bypass tunnels, BFD and OAM. These protocols might not be required in DC networks, but some of their intrinsic benefits should be realized there. If we use stateless tunnels in overlay networks, how do we reroute when issues occur in the transport? What if a scheduled upgrade of a switch or router will disrupt service availability or speed? We should have a way to link the knowledge of the underlay with the overlay. Without this, networking is taking a large step backwards.
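To make the idea of linking underlay knowledge to the overlay a little more concrete, here is a minimal Python sketch. It is not how any particular product works: the `UnderlayPath` and `OverlayTunnel` classes, the link names, and the "up" / "down" / "maintenance" states are all hypothetical, invented purely for illustration. The only point is that a tunnel that hears about underlay events can move off a failed or draining link, where a purely stateless tunnel cannot.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class UnderlayPath:
    """One candidate route through the underlay (hypothetical model)."""
    name: str
    links: Tuple[str, ...]  # underlay links this path traverses
    cost: int               # lower is preferred


class OverlayTunnel:
    """An overlay tunnel that listens to underlay events (illustrative only)."""

    def __init__(self, name: str, paths: List[UnderlayPath]) -> None:
        self.name = name
        self.paths = paths
        # Links the underlay has reported as down or scheduled for maintenance.
        self.unavailable: Set[str] = set()

    def on_underlay_event(self, link: str, state: str) -> None:
        """Record an underlay event; the state names are assumptions."""
        if state in ("down", "maintenance"):
            self.unavailable.add(link)
        elif state == "up":
            self.unavailable.discard(link)

    def active_path(self) -> Optional[UnderlayPath]:
        """Return the cheapest path that avoids every unavailable link."""
        usable = [p for p in self.paths
                  if not set(p.links) & self.unavailable]
        return min(usable, key=lambda p: p.cost) if usable else None


# Two hypervisors connected through two spines (made-up topology).
tunnel = OverlayTunnel("hv1-hv2", [
    UnderlayPath("via-spine1", ("leaf1-spine1", "spine1-leaf2"), cost=10),
    UnderlayPath("via-spine2", ("leaf1-spine2", "spine2-leaf2"), cost=20),
])

# A scheduled upgrade on spine1 is announced before it happens,
# so the overlay can move traffic ahead of the disruption.
tunnel.on_underlay_event("leaf1-spine1", "maintenance")
chosen = tunnel.active_path()
```

In a real network the events would come from something like BFD, OAM, or a maintenance calendar, and the path choice would also weigh load. The sketch only shows the coupling between the two layers.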

Overlay networking that uses VXLAN, NVGRE, STT, CAPWAP, etc. is highly useful for keeping portions of the complexity away from the transport, in the same way that MPLS labels have no relation to the MAC forwarding tables in underlay Ethernet switches. But we have always learned in networking that the more we know about vectors, about links, and about load, the more solid the decisions we can make and the actions we can take.



2 thoughts on “Ships in the Night Cause Me Fright”

  1. A sanguine observation, Truman.

    What we haven’t heard much of is a discussion of what the /underlay/ network to support an overlay implementation should look like.

    The prevailing notion today is that the underlying network should ‘get out of the way’, but I would argue that the underlying network really needs a way to interface with the overlay world.

    This could mean absorbing hints to differentiate forwarding of overlay traffic, or even cooperating with the overlay to troubleshoot specific traffic flows.

    In medicine, we can inject contrast into patients to better visualize various internal structures. Can we do the same thing with the underlying network? If you know you’re troubleshooting a specific flow, can you inform the underlying network, via some form of API and abstract network description language, to attach instrumentation to that flow?

    And to your initial point, how would you automate this process? Can you deliver an API for troubleshooting and analysis, and not just for initial network provisioning?

  2. David Wheeler
    “All problems in computer science can be solved by another level of indirection”.

    Kevlin Henney
    The corollary to this is
    “…except for the problem of too many layers of indirection.”
