Now that we can demonstrate the latest version of Switchboard working with real hardware (from two different vendors) on the production campus network, it seemed like time to celebrate the event with an update on Switchboard and some screen captures of a demo walkthrough.
bypass network configured connecting 10.138.96.17 and 152.3.9.2
Maintaining State Across SDN Controller Restarts
User-driven reconfiguration of the SDN network means that we have an interesting problem: how do we restore the state of the network in the event of a reboot (or crash) of the SDN controller?
Switchboard addresses this issue by caching the commands it has issued to the controller in a MySQL database table. To restore the state of the SDN network after a controller restart, we simply replay the commands in the order they were issued. Since these commands include user-requested route adds and deletes, getting back to a given state is straightforward.
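To make the replay idea concrete, here is a minimal sketch of a command cache. It is not Switchboard's actual code: the table name, column names, and function names are made up for illustration, and it uses Python's built-in sqlite3 module as a stand-in for MySQL so the example runs on its own.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the command cache. Table layout is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS command_cache ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " method TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return db


def record(db, method, url, body):
    """Store one command that was issued to the controller."""
    db.execute(
        "INSERT INTO command_cache (method, url, body) VALUES (?, ?, ?)",
        (method, url, json.dumps(body)),
    )
    db.commit()


def replay(db, send):
    """Re-issue every cached command in insertion order.

    `send` is any callable taking (method, url, body); in a real
    deployment it would make the REST call to the controller.
    Returns the number of commands replayed.
    """
    rows = db.execute(
        "SELECT method, url, body FROM command_cache ORDER BY id"
    ).fetchall()
    for method, url, body in rows:
        send(method, url, json.loads(body))
    return len(rows)
```

Ordering by the auto-incrementing id is what keeps adds and deletes in the sequence they originally happened. Passing `send` in as a parameter keeps the cache logic separate from the HTTP client, which also makes it easy to dry-run a replay by passing a function that just prints.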
To automate recovery from SDN controller restarts, the controller startup script could wait until the controller has started up, then make a REST request to Switchboard to trigger playback of the command cache and so restore state.
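A startup hook along those lines might look like the sketch below. Everything specific in it is an assumption: the probe URL, the Switchboard replay URL, and the retry settings are placeholders, since the replay trigger described above has not been built yet.

```python
import time
import urllib.error
import urllib.request


def controller_is_up(url, timeout=2.0):
    """Return True if an HTTP GET to the controller's REST API succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing `delay` seconds between tries."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False


def trigger_replay(url, timeout=10.0):
    """POST to a (hypothetical) Switchboard endpoint that replays the command cache."""
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder URLs; substitute the real controller and Switchboard addresses.
    probe_url = "http://localhost:8080/router/all"
    replay_url = "https://switchboard.example/replay"
    if wait_until(lambda: controller_is_up(probe_url)):
        trigger_replay(replay_url)
```

Polling the REST API rather than sleeping for a fixed time means the replay starts as soon as the controller is actually answering requests, and gives up cleanly if it never comes up.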
Debugging Misconfigured SDN ports
One issue that we have run into over the last few weeks is the lack of transparency about the status of SDN switch ports when issuing RYU REST commands. This first came up when a port that was assumed to be live did not have OpenFlow enabled — so trying to add a default gateway via RYU was returning confusing messages that refer to a route that does not exist.
[{"switch_id": "0000000000000333", "command_result": [{"result": "failure", "details": "Destination overlaps [route_id=1]"}]}]
Victor Orlikowski modified the RYU REST router code to return the MAC address when a gateway is requested. If no MAC address is found (i.e. gateway_mac: null), it is likely that a switch port you believe is live is actually disabled. I’ll be adding error trapping for the dreaded gateway_mac: null condition.
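As a rough idea of what that error trapping could look like, the sketch below scans a controller response shaped like the failure message shown earlier. The outer structure (switch_id, command_result, result, details) comes from that message; where exactly gateway_mac appears in the modified router's reply is an assumption here, and the function name is made up.

```python
def find_problems(response):
    """Return a list of (switch_id, message) pairs for anything suspicious.

    `response` is the parsed JSON list returned by the REST router.
    Assumes (unverified) that gateway_mac shows up as a key inside each
    command_result entry.
    """
    problems = []
    for switch in response:
        dpid = switch.get("switch_id", "unknown")
        for entry in switch.get("command_result", []):
            if entry.get("result") == "failure":
                problems.append((dpid, entry.get("details", "no details")))
            if "gateway_mac" in entry and entry["gateway_mac"] is None:
                problems.append(
                    (dpid, "gateway_mac is null: check that the port is "
                           "up and has OpenFlow enabled")
                )
    return problems
```

Turning the null MAC into an explicit message about the port points whoever is debugging at the likely cause, instead of leaving them to puzzle over a route that does not exist.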
Source for the RYU REST Router is here:
git://gitorious.oit.duke.edu/switchboard_rest_router/switchboard_rest_router.git
Testbed
Testing the bypass network means having a couple of different networks that can be connected on the fly. As a very basic test setup, we have some blades in the DSCR that are connected to a Cisco 4500 and sit on the 10.138.96.0/25 network. The 4500 is connected to (among other things) an Arista switch in the Telcom building, which acts as the hub. The Telcom Arista is also connected to an Arista switch in North Building 011 serving some Exogeni hosts on the 152.3.9.0/25 network. So the hosts in the two networks are accessible from the Duke campus network, but do not have a path between them unless the bypass network is enabled.
A management VLAN connects an OpenFlow controller to each of the SDN switches, and the controller is running the modified version of the RYU-REST router code. To configure the network, we are running the Switchboard application at switchboard.oit.duke.edu and the app sends REST commands to the SDN controller.
Running simple tests
When the controller is started, all it knows are the DPIDs of the switches, so the first thing for Switchboard to do is get a basic configuration onto the switches — something that a Switchboard app administrator can do from the admin web page.
We are running a simple topology: the switch in Telcom acts as a central hub that the other switches connect through. You have to start somewhere, so the basic configuration that Switchboard sets up consists of the dataplane network that interconnects the switches and the hub, plus the host network(s) for the switches. No routes across the bypass network are enabled until a user makes a request.
This basic network config can then be modified by authorized users who request links between pairs of hosts via the Switchboard web site. First, Switchboard tries to get authorization for the requested connection endpoints by consulting an internal database of which users are authoritative for which addresses, then asking those users to OK the request. Once both endpoints are authorized, Switchboard needs to construct the routes to add to the basic configuration.
Switchboard determines routes by querying the SDN controller to get the current state of the network. It then identifies the DPIDs of the switches where the hosts are connected, based on each host’s IP address and the networks that are directly connected to each switch.
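The lookup step can be sketched with Python's standard ipaddress module. This is an illustration rather than Switchboard's code: the function name and the shape of the input dictionary are assumptions, and in practice the per-switch network list would be built from the controller's reply.

```python
import ipaddress


def switch_for_host(host_ip, networks_by_dpid):
    """Return the DPID of the switch whose directly connected network
    contains host_ip, or None if no switch matches.

    networks_by_dpid maps a DPID string to a list of CIDR strings.
    If several networks match, the most specific (longest prefix) wins.
    """
    host = ipaddress.ip_address(host_ip)
    best = None  # (dpid, network) of the best match so far
    for dpid, cidrs in networks_by_dpid.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr, strict=False)
            if host in net and (best is None or net.prefixlen > best[1].prefixlen):
                best = (dpid, net)
    return best[0] if best else None
```

With the testbed networks, and placeholder DPIDs for the two edge switches, a call would look like this:

```python
networks = {
    "hub": ["10.185.1.0/24"],
    "edge-a": ["10.185.1.0/24", "10.138.96.0/25"],  # placeholder DPID
    "edge-b": ["10.185.1.0/24", "152.3.9.0/25"],    # placeholder DPID
}
switch_for_host("10.138.96.17", networks)  # "edge-a"
switch_for_host("152.3.9.2", networks)     # "edge-b"
```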
Once Switchboard knows which switches to configure, it adds static routes to the endpoint switches and the central hub. If a user asks to remove a route via Switchboard, the process is similar, but instead of making RYU REST-Router calls to add static routes, calls are made to delete the routes.
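Here is a sketch of how the set of route additions could be planned. It assumes the request shape of the stock RYU rest_router app (a POST to /router/{switch_id} with a destination and a gateway); the modified router may differ. The choice of /32 host routes, the dictionary layout, and the function name are guesses for illustration, not what Switchboard necessarily does.

```python
def plan_bypass_routes(a, b, hub):
    """Build the REST calls needed to link host a and host b through the hub.

    Each argument is a dict with "dpid" and "bypass_ip" (the switch's
    address on the bypass network); the endpoint dicts also carry "host".
    Returns a list of (method, path, body) tuples; nothing is sent.
    """
    def route(switch, destination, gateway):
        return (
            "POST",
            "/router/" + switch["dpid"],
            {"destination": destination, "gateway": gateway},
        )

    host_a = a["host"] + "/32"  # assumption: per-host routes
    host_b = b["host"] + "/32"
    return [
        route(a, host_b, hub["bypass_ip"]),  # edge A sends traffic for B to the hub
        route(b, host_a, hub["bypass_ip"]),  # edge B sends traffic for A to the hub
        route(hub, host_a, a["bypass_ip"]),  # hub forwards traffic for A to edge A
        route(hub, host_b, b["bypass_ip"]),  # hub forwards traffic for B to edge B
    ]
```

Returning the calls as plain tuples instead of sending them right away means the same list can be logged, written to the command cache, and then issued. Removal would follow the same pattern with delete calls that reference the route IDs the controller handed back when the routes were added.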
Walkthrough
The SDN controller has been restarted, so there are no networks or addresses configured.
After the admin tells Switchboard to configure the basics, the switches have addresses but no static routes to connect hosts.
Visualization of the basic network config. The SDN bypass network is 10.185.1.0/24, and you can see all three switches are connected to it. Two of the switches have a host network as well (10.138.96.0/25 and 152.3.9.0/25). The switch acting as a hub has the ID 1c73662025.
I fill out a form to request a bypass link between two hosts
The request is tacitly approved (because I am authorized to OK requests in both of the subnets). If I were not authorized for both subnets, I would have had to wait for an approver to OK the request.
As soon as the request was approved, Switchboard set up static routes through the bypass network.
My home page in Switchboard now shows my request as being in the approved state. I can also see previous requests that I have revoked (shown in gray).
After my request was approved, the network visualization changed – the static routes across the bypass network are visible.
I’m done with the bypass route, and click on the “revoke” link. Switchboard asks if I am really sure.
After revoking the bypass connection, the dynamically added networks are removed (and the net visualization goes back to what it used to be).
The SDN logs show the details – I can see the current state of the network and the commands that deleted the static routes.
My request history has a high-level summary of what has been happening.
Pings work while the bypass is in place, and stop after it is deleted.
The pings do what we expect on both endpoints.