A new feature has recently been introduced in OVN that allows multiple clusters to be interconnected at the L3 level (here’s a link to the series of patches). This can be useful for scenarios with multiple availability zones (or physical regions), or simply to scale better by having independent control planes while still allowing connectivity between workloads in separate zones.

Simplifying things a bit, logical routers in each cluster can be connected through transit overlay networks. The interconnection layer is responsible for creating the transit switches in the IC database, which then become visible to the connected clusters. Each cluster can then connect its logical routers to the transit switches. More information can be found in the ovn-architecture manpage.

I created a Vagrant setup to test it out and get a bit familiar with it. To recreate it, all you need to do is clone the repository and run 'vagrant up' inside the ovn-interconnection folder:

https://github.com/danalsan/vagrants/tree/master/ovn-interconnection

This will deploy 7 CentOS machines (300MB of RAM each) with two separate OVN clusters (west & east) and the interconnection services. The layout is described in the image below:

Once the services are up and running, a few resources will be created on each cluster and the interconnection services will be configured with a transit switch between them:

Let’s look, for example, at the logical topology of the east availability zone, where the transit switch ts1 is listed along with its port in the remote west zone:

```
[root@central-east ~]# ovn-nbctl show
switch c850599c-263c-431b-b67f-13f4eab7a2d1 (ts1)
    port lsp-ts1-router_west
        type: remote
        addresses: ["aa:aa:aa:aa:aa:02 169.254.100.2/24"]
    port lsp-ts1-router_east
        type: router
        router-port: lrp-router_east-ts1
switch 8361d0e1-b23e-40a6-bd78-ea79b5717d7b (net_east)
    port net_east-router_east
        type: router
        router-port: router_east-net_east
    port vm1
        addresses: ["40:44:00:00:00:01 192.168.1.11"]
router b27d180d-669c-4ca8-ac95-82a822da2730 (router_east)
    port lrp-router_east-ts1
        mac: "aa:aa:aa:aa:aa:01"
        networks: ["169.254.100.1/24"]
        gateway chassis: [gw_east]
    port router_east-net_east
        mac: "40:44:00:00:00:04"
        networks: ["192.168.1.1/24"]
```
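
In this listing the locally owned transit port is an ordinary router port, while the port owned by the west zone shows up with type remote. The toy model below sketches that rule; it is not how ovn-ic is implemented, and TransitPort and zone_view are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitPort:
    """A port on a transit switch (hypothetical model, not an OVN API)."""
    name: str
    zone: str  # availability zone whose logical router owns the port


def zone_view(ports, local_zone):
    """Map each transit port to the type one zone's NB would list it with.

    Ports owned by the local zone attach a local logical router ("router");
    ports owned by any other zone are only known through the interconnection
    databases and appear as "remote".
    """
    return {p.name: ("router" if p.zone == local_zone else "remote") for p in ports}


TS1 = [
    TransitPort("lsp-ts1-router_east", "east"),
    TransitPort("lsp-ts1-router_west", "west"),
]

if __name__ == "__main__":
    print(zone_view(TS1, "east"))
    print(zone_view(TS1, "west"))
```

In this model the west zone's view is the mirror image of the east one: each zone sees its own port as a router port and the other zone's port as remote.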

As for the Southbound database, we can see the gateway port for each router. In this setup I only have one gateway node per zone but, as with any other distributed gateway port in OVN, the port could be scheduled on multiple nodes to provide HA:

```
[root@central-east ~]# ovn-sbctl show
Chassis worker_east
    hostname: worker-east
    Encap geneve
        ip: "192.168.50.100"
        options: {csum="true"}
    Port_Binding vm1
Chassis gw_east
    hostname: gw-east
    Encap geneve
        ip: "192.168.50.102"
        options: {csum="true"}
    Port_Binding cr-lrp-router_east-ts1
Chassis gw_west
    hostname: gw-west
    Encap geneve
        ip: "192.168.50.103"
        options: {csum="true"}
    Port_Binding lsp-ts1-router_west
```
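
Note that gw_west appears in the east Southbound database because it hosts the remote transit port lsp-ts1-router_west. To check bindings like these from a script, a small parser over the `ovn-sbctl show` text can help. This is a minimal sketch that assumes exactly the indentation shown above; the show format is meant for humans and may change between releases, so the generic database commands (`list`, `find`) are a sturdier basis for real tooling. The function name is hypothetical.

```python
SAMPLE_SBCTL = """\
Chassis worker_east
    hostname: worker-east
    Encap geneve
        ip: "192.168.50.100"
        options: {csum="true"}
    Port_Binding vm1
Chassis gw_east
    hostname: gw-east
    Encap geneve
        ip: "192.168.50.102"
        options: {csum="true"}
    Port_Binding cr-lrp-router_east-ts1
Chassis gw_west
    hostname: gw-west
    Encap geneve
        ip: "192.168.50.103"
        options: {csum="true"}
    Port_Binding lsp-ts1-router_west
"""


def bindings_by_chassis(sbctl_output):
    """Return {chassis: {"ip": encap IP, "ports": [bound logical ports]}}."""
    result = {}
    chassis = None
    for line in sbctl_output.splitlines():
        stripped = line.strip()
        if line.startswith("Chassis "):
            # An unindented "Chassis" line opens a new chassis block.
            chassis = stripped.split(maxsplit=1)[1].strip('"')
            result[chassis] = {"ip": None, "ports": []}
        elif chassis is None:
            continue  # ignore anything before the first chassis
        elif stripped.startswith("ip: "):
            result[chassis]["ip"] = stripped.split(maxsplit=1)[1].strip('"')
        elif stripped.startswith("Port_Binding "):
            port = stripped.split(maxsplit=1)[1].strip('"')
            result[chassis]["ports"].append(port)
    return result


if __name__ == "__main__":
    for name, info in bindings_by_chassis(SAMPLE_SBCTL).items():
        print(name, info["ip"], info["ports"])
```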

If we query the interconnection databases, we will see the transit switch in the NB and the gateway ports in each zone:

```
[root@central-ic ~]# ovn-ic-nbctl show
Transit_Switch ts1

[root@central-ic ~]# ovn-ic-sbctl show
availability-zone east
    gateway gw_east
        hostname: gw-east
        type: geneve
            ip: 192.168.50.102
        port lsp-ts1-router_east
            transit switch: ts1
            address: ["aa:aa:aa:aa:aa:01 169.254.100.1/24"]
availability-zone west
    gateway gw_west
        hostname: gw-west
        type: geneve
            ip: 192.168.50.103
        port lsp-ts1-router_west
            transit switch: ts1
            address: ["aa:aa:aa:aa:aa:02 169.254.100.2/24"]
```
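
Since every gateway must reach the gateways of the other zones, this output also tells us which tunnel endpoints a zone's gateways need. The sketch below parses the text layout shown here (an assumption: it is not a stable, machine-readable format); parse_ic_sb and remote_endpoints are hypothetical helper names.

```python
SAMPLE_IC_SB = """\
availability-zone east
    gateway gw_east
        hostname: gw-east
        type: geneve
            ip: 192.168.50.102
        port lsp-ts1-router_east
            transit switch: ts1
            address: ["aa:aa:aa:aa:aa:01 169.254.100.1/24"]
availability-zone west
    gateway gw_west
        hostname: gw-west
        type: geneve
            ip: 192.168.50.103
        port lsp-ts1-router_west
            transit switch: ts1
            address: ["aa:aa:aa:aa:aa:02 169.254.100.2/24"]
"""


def parse_ic_sb(output):
    """Return {zone: {gateway: {"ip": encap IP, "ports": [port names]}}}."""
    zones = {}
    zone = gateway = None
    for line in output.splitlines():
        s = line.strip()
        if s.startswith("availability-zone "):
            zone = s.split(maxsplit=1)[1]
            zones[zone] = {}
            gateway = None
        elif s.startswith("gateway ") and zone is not None:
            gateway = s.split(maxsplit=1)[1]
            zones[zone][gateway] = {"ip": None, "ports": []}
        elif gateway is None:
            continue  # nothing to attach this line to yet
        elif s.startswith("ip: "):
            zones[zone][gateway]["ip"] = s.split(maxsplit=1)[1]
        elif s.startswith("port "):
            zones[zone][gateway]["ports"].append(s.split(maxsplit=1)[1])
    return zones


def remote_endpoints(zones, local_zone):
    """Encap IPs of all gateways outside local_zone, i.e. the tunnel peers
    that the gateways of local_zone must be able to reach."""
    return sorted(
        gw["ip"]
        for zone, gateways in zones.items()
        if zone != local_zone
        for gw in gateways.values()
    )


if __name__ == "__main__":
    zones = parse_ic_sb(SAMPLE_IC_SB)
    print("west gateways must reach:", remote_endpoints(zones, "west"))
```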

With this topology, traffic from vm1 to vm2 flows from gw-east to gw-west through a Geneve tunnel. If we list the ports on each gateway, we should see the tunnel ports. Needless to say, the gateways have to be mutually reachable so that the transit overlay network can be established:

```
[root@gw-west ~]# ovs-vsctl show
6386b867-a3c2-4888-8709-dacd6e2a7ea5
    Bridge br-int
        fail_mode: secure
        Port ovn-gw_eas-0
            Interface ovn-gw_eas-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.50.102"}
```
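
On gw-west, the expected peer is gw_east's encap IP, 192.168.50.102, and that is exactly the remote_ip of the ovn-gw_eas-0 port. A sketch of automating this comparison, assuming the `ovs-vsctl show` layout above (trimmed here just like the listing); geneve_tunnels and missing_tunnels are hypothetical helper names.

```python
import re

SAMPLE_VSCTL = """\
6386b867-a3c2-4888-8709-dacd6e2a7ea5
    Bridge br-int
        fail_mode: secure
        Port ovn-gw_eas-0
            Interface ovn-gw_eas-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.50.102"}
"""

REMOTE_IP_RE = re.compile(r'remote_ip="([^"]+)"')


def geneve_tunnels(vsctl_output):
    """Map each Geneve interface name to its remote_ip option."""
    tunnels = {}
    interface = None
    is_geneve = False
    for line in vsctl_output.splitlines():
        s = line.strip()
        if s.startswith("Interface "):
            interface = s.split(maxsplit=1)[1].strip('"')
            is_geneve = False  # the type is reported on a following line
        elif s == "type: geneve":
            is_geneve = True
        elif s.startswith("options:") and is_geneve and interface is not None:
            match = REMOTE_IP_RE.search(s)
            if match:
                tunnels[interface] = match.group(1)
    return tunnels


def missing_tunnels(tunnels, expected_remote_ips):
    """Expected remote gateway IPs that have no Geneve tunnel port."""
    return sorted(set(expected_remote_ips) - set(tunnels.values()))


if __name__ == "__main__":
    tunnels = geneve_tunnels(SAMPLE_VSCTL)
    print(tunnels, missing_tunnels(tunnels, ["192.168.50.102"]))
```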

Now, when vm1 pings vm2, the traffic should follow this path:

(vm1) worker_east ==== gw_east ==== gw_west ==== worker_west (vm2).

Let’s see it with the ovn-trace tool:

```
[root@central-east vagrant]# ovn-trace --ovs --friendly-names --ct=new net_east 'inport == "vm1" && eth.src == 40:44:00:00:00:01 && eth.dst == 40:44:00:00:00:04 && ip4.src == 192.168.1.11 && ip4.dst == 192.168.2.12 && ip.ttl == 64 && icmp4.type == 8'

ingress(dp="net_east", inport="vm1")
...
egress(dp="net_east", inport="vm1", outport="net_east-router_east")
...
ingress(dp="router_east", inport="router_east-net_east")
...
egress(dp="router_east", inport="router_east-net_east", outport="lrp-router_east-ts1")
...
ingress(dp="ts1", inport="lsp-ts1-router_east")
...
egress(dp="ts1", inport="lsp-ts1-router_east", outport="lsp-ts1-router_west")
 9. ls_out_port_sec_l2 (ovn-northd.c:4543): outport == "lsp-ts1-router_west", priority 50, uuid c354da11
    output;
    /* output to "lsp-ts1-router_west", type "remote" */
```
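
Typing a microflow like this by hand is error-prone. A small helper can assemble the same expression and command line from its parts; the helper names are hypothetical, and only the ovn-trace options already used above (--ovs, --friendly-names, --ct=new) are passed.

```python
import shlex


def icmp_echo_microflow(inport, eth_src, eth_dst, ip_src, ip_dst, ttl=64):
    """Build an ovn-trace microflow for an ICMPv4 echo request (type 8)."""
    return (
        f'inport == "{inport}" && eth.src == {eth_src} && eth.dst == {eth_dst} && '
        f"ip4.src == {ip_src} && ip4.dst == {ip_dst} && "
        f"ip.ttl == {ttl} && icmp4.type == 8"
    )


def ovn_trace_argv(datapath, microflow):
    """Argument vector for ovn-trace with the options used in this post."""
    return ["ovn-trace", "--ovs", "--friendly-names", "--ct=new", datapath, microflow]


if __name__ == "__main__":
    flow = icmp_echo_microflow(
        "vm1", "40:44:00:00:00:01", "40:44:00:00:00:04",
        "192.168.1.11", "192.168.2.12",
    )
    # Render a shell-safe command line; the list itself could be passed to
    # subprocess.run() on the node where ovn-trace is run.
    print(shlex.join(ovn_trace_argv("net_east", flow)))
```

Keeping the command as a list avoids quoting mistakes: the whole microflow stays a single argument, which is what the single quotes achieve in the interactive command.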

Now let’s capture Geneve traffic on both gateways while a ping between both VMs is running:

```
[root@gw-east ~]# tcpdump -i genev_sys_6081 -vvnee icmp
tcpdump: listening on genev_sys_6081, link-type EN10MB (Ethernet), capture size 262144 bytes
10:43:35.355772 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 11379, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 40, length 64
10:43:35.356077 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 11379, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 40, length 64
10:43:35.356442 aa:aa:aa:aa:aa:02 > aa:aa:aa:aa:aa:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 42610, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 40, length 64
10:43:35.356734 40:44:00:00:00:04 > 40:44:00:00:00:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 42610, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 40, length 64

[root@gw-west ~]# tcpdump -i genev_sys_6081 -vvnee icmp
tcpdump: listening on genev_sys_6081, link-type EN10MB (Ethernet), capture size 262144 bytes
10:43:29.169532 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 8875, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 34, length 64
10:43:29.170058 40:44:00:00:00:10 > 40:44:00:00:00:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 8875, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 34, length 64
10:43:29.170308 aa:aa:aa:aa:aa:02 > aa:aa:aa:aa:aa:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 38667, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 34, length 64
10:43:29.170476 aa:aa:aa:aa:aa:02 > aa:aa:aa:aa:aa:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 38667, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 34, length 64
```

You can observe that the ICMP traffic flows between the transit switch ports (aa:aa:aa:aa:aa:02 <> aa:aa:aa:aa:aa:01) traversing both zones.
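
The same observation can be checked mechanically. The sketch below pulls the source MAC, destination MAC and TTL out of the tcpdump `-e -vv` header lines and keeps only the frames exchanged between the two transit switch ports. It assumes the output format shown above, and parse_frames and transit_frames are hypothetical helper names.

```python
import re

# Header line printed by `tcpdump -e -vv` for each frame:
#   <time> <src MAC> > <dst MAC>, ethertype IPv4 ..., (tos 0x0, ttl N, ...
MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
HEADER_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d\.\d+) "
    + r"(?P<src>" + MAC + r") > (?P<dst>" + MAC + r"),"
    + r".*?\bttl (?P<ttl>\d+),"
)

# MAC addresses of lsp-ts1-router_east and lsp-ts1-router_west.
TRANSIT_MACS = {"aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"}

SAMPLE_GW_EAST = """\
10:43:35.355772 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 11379, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 40, length 64
10:43:35.356077 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 11379, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 40, length 64
10:43:35.356442 aa:aa:aa:aa:aa:02 > aa:aa:aa:aa:aa:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 42610, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 40, length 64
10:43:35.356734 40:44:00:00:00:04 > 40:44:00:00:00:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 42610, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 40, length 64
"""


def parse_frames(capture):
    """Return (src MAC, dst MAC, TTL) for every frame header line."""
    frames = []
    for line in capture.splitlines():
        match = HEADER_RE.match(line)
        if match:
            frames.append((match["src"], match["dst"], int(match["ttl"])))
    return frames


def transit_frames(frames, transit_macs):
    """Keep only frames whose source and destination are both transit ports."""
    return [f for f in frames if f[0] in transit_macs and f[1] in transit_macs]


if __name__ == "__main__":
    for frame in transit_frames(parse_frames(SAMPLE_GW_EAST), TRANSIT_MACS):
        print(frame)
```

On the gw-east sample, three of the four frames are between transit ports; the fourth is the reply being delivered from router_east's MAC to vm1.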

Also, as the packet has gone through two routers (router_east and router_west), the TTL at the destination has been decremented twice (from 64 to 62):

```
[root@worker-west ~]# ip net e vm2 tcpdump -i any icmp -vvne
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
10:49:32.491674  In 40:44:00:00:00:10 ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 57504, offset 0, flags [DF], proto ICMP (1), length 84)
```
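
The TTL arithmetic can be spelled out with a toy model in which each logical router decrements the TTL by one and logical switches, including the transit switch, leave it untouched. The east half of the path comes from the ovn-trace output; the west half (router_west plus a west-side switch called net_west here) is an assumption, and net_west is a hypothetical name.

```python
def ttl_along_path(initial_ttl, path, routers):
    """TTL of the packet as it leaves each logical datapath on the path.

    Toy model: every logical router decrements the TTL by one while routing;
    logical switches, including the transit switch, forward it unchanged.
    """
    ttl = initial_ttl
    hops = []
    for datapath in path:
        if datapath in routers:
            ttl -= 1
        hops.append((datapath, ttl))
    return hops


# East half from the ovn-trace output; "net_west" is a hypothetical name.
PATH = ["net_east", "router_east", "ts1", "router_west", "net_west"]
ROUTERS = {"router_east", "router_west"}

if __name__ == "__main__":
    for datapath, ttl in ttl_along_path(64, PATH, ROUTERS):
        print(f"{datapath:12} ttl={ttl}")
```

The model yields TTL 63 while the packet crosses ts1, matching the frames captured between the transit ports on both gateways, and TTL 62 once it reaches vm2.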

This is a really great feature that opens up a lot of possibilities for cluster interconnection and scaling. However, keep in mind that it requires an additional management layer that handles isolation (multitenancy) and prevents overlapping IP addresses across the connected availability zones.
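
As a rough illustration of the IP overlap concern, Python's ipaddress module can flag subnets that clash across zones. The west subnet 192.168.2.0/24 is an assumption inferred from vm2's address, and the transit network 169.254.100.0/24 is deliberately left out because it is meant to be shared by all zones. cross_zone_overlaps is a hypothetical helper, not part of any OVN tooling.

```python
import ipaddress
from itertools import combinations


def cross_zone_overlaps(zone_subnets):
    """Return (zone_a, net_a, zone_b, net_b) for every overlapping pair of
    subnets that belong to different availability zones."""
    nets = [
        (zone, ipaddress.ip_network(cidr))
        for zone, cidrs in zone_subnets.items()
        for cidr in cidrs
    ]
    clashes = []
    for (zone_a, net_a), (zone_b, net_b) in combinations(nets, 2):
        if zone_a != zone_b and net_a.overlaps(net_b):
            clashes.append((zone_a, str(net_a), zone_b, str(net_b)))
    return clashes


# Subnets in this setup; west's 192.168.2.0/24 is inferred from vm2's address.
ZONES = {"east": ["192.168.1.0/24"], "west": ["192.168.2.0/24"]}

if __name__ == "__main__":
    print(cross_zone_overlaps(ZONES) or "no overlaps")
```

With the subnets of this setup nothing is reported; adding, say, 192.168.1.128/25 to the west zone would be flagged as a clash with east's 192.168.1.0/24.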