A new feature has been recently introduced in OVN that allows multiple clusters to be interconnected at L3 level (here’s a link to the series of patches). This can be useful for scenarios with multiple availability zones (or physical regions) or simply to allow better scaling by having independent control planes yet allowing connectivity between workloads in separate zones.
Simplifying things, logical routers on each cluster can be connected via transit overlay networks. The interconnection layer is responsible for creating the transit switches in the IC database that will become visible to the connected clusters. Each cluster can then connect their logical routers to the transit switches. More information can be found in the ovn-architecture manpage.
I created a vagrant setup to test it out and become a bit familiar with it. All you need to do to recreate it is cloning and running ‘vagrant up‘ inside the ovn-interconnection folder:
https://github.com/danalsan/vagrants/tree/master/ovn-interconnection
This will deploy 7 CentOS machines (300MB of RAM each) with two separate OVN clusters (west & east) and the interconnection services. The layout is described in the image below:

Once the services are up and running, a few resources will be created on each cluster and the interconnection services will be configured with a transit switch between them:

Let’s see, for example, the logical topology of the east availability zone, where the transit switch ts1 is listed along with the port in the west remote zone:
2 | switch c850599c-263c-431b-b67f-13f4eab7a2d1 (ts1) |
3 | port lsp-ts1-router_west |
5 | addresses: [ "aa:aa:aa:aa:aa:02 169.254.100.2/24" ] |
6 | port lsp-ts1-router_east |
8 | router-port: lrp-router_east-ts1 |
9 | switch 8361d0e1-b23e-40a6-bd78-ea79b5717d7b (net_east) |
10 | port net_east-router_east |
12 | router-port: router_east-net_east |
14 | addresses: [ "40:44:00:00:00:01 192.168.1.11" ] |
15 | router b27d180d-669c-4ca8-ac95-82a822da2730 (router_east) |
16 | port lrp-router_east-ts1 |
17 | mac: "aa:aa:aa:aa:aa:01" |
18 | networks: [ "169.254.100.1/24" ] |
19 | gateway chassis: [gw_east] |
20 | port router_east-net_east |
21 | mac: "40:44:00:00:00:04" |
22 | networks: [ "192.168.1.1/24" ] |
As for the Southbound database, we can see the gateway port for each router. In this setup I only have one gateway node but, as any other distributed gateway port in OVN, it could be scheduled in multiple nodes providing HA:
1 | [root@central-east ~]# ovn-sbctl show |
12 | options: {csum="true"} |
13 | Port_Binding cr-lrp-router_east-ts1 |
18 | options: {csum="true"} |
19 | Port_Binding lsp-ts1-router_west |
If we query the interconnection databases, we will see the transit switch in the NB and the gateway ports in each zone:
1 | [root@central-ic ~]# ovn-ic-nbctl show |
4 | [root@central-ic ~]# ovn-ic-sbctl show |
10 | port lsp-ts1-router_east |
12 | address: ["aa:aa:aa:aa:aa:01 169.254.100.1/24"] |
18 | port lsp-ts1-router_west |
20 | address: ["aa:aa:aa:aa:aa:02 169.254.100.2/24"] |
With this topology, traffic flowing from vm1 to vm2 shall flow from gw-east to gw-west through a Geneve tunnel. If we list the ports in each gateway we should be able to see the tunnel ports. Needless to say, gateways have to be mutually reachable so that the transit overlay network can be established:
1 | [root@gw-west ~]# ovs-vsctl show |
2 | 6386b867-a3c2-4888-8709-dacd6e2a7ea5 |
8 | options: {csum="true", key=flow, remote_ip="192.168.50.102"} |
Now, when vm1 pings vm2, the traffic flow should be like:
(vm1) worker_east ==== gw_east ==== gw_west ==== worker_west (vm2).
Let’s see it via ovn-trace tool:
1 | [root@central-east vagrant] |
4 | ingress(dp= "net_east" , inport= "vm1" ) |
6 | egress(dp= "net_east" , inport= "vm1" , outport= "net_east-router_east" ) |
8 | ingress(dp= "router_east" , inport= "router_east-net_east" ) |
10 | egress(dp= "router_east" , inport= "router_east-net_east" , outport= "lrp-router_east-ts1" ) |
12 | ingress(dp= "ts1" , inport= "lsp-ts1-router_east" ) |
14 | egress(dp= "ts1" , inport= "lsp-ts1-router_east" , outport= "lsp-ts1-router_west" ) |
15 | 9. ls_out_port_sec_l2 (ovn-northd.c:4543): outport == "lsp-ts1-router_west" , priority 50, uuid c354da11 |
17 | /* output to "lsp-ts1-router_west" , type "remote" */ |
Now let’s capture Geneve traffic on both gateways while a ping between both VMs is running:
1 | [root@gw-east ~]# tcpdump -i genev_sys_6081 -vvnee icmp |
2 | tcpdump: listening on genev_sys_6081, link-type EN10MB (Ethernet), capture size 262144 bytes |
3 | 10:43:35.355772 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 11379, offset 0, flags [DF], proto ICMP (1), length 84) |
4 | 192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 40, length 64 |
5 | 10:43:35.356077 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 11379, offset 0, flags [DF], proto ICMP (1), length 84) |
6 | 192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 40, length 64 |
7 | 10:43:35.356442 aa:aa:aa:aa:aa:02 > aa:aa:aa:aa:aa:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 42610, offset 0, flags [none], proto ICMP (1), length 84) |
8 | 192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 40, length 64 |
9 | 10:43:35.356734 40:44:00:00:00:04 > 40:44:00:00:00:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 42610, offset 0, flags [none], proto ICMP (1), length 84) |
10 | 192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 40, length 64 |
13 | [root@gw-west ~]# tcpdump -i genev_sys_6081 -vvnee icmp |
14 | tcpdump: listening on genev_sys_6081, link-type EN10MB (Ethernet), capture size 262144 bytes |
15 | 10:43:29.169532 aa:aa:aa:aa:aa:01 > aa:aa:aa:aa:aa:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 8875, offset 0, flags [DF], proto ICMP (1), length 84) |
16 | 192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 34, length 64 |
17 | 10:43:29.170058 40:44:00:00:00:10 > 40:44:00:00:00:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 8875, offset 0, flags [DF], proto ICMP (1), length 84) |
18 | 192.168.1.11 > 192.168.2.12: ICMP echo request, id 5494, seq 34, length 64 |
19 | 10:43:29.170308 aa:aa:aa:aa:aa:02 > aa:aa:aa:aa:aa:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 38667, offset 0, flags [none], proto ICMP (1), length 84) |
20 | 192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 34, length 64 |
21 | 10:43:29.170476 aa:aa:aa:aa:aa:02 > aa:aa:aa:aa:aa:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 38667, offset 0, flags [none], proto ICMP (1), length 84) |
22 | 192.168.2.12 > 192.168.1.11: ICMP echo reply, id 5494, seq 34, length 64 |
You can observe that the ICMP traffic flows between the transit switch ports (aa:aa:aa:aa:aa:02 <> aa:aa:aa:aa:aa:01) traversing both zones.
Also, as the packet has gone through two routers (router_east and router_west), the TTL at the destination has been decremented twice (from 64 to 62):
1 | [root@worker-west ~]# ip net e vm2 tcpdump -i any icmp -vvne |
2 | tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes |
3 | 10:49:32.491674 In 40:44:00:00:00:10 ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 57504, offset 0, flags [DF], proto ICMP (1), length 84) |
This is a really great feature that opens a lot of possibilities for cluster interconnection and scaling. However, it has to be taken into account that it requires another layer of management that handles isolation (multitenancy) and avoids IP overlapping across the connected availability zones.