When I first started working on networking-ovn, one of the first tasks I took up was implementing the ability for instances to fetch userdata and metadata at boot, such as the name of the instance, public keys, etc.
This involved introducing a new service, which we called networking-ovn-metadata-agent: a process that runs on compute nodes, intercepts metadata requests from instances inside network namespaces, adds some headers and forwards them to Nova. While it was fun to work on, I soon realized that most of the work would be on the TripleO side and, since I was really new to it, I decided to take on that challenge as well!
If you're interested in the actual code for this feature (not the deployment-related code), these are the design document and the patches that implement it; I plan to write a separate blog post about them:
- https://docs.openstack.org/networking-ovn/latest/contributor/design/metadata_api.html
- https://github.com/openvswitch/ovs/commit/2a38ef4520f646df2ad6e879aa7825e1cec48bac
- https://github.com/openvswitch/ovs/commit/37737b96e4686ab83e8d582afbaf58eb2b847332
- https://review.openstack.org/#/c/471140/
But implementing the feature didn't end there: we also had to support deploying the new service from TripleO and, YES!, it had to be containerized. I found out that this was not a simple task, so I decided to write this post with the steps I took, hoping it helps people going through the same process. Before describing them, I want to highlight a few things/tips:
- It's highly recommended to have access to hardware capable of deploying TripleO rather than relying only on the gate. This will speed up the development process *a lot* and lower the pressure on the gate, which sometimes has really long queues.
- The jobs running in the upstream CI use one node for the undercloud and one node for the overcloud, so it's not always easy to catch failures when you deploy services on specific roles, as in this case where we only want the new service on computes. Because of this, CI was green on certain patch sets while I hit problems on a 3 controllers + 3 computes setup. Production environments are usually HA, so it's best to do the development on a more realistic setup whenever possible.
- As of today, with containers, there's no way to send a patch to your project (in this case networking-ovn), add a Depends-On in tripleo-* and expect the new change to be tested. Instead, the patch has to get merged, an RDO promotion has to occur so that the RPM with the fix is available, and only then are the container images built and ready to be fetched by TripleO jobs. This is a regression compared to non-containerized jobs and clearly slows down the development process. TripleO folks are making a great effort to add a mechanism that supports this, which will be a huge leap 🙂
- To overcome the above, I went through the process of building my own Kolla images and setting them up in the local registry (more info here). This way you can do your own testing without having to wait for the next RDO promotion. Your patches still won't be able to merge until that happens, but you can keep developing in the meantime.
- To test any puppet changes, I had to patch the overcloud image (usually mounting it with guestmount and changing the required files) and update it before redeploying. The 'virt-customize' command is also handy, as it allows you to execute commands directly on the overcloud image, for example to install a new RPM package (in my case, usually upgrading Open vSwitch for testing). This is no different from a baremetal deployment, but it is still useful here.
After this long introduction, let me go through the actual code that implements this new service.
1. Getting the package and the container image ready:
- The first step is to write the spec file for the new RPM: https://review.rdoproject.org/r/#/c/7739/
- Once the RPM is available, we can write the Dockerfile that tells Kolla how to build the image on the different platforms, both from source and from binaries: https://review.openstack.org/#/c/511225/
- The next step is to send a patch to tripleo-common so that the image above gets pushed to the Docker registry (see the sketch at the end of this list). Once these are merged, the new container image will be available after the next RDO promotion.
- When I started writing this patch, baremetal jobs were still in place, so I also sent a patch to tripleo-puppet-elements so that the new package gets installed in the overcloud image used for Compute nodes: https://review.openstack.org/#/c/527076/
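For reference, the tripleo-common change essentially adds an entry to the list of container images to build and push. Below is a minimal sketch of what such an entry looks like; the file name, top-level key and image name are assumptions on my part, so check the actual patch for the real thing:

```yaml
# Illustrative excerpt of a tripleo-common container image list entry.
# The image name (neutron-metadata-agent-ovn) and the template variables
# are assumed here, not copied verbatim from the patch.
container_images_template:
  - imagename: "{{ namespace }}/{{ name_prefix }}neutron-metadata-agent-ovn{{ name_suffix }}:{{ tag }}"
```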
At this point we should be able to consume the container image from TripleO. The next step is to configure the new service with the right parameters.
2. New service configuration:
puppet-neutron:
This new service requires some configuration options, which are written out by puppet. Since it is a networking service, I sent a patch to puppet-neutron to support it: https://review.openstack.org/#/c/502941/
All the configuration options, as well as the service definition, are in this file:
https://review.openstack.org/#/c/502941/9/manifests/agents/ovn_metadata.pp
As we also want a knob to enable/disable the service in the ML2 plugin, I added an option for that here:
https://review.openstack.org/#/c/502941/9/manifests/plugins/ml2/ovn.pp@59
The patch also includes unit tests and a release note as we’re introducing a new service.
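To give a rough idea of what the puppet-neutron side ends up consuming, the deployment feeds it hieradata along these lines. This is only a sketch under my assumptions; the real class and parameter names are in the manifests linked above:

```yaml
# Hypothetical hieradata consumed by the new puppet-neutron manifests.
# All key names and values here are illustrative, not taken verbatim
# from the patch.
neutron::agents::ovn_metadata::shared_secret: 'metadata-proxy-secret'
neutron::agents::ovn_metadata::nova_metadata_ip: '192.0.2.10'
neutron::agents::ovn_metadata::ovsdb_connection: 'unix:/var/run/openvswitch/db.sock'
# The knob in the ML2/OVN plugin configuration that enables the service:
neutron::plugins::ml2::ovn::ovn_metadata_enabled: true
```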
puppet-tripleo:
We want this service to be deployed at step 4 (see the OpenStack deployment steps for more information) and we also want networking-ovn-metadata-agent to be started after ovn-controller is up and running. The ovn-controller service is started after Open vSwitch, so this way we ensure that our service starts at the right moment. For this, I sent a patch to puppet-tripleo:
https://review.openstack.org/#/c/502940/
Later, I found out that I also wanted the neutron base profile configuration to be applied to my own service, so that it could benefit from common Neutron configuration such as logging files, etc.: https://review.openstack.org/527482
3. Actual deployment in tripleo-heat-templates:
This is the high-level work that drives all of the above. Initially I had it in three different patches, which I ended up squashing because of the inter-dependencies.
https://review.openstack.org/#/c/502943/
To sum up, this is what the patch does:
- Define the parameters that we expose to puppet so that the new service gets properly configured (the overall template structure is sketched after this list): https://review.openstack.org/#/c/502943/32/puppet/services/ovn-metadata.yaml
- Add the docker service file that defines the configuration within the container using the file above (see this), as well as the healthcheck mechanism, etc. Also, since we'll be interacting with Open vSwitch, which runs on the host, we mount /var/run/openvswitch (see this) inside the container. This way we can interact with other networking services through OVS.
- Before networking-ovn-metadata-agent was introduced, instances fetched their metadata from a config drive. Now we want to enable the new service and disable the config drive in Nova. This has to be configured on the controllers: https://review.openstack.org/#/c/502943/32/puppet/services/ovn-controller.yaml. If OVN metadata is enabled, we disable the config drive, and vice versa.
- Add the new service to the Compute* roles, as we only want it on computes: https://review.openstack.org/#/c/502943/32/roles/Compute.yaml@64
- Add the new service to the environment files used by tripleo-heat-templates when deploying the overcloud, both for baremetal and for containers (see the second sketch after this list).
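To make the service definition more concrete, here is a rough sketch of what the tripleo-heat-templates pieces look like, combining the puppet side (config_settings/step_config) and the docker side (docker_config with the /var/run/openvswitch bind mount) into one simplified template. The container name, image name, parameter names and volume list are illustrative assumptions from memory, not copied from the actual patch:

```yaml
# Simplified, illustrative THT service template for the OVN metadata agent.
heat_template_version: pike

parameters:
  OVNMetadataSharedSecret:        # hypothetical parameter name
    type: string
    hidden: true

outputs:
  role_data:
    description: Role data for the OVN metadata agent service.
    value:
      service_name: ovn_metadata
      # Hieradata handed over to the puppet-neutron manifests:
      config_settings:
        neutron::agents::ovn_metadata::shared_secret: {get_param: OVNMetadataSharedSecret}
      # Puppet profile applied at the right step (this lives in the puppet
      # service file in the real templates):
      step_config: |
        include ::tripleo::profile::base::neutron::ovn_metadata
      # Containerized side: start the agent at step 4 (after ovn-controller)
      # and bind-mount the host's OVS run directory into the container.
      docker_config:
        step_4:
          ovn_metadata_agent:
            image: 'neutron-metadata-agent-ovn:latest'   # illustrative
            net: host
            privileged: true
            restart: always
            volumes:
              - /var/lib/config-data/puppet-generated/neutron:/etc/neutron:ro
              - /var/run/openvswitch:/var/run/openvswitch
              - /run/netns:/run/netns:shared
```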
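And the wiring that actually turns the service on, i.e. the role change and the environment file mapping, boils down to something like the following. Again, this is an illustrative sketch; the real service name (assumed here to be OS::TripleO::Services::OVNMetadataAgent) and file paths are in the linked patch:

```yaml
# roles/Compute.yaml (excerpt): append the new service to the Compute role.
- name: Compute
  ServicesDefault:
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    - OS::TripleO::Services::OVNMetadataAgent        # new entry (name assumed)
---
# Containers environment file (illustrative path): map the service to its
# docker/services template so it gets deployed containerized.
resource_registry:
  OS::TripleO::Services::OVNMetadataAgent: ../../docker/services/ovn-metadata.yaml
```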
I hope this helps others introducing new services in OpenStack! Feel free to reach out to me with any comments/corrections/suggestions by e-mail or on IRC 🙂