Revamp CentOS Community Container Pipeline to run on OpenShift
Monday, 8 October 2018 Dharmit Shah announcement, builds, Community

It's been over a year since we published anything about the CentOS Community Container Pipeline. Many interesting things have happened during the past year, many things have changed, and there's a complete shift in the architecture of the service that was rolled out over the last weekend.

Wait, I've never heard of this project

If this is the first time you're hearing about the CentOS Community Container Pipeline project, it would be best to refer to this blog post, the GitHub repo of the project, or the wiki page. In short, the service does the following:

  • Pre-build the artifacts/binaries to be added to the container image
  • Lint the Dockerfile for adherence to best practices
  • Build the container image
  • Scan the image for:
    • available RPM updates
    • updates for packages installed via other package managers:
      • npm
      • pip
      • gem
    • integrity of RPM-installed files and binaries
    • capabilities of a container created from the resulting image, found by examining the RUN label in its Dockerfile
  • Weekly scanning of the container images using the above scanners
  • Automatic rebuild of the container image when the git repo is modified
  • Parent-child relationship between images to automatically trigger a rebuild of the child image when the parent image gets updated
  • Repo tracking to automatically rebuild the container image in the event of an RPM getting updated in any of its configured repos (not available yet in the new architecture)
  • A UI that lists all the container images built with the service at registry.centos.org.
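
To make the parent-child rebuild idea above concrete, here is a minimal sketch in Python. It is not the service's actual code: the `children_of` mapping, the image names, and the `images_to_rebuild` helper are all hypothetical, and it assumes each image has a single parent so that the relationships form a tree.

```python
from collections import deque


def images_to_rebuild(children_of, updated_image):
    """Return every descendant of updated_image, parents before children."""
    order = []
    seen = {updated_image}
    pending = deque([updated_image])
    while pending:
        parent = pending.popleft()
        for child in children_of.get(parent, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                pending.append(child)
    return order


# Hypothetical image names, for illustration only.
children_of = {
    "centos/base": ["centos/python", "centos/nodejs"],
    "centos/python": ["example/flask-app"],
}

print(images_to_rebuild(children_of, "centos/base"))
# ['centos/python', 'centos/nodejs', 'example/flask-app']
```

Walking the tree breadth-first means a child is only queued for rebuild after its own parent, which is the ordering a chained rebuild needs.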

How did the old system work?

When we talked about the project at DevConf.cz '18, we received a positive response from the audience. However, at that time, we knew that our service couldn't handle more build requests and on-boarding more community projects would be counter-productive when our backend didn't have the ability to serve those requests.

The old implementation of the service had a lot of plumbing. There were workers written for most of the features mentioned above.

  • Pre-build happened on CentOS CI (ci.c.o) infrastructure.
  • Lint worker ran as a systemd service.
  • Build worker ran as a standalone container and triggered a build in an OpenShift cluster.
  • Scan worker ran as a systemd service and used atomic scan to scan the containers. This in turn spun up a few containers which we needed to delete along with their volumes to make sure that the host system's disk didn't fill up.
  • Weekly scanning was a Jenkins job that checked against the container index, registry.centos.org and the underlying database of the service before triggering a weekly scan.
  • Repo tracking was a Django project and relied heavily on a database which we almost always failed to migrate successfully whenever the schema was changed. That's our shortcoming, not Django's.

All these heterogeneous pieces talked to each other through beanstalkd.
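
For readers unfamiliar with this style of plumbing, the following Python sketch shows the general shape of workers handing a job to one another through named queues. It is only an illustration: the standard library's `queue.Queue` stands in for beanstalkd tubes, everything runs in one process rather than on separate hosts, and the worker functions, tube names and job fields are hypothetical.

```python
import queue


# Hypothetical workers: each does its step, then names the next tube (or None).
def lint_worker(job):
    job["linted"] = True
    return "build"


def build_worker(job):
    job["image"] = job["name"] + ":latest"
    return "scan"


def scan_worker(job):
    job["scanned"] = True
    return None  # last step, nothing to hand off


WORKERS = {"lint": lint_worker, "build": build_worker, "scan": scan_worker}


def run_job(job, first_tube="lint"):
    """Pass a job from tube to tube until a worker returns no next tube."""
    tubes = {name: queue.Queue() for name in WORKERS}  # stand-in for beanstalkd
    tubes[first_tube].put(job)
    visited = []
    tube = first_tube
    while tube is not None:
        current = tubes[tube].get_nowait()
        visited.append(tube)
        tube = WORKERS[tube](current)
        if tube is not None:
            tubes[tube].put(current)
    return visited


job = {"name": "example/app"}
print(run_job(job))
# ['lint', 'build', 'scan']
```

Even in this toy form, every worker has to know which tube comes next, which hints at why the real deployment was hard to maintain.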

Everything was spread across different hosts and we were using really huge Ansible playbooks to bring up the service. A fresh deployment took 30 minutes on average. Testing any change in the dev environment required a redeployment of the service, which took another 15 minutes on average. Deploying and maintaining this service was quite a pain!

What did we do about these problems?

For a long time we had been discussing developing our service on top of OpenShift. Then, at some point, we read about OpenShift Pipeline and found it interesting. We took the plunge and came up with a proof of concept implementation of CentOS Community Container Pipeline on top of OpenShift OKD using Minishift. Results were exciting! We were able to do parallel builds of container images, Jenkins Pipelines orchestrated the flow really well, build times were faster, we didn't need to use beanstalkd at all and, most importantly, there was very little code to write to get things done!

With the POC in place, we went ahead with developing a more tangible service on top of a real OpenShift cluster instead of on top of Minishift. What used to be individual workers doing their thing in the old system is now pretty much all inside OpenShift Pipeline.

We now have an OpenShift Pipeline for every project on CentOS Container Index that does the pre-build, lints the Dockerfile, builds the container image, scans it and pushes it to the external registry; all from a single container! We have another OpenShift Pipeline for every project to do its weekly scans. So instead of having five workers do these tasks and communicate with each other via beanstalkd, we have orchestrated things through OpenShift Pipelines.
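
As a rough illustration of the difference, the per-project flow can be thought of as one ordered list of stages run in a single place. The sketch below is Python, not the Jenkins Pipeline definition the service actually uses; the stage callables are hypothetical stand-ins, and stopping at the first failing stage is an assumption on our part.

```python
def run_pipeline(project, stages):
    """Run stages in order; return (completed stage names, failed stage or None)."""
    completed = []
    for name, stage in stages:
        if not stage(project):
            return completed, name  # stop at the first failing stage
        completed.append(name)
    return completed, None


# Hypothetical stand-ins for the real stages; each returns True on success.
stages = [
    ("pre-build", lambda project: True),
    ("lint", lambda project: True),
    ("build", lambda project: True),
    ("scan", lambda project: True),
    ("push", lambda project: True),
]

print(run_pipeline("example/app", stages))
# (['pre-build', 'lint', 'build', 'scan', 'push'], None)
```

The ordering lives in one list instead of being spread across workers that each know about the next queue.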

What are we working on now?

We don't have Repo tracking implemented in the new architecture yet. We don't have a UI for users to look at their build logs or weekly scan logs either. We're focusing first on getting the UI for logs up, and then we will start working on Repo tracking. We are also working on setting up a CI job that tests core parts of the service on Minishift, so that anyone willing to take the service for a spin can do so on a Minishift VM!

Let us know your thoughts!

This project is solely focused on making things easier for open-source projects and their developers. If you are working on an open-source project that's building on top of CentOS, we would like to know your thoughts. If you need help getting started, you can contact us on IRC (#centos-devel on Freenode) or take a look at the project documentation.

Dharmit Shah (dharmit on #centos-devel IRC)
