This is a summary of the work done on initiatives by the Community Platform Engineering (CPE) Team in Red Hat. Each quarter, the CPE Team—together with CentOS and Fedora community representatives—chooses initiatives to work on in the quarter. The CPE Team is then split into multiple smaller sub-teams that will work on chosen initiatives, plus the day-to-day work that needs to be done.
The sub-teams for this quarter were:
- Infra & Releng
- CentOS Stream/Emerging RHEL
- Datanommer/Datagrepper
- DNF Counting
- Metrics for Apps on OpenShift
Infra & Releng
About
The purpose of this team is to take care of day-to-day business regarding CentOS and Fedora infrastructure and Fedora release engineering work. It is responsible for the services running in Fedora and CentOS infrastructure and for preparing things for the new Fedora release (mirrors, mass branching, new namespaces, etc.). This sub-team also investigates possible initiatives through the Advance Reconnaissance Crew (ARC), which is formed from Infra & Releng sub-team members based on the initiative being investigated.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Mark O’Brien (Team Lead) (Fedora Operations, CentOS Operations) (mobrien)
- Michal Konecny (Agile Practitioner) (Developer) (zlopez)
- Kevin Fenzi (Fedora Operations) (nirik)
- Fabian Arrotin (CentOS Operations) (arrfab)
- Tomas Hrcka (Fedora Release Engineering) (humaton)
- Lenka Segura (Developer) (lenkaseg)
- Emma Kidney (Developer) (ekidney)
- Ben Capper (Developer) (bcapper)
What the sub-team did in Q3 2021
Fedora Infrastructure
In addition to the normal maintenance tasks (reboots, security updates, creating groups/lists, fixing application issues), we worked on a number of items:
- Cleaned up Nagios checks to stop alerting on swap on hardware machines
- Moved the vast majority of our instances to linux-system-roles/networking to configure networking via Ansible
- Got the broken openqa-p09-worker02 back up and working, with a lot of firmware upgrades and help from IBM techs
- Archived ~35 TB of data from our NetApp to a Storinator
- Moved zodbot (our IRC bot) to Python 3 and pointed it at the new account system
- Upgraded the wiki to the latest stable version
- Fixed an issue with OSBS building 0ad, which needed a larger-than-default container
- Set up rooms etc. on the new hosted Fedora Matrix server
- Started on the EPEL9 setup, mirroring CentOS Stream 9 buildroot content, etc.
- Got vmhost-x86-copr04's motherboard replaced and the host back in service
- Deployed the Kinoite website
CentOS Stream
- Prepared the new mirror network to accept CentOS Stream 9
- Modified koji/cbs.centos.org to allow building for CentOS Stream 9, including new tags
- Imported 9-stream content
- Modified the SIG process to include/support Stream 9 and its changed requirements (directory layout, included sources and debuginfo vs. what we had before)
- Prepared the needed AWS infrastructure for EC2 testing and for replication of CentOS Stream 9 images across all regions
CentOS common/public infrastructure
- Converted all deployed CentOS Linux 8 machines to CentOS Stream 8
- Relocated the armhfp community builders to other DC/hardware
- Started investigating a migration from Pagure 5.8 on CentOS 7 to Pagure 5.13 on CentOS Stream 8
- Created the https://docs.infra.centos.org doc website, and worked in pairing mode to share infra knowledge within the team
- Collaborated with the Artwork SIG to prepare *.dev* variants of websites as a "playground" to test Ansible role changes directly, with corresponding PRs then deployed to .stg. and then production
- Business As Usual (BAU):
  - koji tag creation
  - following up on and fixing hardware issues
CentOS CI infrastructure
- Updated OpenShift to the 4.8.x stable branch
- Moved/onboarded new tenants onto the CI infra
- Moved some workloads within the CI infra for better resiliency and backup plans
- Expanded the existing cloud.cico (OpenNebula) infra with new x86_64 hypervisors
- Reorganized the slow NFS storage box (out of warranty) with a RAID 10 layout to speed up and help containers in OpenShift (for PersistentVolumes)
Fedora Release Engineering
While taking care of day-to-day business like nightly composes, package retirements and unretirements, new SCM requests, and occasional Koji issues, we worked on the new Fedora release:
- Mass rebuild of rpms and modules in Fedora Rawhide
- Branching of Fedora 35 from Rawhide
- Fedora Linux 35 Beta release
ARC
Investigated upgrading the frontend web UI for the CentOS mailing lists. The investigation concluded that Mailman 3, Postorius, and HyperKitty would need to be packaged for EPEL8, and that a new server would need to be deployed with the current CentOS mailing lists migrated to it.
CentOS Stream/Emerging RHEL
About
This initiative is working on CentOS Stream/Emerging RHEL to make this new distribution a reality. The goal of this initiative is to prepare the ecosystem for the new CentOS Stream.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Brian Stinson (Team Lead) (bstinson)
- Adam Samalik (Agile Practitioner) (asamalik)
- Aoife Moloney (Product Owner) (amoloney)
- Carl George
- James Antill
- Johnny Hughes
- Mohan Boddu (mboddu)
- Merlin Mathesius
- Stephen Gallagher (sgallagh)
- Troy Dawson (tdawson)
- Petr Bokoc (pbokoc)
What the sub-team did in Q3 2021
One thing we tackled was enabling side-tag builds for Fedora ELN. Initially we wanted to implement proper side tags for ELN, but we eventually settled on a simpler approach where we tag the Rawhide builds in and then rebuild them in ELN. This ensures that we get all the packages built in ELN, with the Rawhide build as a backup should a package fail to build in ELN. We can even use this as a health metric for ELN: how many ELN packages are actually ELN builds.
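The health metric mentioned above can be sketched in a few lines. This is a hypothetical illustration (fake NVRs, not real Koji data); in practice the list of tagged builds would come from Koji, and the disttags shown are assumptions.

```python
# A minimal sketch of the ELN "health metric": what fraction of packages
# in the ELN tag are genuine ELN rebuilds rather than Rawhide builds that
# were merely tagged in as a fallback. Hypothetical data below.

def eln_rebuild_ratio(tagged_builds):
    """tagged_builds: list of NVR strings as found in the ELN tag."""
    if not tagged_builds:
        return 0.0
    eln = sum(1 for nvr in tagged_builds if ".eln" in nvr)
    return eln / len(tagged_builds)

builds = [
    "bash-5.1.8-2.eln112",    # rebuilt in ELN
    "coreutils-9.0-1.fc36",   # Rawhide build tagged in as a backup
    "glibc-2.34-7.eln112",    # rebuilt in ELN
    "sed-4.8-9.fc36",         # Rawhide build tagged in as a backup
]
print(f"{eln_rebuild_ratio(builds):.0%} of tagged packages are ELN builds")
```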
For CentOS Stream 9, cloud images are now available in AWS. You can find them by searching for "CentOS Stream 9" in AWS; to make sure you get the latest one, append the current year and month (e.g. "202110" for October 2021).
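The date-stamp trick above can be automated. The sketch below picks the newest CentOS Stream 9 image from a list of image names; the name patterns are hypothetical, and with the AWS CLI or boto3 you would instead filter on the image name and sort by creation date.

```python
# Hypothetical sketch: choose the most recent CentOS Stream 9 image by
# the YYYYMMDD stamp embedded in its name. The names below are made up
# for illustration and may not match the real AMI naming scheme.
import re

def newest_stream9_image(names):
    dated = []
    for name in names:
        m = re.search(r"CentOS Stream 9.*?(\d{8})", name)
        if m:
            dated.append((m.group(1), name))
    # Date strings sort lexicographically, so max() gives the newest
    return max(dated)[1] if dated else None

names = [
    "CentOS Stream 9 x86_64 20210915",
    "CentOS Stream 9 x86_64 20211020",
    "CentOS Stream 8 x86_64 20211001",  # wrong release, ignored
]
print(newest_stream9_image(names))  # CentOS Stream 9 x86_64 20211020
```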
Also, CentOS Stream 9 repositories are now available through mirrors using a metalink. Existing systems get this set up automatically with an update, as the centos-release package will include the metalink. This takes some load off the CentOS infrastructure and can even make your updates faster.
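For readers unfamiliar with metalinks: a metalink is an XML document listing mirror URLs for a repository file, which the package manager uses to pick a nearby, up-to-date mirror. The sample below is a hypothetical, trimmed-down metalink, not real CentOS mirror data.

```python
# Parse a simplified metalink and return mirror URLs, highest-preference
# first, roughly mirroring how a client chooses a mirror. The XML sample
# is hypothetical and much smaller than a real metalink.
import xml.etree.ElementTree as ET

SAMPLE = """\
<metalink xmlns="http://www.metalinker.org/">
  <files>
    <file name="repomd.xml">
      <resources>
        <url protocol="https" preference="100">https://mirror-a.example.org/9-stream/BaseOS/x86_64/os/repodata/repomd.xml</url>
        <url protocol="https" preference="90">https://mirror-b.example.org/9-stream/BaseOS/x86_64/os/repodata/repomd.xml</url>
      </resources>
    </file>
  </files>
</metalink>
"""

def mirror_urls(metalink_xml):
    ns = {"m": "http://www.metalinker.org/"}
    urls = ET.fromstring(metalink_xml).findall(".//m:url", ns)
    urls.sort(key=lambda u: int(u.get("preference", "0")), reverse=True)
    return [u.text for u in urls]

print(mirror_urls(SAMPLE)[0])  # the preferred mirror comes first
```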
Datanommer/Datagrepper
About
The goal of this initiative is to update and enhance the Datanommer and Datagrepper apps. Datanommer is the database used to store all of the messages sent in the Fedora Infrastructure. Datagrepper is an API with a web GUI that lets users find messages stored in the Datanommer database. The current solution is slow, and the database's data structure is not optimal for storing the current amount of data. That is where this initiative comes in.
Issue trackers
Application URLs
Members of sub-team for Q3 2021
- Aurelien Bompard (Team Lead) (abompard)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Ryan Lerch (ryanlerch)
- Lenka Segura (lsegura)
- James Richardson (jrichardson)
- Stephen Coady (scoady)
What the sub-team did in Q3 2021
Datanommer and Datagrepper have been upgraded to use TimescaleDB, an open-source relational database for time-series data. TimescaleDB is a PostgreSQL extension that takes care of sharding the large amount of data that we have (and keep generating!), and maintains an SQL-compatible interface for applications.
Datagrepper and the Datanommer consumer are now running in OpenShift instead of dedicated VMs.
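To give a feel for what TimescaleDB buys us: it partitions rows into time-based chunks and provides functions such as `time_bucket()` for aggregating time-series data. The sketch below is a pure-Python analogue of that bucketing idea, with hypothetical message timestamps rather than Datanommer's real schema.

```python
# Pure-Python illustration of time bucketing, analogous to TimescaleDB's
# time_bucket(): truncate each timestamp to the start of its bucket, then
# count messages per bucket. Timestamps here are made up for the demo.
from collections import Counter
from datetime import datetime, timedelta

def time_bucket(width, ts):
    """Truncate a datetime to the start of its fixed-width bucket."""
    epoch = datetime(1970, 1, 1)
    n = (ts - epoch) // width  # timedelta // timedelta -> int
    return epoch + n * width

messages = [
    datetime(2021, 10, 5, 14, 3),
    datetime(2021, 10, 5, 14, 41),
    datetime(2021, 10, 5, 15, 12),
]
counts = Counter(time_bucket(timedelta(hours=1), ts) for ts in messages)
for bucket, count in sorted(counts.items()):
    print(bucket, count)  # two messages in the 14:00 bucket, one in 15:00
```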
DNF Counting
About
DNF Counting is used to obtain data on how Fedora is consumed by users. The current implementation experiences timeouts and crashes when the data is retrieved. This initiative aims to make the retrieval of counting data more reliable and efficient.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- Nils Philippsen (Team Lead) (nils)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Adam Saleh (asaleh)
- Patrik Polakovic
- With a special shout-out to Stephen Smoogen, who provided vital fixes even though he wasn't officially part of the initiative
What the sub-team did in Q3 2021
The scripts that create the statistics for https://data-analysis.fedoraproject.org/ were cleaned up and refactored, making them stable enough that they no longer require manual intervention.
The code at https://pagure.io/mirrors-countme/ now has tests running in CI and is packaged as an RPM to avoid further mishaps during package installation. The deployment scripts were cleaned up as well, alongside the actual deployment on the log01 machine: its hard-to-track manual interventions for last-minute bug fixes were replaced by Ansible scripts.
The cron jobs that run the batch jobs now send notification emails only on failure. To see the overall health of the batch process, there is a simple dashboard at https://monitor-dashboard-web-monitor-dashboard.app.os.fedoraproject.org/
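The "email only on failure" pattern is worth a quick sketch. This is a hypothetical wrapper, not the actual deployed code: it captures a batch job's output and surfaces it only when the job exits non-zero.

```python
# Minimal sketch of a notify-on-failure wrapper for cron jobs: run a
# command, stay silent on success, and return its output on failure
# (where a real deployment would email it out). Hypothetical code.
import subprocess
import sys

def run_quietly(cmd):
    """Return captured output if cmd failed, None if it succeeded."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return result.stdout + result.stderr
    return None

ok = run_quietly([sys.executable, "-c", "print('all good')"])
bad = run_quietly([sys.executable, "-c", "raise SystemExit('boom')"])
print(ok)           # None: success stays silent
print(bad.strip())  # boom: failure output goes into the notification
```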
Metrics for Apps on OpenShift
About
The goal of this initiative is to deploy OpenShift 4 in the Fedora Infrastructure and start using Prometheus as a monitoring tool for the apps deployed in OpenShift. This initiative should also define which metrics will be collected.
Issue trackers
Documentation
Members of sub-team for Q3 2021
- David Kirwan (Team Lead) (dkirwan)
- Aoife Moloney (Product Owner) (amoloney)
- Ellen O’Carroll (Product Owner)
- Vipul Siddharth (siddharthvipul1)
- Akashdeep Dhar (t0xic0der)
What the sub-team did in Q3 2021
- Did infrastructure prep work to install Red Hat CoreOS on nodes for the OpenShift Container Platform (OCP)
- Deployed OCP 4.8 in staging and production
- Configured the cluster with OAuth, OpenShift Container Storage (OCS), and other important operators/configs needed to support Fedora workloads
- Automated the OCP deployment process with Ansible
- Deployed and configured the User Workload Monitoring stack
- Investigated app migration from the older cluster to the new one
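For context on what the monitoring stack actually scrapes: Prometheus collects metrics over HTTP in a simple text format. The stdlib-only sketch below shows the shape of such a metrics endpoint; real apps would typically use a Prometheus client library, and the metric name here is hypothetical.

```python
# Serve a tiny Prometheus-style /metrics endpoint and scrape it once.
# This is an illustration of the text exposition format (# HELP, # TYPE,
# then "name value" lines), not production monitoring code.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS_TOTAL = 0  # hypothetical counter metric

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = (
            "# HELP app_requests_total Requests served.\n"
            "# TYPE app_requests_total counter\n"
            f"app_requests_total {REQUESTS_TOTAL}\n"
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/metrics"
scraped = urllib.request.urlopen(url).read().decode()
server.shutdown()
print(scraped.splitlines()[-1])  # app_requests_total 0
```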
Epilogue
If you got this far, thank you for reading. If you want to contact us, feel free to do so in the #redhat-cpe channel on libera.chat.