Introducing Kurchu for CentOS SIG Content Collections

Tuesday, 6 May 2025 | Neal Gompa | announcement, Community, distro, SIG

(Note: This is cross-posted from the Velocity Limitless blog)

Anyone who builds a Linux distribution knows that it’s more than just the ISO you download from the website. When creating Linux distribution artifacts, a set of inputs, processing steps, and outputs needs to be defined so that it is clear what comprises the “distribution” itself.

Typically, the inputs are a set of configuration definitions, the processing steps gather repositories and build images, and the output is a unified tree to which the results are pushed.

Kurchu is a tool for doing this with Fedora/CentOS resources. If you're familiar with Pungi (the tool used for making official Fedora releases), Kurchu operates in a similar space.

Introducing Kurchu

The goal of Kurchu is to provide a straightforward and declarative way to create artifact collections (which are called "composes") to host and even redistribute. This is in contrast to Pungi, which uses a custom script configuration engine to programmatically define how to construct a collection of artifacts.

(If this seems somewhat familiar to you, it should be! There was a talk about this at CentOS Connect.)

Design Overview

Kurchu offers three main steps (or phases in Pungi terms):

Gather
Compile
Furnish

The steps are defined independently of one another, and Kurchu can run with only some of them defined: it skips any step that is not defined.

Gather step

The "gather" step, well, gathers inputs for the later steps. Currently, this is designed around capturing repository content to store for future steps.

Crucially, Kurchu can gather composes produced by Pungi, Bodhi (the Fedora updates system and EPEL compose tool), and the On-Demand Compose Service (the system used to create CentOS/RHEL composes). In addition to gathering composes, it can also gather regular YUM repositories. This can be useful for incorporating add-on content into a complete content set.

This is what the “gather” step configuration looks like for CentOS Hyperscale 10:

[compose.gather]
# package gather step

sources = [
{ name = "hyperscale{release_version}s-packages-main-release", url = "https://cbs.centos.org/kojifiles/repos-dist/{name}/latest/", target = "hyperscale/{release_version}/packages-main", type = "kojidist", sync = true },
{ name = "hyperscale{release_version}s-packages-kernel-release", url = "https://cbs.centos.org/kojifiles/repos-dist/{name}/latest/", target = "hyperscale/{release_version}/packages-kernel", type = "kojidist", sync = true },
{ name = "hyperscale{release_version}s-packages-experimental-release", url = "https://cbs.centos.org/kojifiles/repos-dist/{name}/latest/", target = "hyperscale/{release_version}/packages-experimental", type = "kojidist", sync = true },
{ name = "hyperscale{release_version}s-packages-spin-release", url = "https://cbs.centos.org/kojifiles/repos-dist/{name}/latest/", target = "hyperscale/{release_version}/packages-spin", type = "kojidist", sync = true },
{ name = "extras{release_version}s-extras-common-release", url = "https://cbs.centos.org/kojifiles/repos-dist/{name}/latest/", target = "extras/{release_version}/common-release", type = "kojidist", sync = true },
{ name = "CentOS-Stream-{release_version}", url = "https://composes.stream.centos.org/stream-{release_version}/production/latest-CentOS-Stream/", target = "base/{release_version}/", type = "odcs", sync = true },
{ name = "Fedora-Epel-{release_version}", url = "https://kojipkgs.fedoraproject.org/compose/updates/epel{release_version}/", target = "fedora-epel/{release_version}/", type = "bodhi", sync = true },
]

From this configuration, we are able to capture the entirety of the Hyperscale package content that is used for installing a CentOS Hyperscale system.
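The source entries rely on placeholders such as `{release_version}` and `{name}`. The sketch below shows one plausible way such an entry could be resolved. It assumes that placeholders behave like Python `str.format` fields and that `{name}` refers to the already-expanded `name` of the same entry; neither is confirmed by this post, and `expand_source` is a hypothetical helper rather than part of Kurchu.

```python
def expand_source(source: dict, release_version: str) -> dict:
    """Fill in {release_version} and {name} in every string field (assumed semantics)."""
    # Resolve the name first so that other fields can refer to it via {name}.
    name = source["name"].format(release_version=release_version)
    expanded = {}
    for key, value in source.items():
        if isinstance(value, str):
            expanded[key] = value.format(release_version=release_version, name=name)
        else:
            expanded[key] = value  # booleans such as "sync" pass through unchanged
    return expanded


# First entry from the gather configuration, written as a Python dict.
source = {
    "name": "hyperscale{release_version}s-packages-main-release",
    "url": "https://cbs.centos.org/kojifiles/repos-dist/{name}/latest/",
    "target": "hyperscale/{release_version}/packages-main",
    "type": "kojidist",
    "sync": True,
}

resolved = expand_source(source, "10")
print(resolved["url"])
print(resolved["target"])
```

Under these assumptions, the URL resolves to the `hyperscale10s-packages-main-release` repository and the target to `hyperscale/10/packages-main`.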

Compile step

The "compile" step produces builds of operating system images. It orchestrates image builds remotely in a build system environment (such as Koji). At this time, it only supports working with kiwi in Koji, but others could be added as needed.

This is what the “compile” step configuration looks like for CentOS Hyperscale 10:

[compose.compile]
# image creation step
base_name = "CentOS-Stream-Hyperscale-Spin"
image_release = "0.n.{compose_date}"
target = "hyperscale/{release_version}/images/{base_name}-{variant}"

images = [
{ variant = "KDE-Desktop-Live", image_type = "iso", image_tool = "kiwi", buildsys = "cbs" },
{ variant = "OpenStack", image_type = "oem", image_tool = "kiwi", buildsys = "cbs" },
{ variant = "AWSEC2", image_type = "oem", image_tool = "kiwi", buildsys = "cbs" },
]

[compose.compile.image_tool.kiwi]
# Defines the base settings for kiwi
kiwi_description = { url = "https://gitlab.com/CentOS/Hyperscale/releng/kiwi-descriptions", path = "{base_name}.kiwi", vcs = "git", ref = "c{release_version}s" }

[compose.compile.buildsys.cbs]
# Defines the CBS build system
type = "koji"
koji_profile = "cbs"
koji_build_tag = "hyperscale{release_version}s-images-experimental-el{release_version}s"
koji_success_tags = ["hyperscale{release_version}s-images-experimental-release"]
gather_url = "https://cbs.centos.org/kojifiles/packages/{image_name}/{release_version}/{image_release}/"

From here, we’re able to enumerate what images should be produced and how they are defined and built. This also automatically releases the images once they are built.
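As an illustration of that enumeration, the following sketch lists each image together with the location its `target` template would resolve to. The values are copied from the compile configuration above, but the `plan_images` helper and the assumption that `target` is expanded like a Python format string are hypothetical and not taken from Kurchu itself.

```python
# Values copied from the [compose.compile] example; the expansion logic is an assumption.
COMPILE = {
    "base_name": "CentOS-Stream-Hyperscale-Spin",
    "target": "hyperscale/{release_version}/images/{base_name}-{variant}",
    "images": [
        {"variant": "KDE-Desktop-Live", "image_type": "iso", "image_tool": "kiwi", "buildsys": "cbs"},
        {"variant": "OpenStack", "image_type": "oem", "image_tool": "kiwi", "buildsys": "cbs"},
        {"variant": "AWSEC2", "image_type": "oem", "image_tool": "kiwi", "buildsys": "cbs"},
    ],
}


def plan_images(compile_cfg: dict, release_version: str) -> list[dict]:
    """Return one entry per image, with its target path resolved."""
    plan = []
    for image in compile_cfg["images"]:
        target = compile_cfg["target"].format(
            release_version=release_version,
            base_name=compile_cfg["base_name"],
            variant=image["variant"],
        )
        plan.append({**image, "target": target})
    return plan


for entry in plan_images(COMPILE, "10"):
    print(f'{entry["variant"]:<18} {entry["image_type"]:<4} -> {entry["target"]}')
```

Each of the three variants ends up with its own directory under `hyperscale/10/images/`, which is what keeps the outputs of the different image builds apart in the final collection.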

Furnish step

The "furnish" step takes all the outputs from earlier steps and publishes them for external consumption. This may include compose metadata from Kurchu itself, which is produced as a JSON file named kurchu.artifacts.json.

This step always publishes to a local file path (which is mandatory) and can additionally upload the result to an AWS S3 bucket. Other upload targets could be added as needed.

This is what the “furnish” step configuration looks like for CentOS Hyperscale 10:

[compose.furnish]
# publish step
write_compose_info = true
upload_targets = [
{ name = "centos-sig-composes-us-east-1", target = "/composes", generate_indexhtml = true, public = false, type = "s3" },
]

This ensures we get the kurchu.artifacts.json file and have the whole collection uploaded to our S3 bucket. The final JSON includes information about the sources and images in the collection, along with their relative paths, so that automation can consume it easily (for example, to generate yum repo files for configuration management or installation).
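To show what consuming that metadata might look like, here is a rough sketch that turns a list of sources with relative paths into a yum repo file. The JSON layout used here (a top-level `sources` list with `name` and `path` keys) is invented for illustration, since the real kurchu.artifacts.json schema is not described in this post. The base URL is a placeholder, and the sketch assumes each path is directly usable as a repository, which may not hold for full composes.

```python
import json

# Hypothetical metadata layout: the real kurchu.artifacts.json schema may differ.
SAMPLE = json.loads("""
{
  "sources": [
    {"name": "hyperscale10s-packages-main-release", "path": "hyperscale/10/packages-main"},
    {"name": "Fedora-Epel-10", "path": "fedora-epel/10"}
  ]
}
""")


def repo_file(metadata: dict, base_url: str) -> str:
    """Render one repo stanza per source, joining base_url and the relative path."""
    stanzas = []
    for source in metadata["sources"]:
        baseurl = f"{base_url.rstrip('/')}/{source['path'].strip('/')}/"
        stanzas.append("\n".join([
            f"[{source['name']}]",
            f"name={source['name']}",
            f"baseurl={baseurl}",
            "enabled=1",
        ]))
    return "\n\n".join(stanzas) + "\n"


# Placeholder URL; substitute wherever the collection is actually hosted.
print(repo_file(SAMPLE, "https://example.com/composes/latest/"))
```

Signature-checking options such as `gpgcheck` and `gpgkey` are left out on purpose; a real deployment would set them according to how the gathered content is signed.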

Current status

Kurchu status

The basics of every step in Kurchu are implemented. At this point, it is possible to create collections that contain all the gathered content and the metadata associated with it.

Kurchu supports defining the compose configuration through a TOML configuration file, but it can also be driven directly from Python to support more custom workflows.

The existing implementation is in the process of being tested and documented for broader use.

The project is on GitLab, and is available as the kurchu package in Fedora Linux and Extra Packages for Enterprise Linux (EPEL) for CentOS/RHEL 10. Contributions from interested parties are welcome!

CentOS Hyperscale status

The first collections that exercise all steps were produced in April 2025. The gather step takes about a day to complete, so the gather and compile steps now run in parallel as separate Kurchu runs.

Because producing a collection takes so long, owing to how much content there is to sync, the highest frequency we can manage is weekly. The cadence we actually want to run at is still to be determined.

Initial CentOS Stream 9 and CentOS Stream 10 collections have been made for qualification:

CentOS Stream 10: https://hyperscale.sig.centos.org/composes/CentOS-Stream-Hyperscale-10-20250414_105233/
CentOS Stream 9: https://hyperscale.sig.centos.org/composes/CentOS-Stream-Hyperscale-9-20250414_105240/

The CentOS Hyperscale Kurchu configuration can be found on GitLab. A landing page now exists for all CentOS Stream Hyperscale content, including composes.

As Kurchu development was tied to the initial release of CentOS Hyperscale 10 to ensure real-world usage was accurately captured, a number of issues were identified in CentOS Stream and fixed along the way:

Missing Anaconda icon files for live images (RHEL-13713)
Inability to identify and fix issues in SELinux policy (RHEL-22960)
Anaconda live installer not functioning properly on Wayland (RHEL-67390)
Missing base-graphical composition group (RHEL-70766)
Broken ZRAM setup in the kernel (RHEL-72036)
erofs-utils was too old (RHEL-72588)
Geoclue was broken (RHEL-79063)
Broken xdg-user-dirs (RHEL-79119)
A handful of unshipped packages requested to be shipped

What’s next

As the implementation and the configuration format stabilize, a schema will be created to represent the configuration file format for basic validation. This reduces the potential for erroneous inputs being accepted. The schema is a requirement for a v1.0.0 release, in which inputs and interfaces are declared stable and follow semantic versioning. Separately, CentOS Hyperscale’s deployment still needs to be finalized and automated.
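The kind of basic validation a schema would enable can be sketched in plain Python. This is not the planned schema: the required keys and the set of allowed source types are inferred only from the gather example in this post (the type name for regular YUM repositories, for instance, is not shown there), and `validate_gather` is a hypothetical function.

```python
# Inferred from the gather example in this post; not an authoritative or complete list.
ALLOWED_SOURCE_TYPES = {"kojidist", "odcs", "bodhi"}
REQUIRED_STRING_KEYS = ("name", "url", "target", "type")


def validate_gather(gather: dict) -> list[str]:
    """Return human-readable problems found in a [compose.gather] table."""
    sources = gather.get("sources")
    if not isinstance(sources, list):
        return ["compose.gather.sources must be a list"]
    errors = []
    for index, source in enumerate(sources):
        where = f"compose.gather.sources[{index}]"
        if not isinstance(source, dict):
            errors.append(f"{where}: must be a table")
            continue
        for key in REQUIRED_STRING_KEYS:
            if key not in source:
                errors.append(f"{where}: missing '{key}'")
            elif not isinstance(source[key], str):
                errors.append(f"{where}: '{key}' must be a string")
        source_type = source.get("type")
        if isinstance(source_type, str) and source_type not in ALLOWED_SOURCE_TYPES:
            errors.append(f"{where}: unknown type '{source_type}'")
        if "sync" in source and not isinstance(source["sync"], bool):
            errors.append(f"{where}: 'sync' must be a boolean")
    return errors


# A deliberately broken entry: no target, misspelled type, string instead of boolean.
bad = {"sources": [{"name": "x", "url": "https://example.com/", "type": "odsc", "sync": "yes"}]}
for problem in validate_gather(bad):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a user fix every mistake in a configuration in a single pass.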

Afterword

Prior to the creation of Kurchu, creating comprehensive collections of content as testable units that pull from a variety of sources was very challenging. Downloading all the repositories and organizing them so that everything was coherently linked together for further consumption was a very manual and brittle effort. Thanks to Meta, which sponsored the work, and Velocity Limitless, which implemented it and published a solution, it is now possible for anyone leveraging content from the CentOS community to develop and reasonably support integrated solutions built on the layers of content available for CentOS Stream.

If you’re interested in this topic and would like to contribute or collaborate on leveraging Kurchu, come by the CentOS Hyperscale SIG Matrix room on Fedora Chat and say hello!