
The life cycle of a package

conda-forge implements a particular workflow for building, publishing and maintaining conda packages. However, the core concepts are the same for any conda packaging solution.

General concepts about conda packaging

conda packages are built from conda recipes, which consist of a metadata file (such as meta.yaml or recipe.yaml) plus optional supporting scripts and data. A build tool (usually conda-build or rattler-build) takes the recipe and produces one or more packages (also referred to as outputs and/or artifacts, depending on the context).
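For illustration, a minimal meta.yaml might look like the sketch below. The package name, URL, and checksum are placeholders, not a real project:

```yaml
# Minimal, hypothetical conda recipe (meta.yaml); all values are placeholders.
package:
  name: mypkg
  version: "1.0.0"

source:
  url: https://example.com/mypkg-1.0.0.tar.gz
  sha256: 0000000000000000000000000000000000000000000000000000000000000000

build:
  number: 0
  script: python -m pip install . -vv

requirements:
  host:
    - python
    - pip
  run:
    - python

about:
  license: MIT
```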

While you can distribute the artifacts on your own, conda packages are usually uploaded to a conda channel hosted on a server such as Anaconda.org. The channel server processes all uploaded packages and aggregates the metadata they contain into a single repodata.json file per platform (or subdir). For example, this is a subset of the conda-forge repodata for Linux x64 systems: current_repodata.json.
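A heavily trimmed, hypothetical repodata.json entry illustrates the shape of this index; the package and its hashes are placeholders:

```json
{
  "info": {"subdir": "linux-64"},
  "packages.conda": {
    "mypkg-1.0.0-py311_0.conda": {
      "name": "mypkg",
      "version": "1.0.0",
      "build": "py311_0",
      "build_number": 0,
      "depends": ["python >=3.11,<3.12.0a0"],
      "sha256": "0000000000000000000000000000000000000000000000000000000000000000"
    }
  }
}
```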

These are the metadata files that the conda client fetches when the user runs conda install ... or similar commands. The solver processes all the metadata and selects the most suitable set of packages for the request, which are then downloaded, extracted, and linked into the target conda environment.

Post-publication particularities

For most packages, the paragraphs above are enough to describe their life cycle. However, the repodata-first approach followed in the conda ecosystem allows for some unique features in the post-publication stages.

For high-volume channels like conda-forge, Anaconda.org delivers the artifacts through a CDN for faster access. The CDN is synced with the channels periodically; as a result, packages take around 15 minutes to become available for installation after their publication.

This repodata-first approach offers a unique opportunity to post-process the repodata files. This way, we can fix metadata issues without rebuilding packages. Note that these changes do not propagate back to the metadata contained in the packages.
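For example, a hypothetical patch instruction might override a package's depends entry (to tighten a pin that turned out to be too loose) and drop a retired file from the index; the filenames and pins below are illustrative only:

```json
{
  "packages": {
    "mypkg-1.0.0-py311_0.tar.bz2": {
      "depends": ["python >=3.11,<3.12.0a0", "numpy >=1.24,<2"]
    }
  },
  "remove": ["brokenpkg-0.1-0.tar.bz2"]
}
```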

Anaconda.org also offers the concept of channel labels, which in practice behave as subchannels. The default label is main. When a new label is added, the package is also visible in the subchannel <channel>/label/<label>. For example, a package uploaded to conda-forge and labeled as test and main will be available in the conda-forge channel, but also in the conda-forge/label/test subchannel.
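Users can opt into a label by listing the subchannel ahead of the main channel, for instance in their .condarc (label name from the example above):

```yaml
# .condarc: prefer packages from the "test" label, falling back to main
channels:
  - conda-forge/label/test
  - conda-forge
```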

The life cycle on conda-forge

Anyone can run conda-build on their computer and manually upload their packages to Anaconda.org. However, that approach has a few problems:

  • It doesn't facilitate collaboration.
  • There's no transparency in the process.
  • Reproducibility is system-dependent.
  • Compatibility across packages is not guaranteed.
  • It doesn't scale well beyond a handful of packages.

On conda-forge, most packages are built on public CI services and maintained by thousands of volunteers. This requires approaching the problem differently, to guarantee fine-grained control of permissions, independent project management, and automated batch updates.

The main idea is that each conda recipe is managed in a separate GitHub repository. These repositories, called feedstocks in conda-forge, host the user-contributed conda recipe plus the auto-generated scripts, configuration files, and CI pipelines required to build and upload the conda artifacts. Under this setup, the conda-forge bots can traverse the conda-forge repositories and re-generate or update feedstocks whenever a global change or fix needs to be issued.
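Much of this machinery is driven by a conda-forge.yml file at the feedstock root. The fragment below is a small, illustrative subset of the settings the tooling supports (the exact keys available are documented elsewhere):

```yaml
# conda-forge.yml (feedstock root): trimmed, illustrative settings only
provider:
  linux_64: azure      # which CI service builds this platform
  osx_64: azure
bot:
  automerge: true      # let the bot merge its own passing update PRs
```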

To be given a conda-forge feedstock, contributors must first submit their recipe for review to the staged-recipes repository. Once reviewed and approved, the PR is merged to main, which triggers the feedstock creation.

After accepting the invitation to the conda-forge organization, the submitting contributor(s) are given commit rights to that repository. By then, the feedstock creation machinery will have registered the new repository with the required CI services and populated it with the submitted recipe, plus the supporting scripts, configuration files, and CI pipelines.

These pipelines will process the initial commits to produce and upload the conda artifacts to the cf-staging channel. Any subsequent pushes to main (e.g. merged PRs) or other enabled branches will undergo the same process.

info

For existing feedstocks, conda-forge bots will often send automated PRs for new project releases or maintenance tasks. You can find more details about it in Automation & bots.

The validation server detects the new uploads on cf-staging and performs a series of checks on those artifacts. If the checks pass, the artifacts are copied to the actual conda-forge channel.

At this point, the channel server will process the contents of the new packages to retrieve their metadata and update the repodata files. On the next CDN sync cycle, the artifacts will be distributed to the delivery network for faster access. Validation and CDN sync usually take around 15 minutes after the CI is passing on main. From this moment on, users can install the new packages from the CLI.

Post-publication particularities

In an ideal world, that would be the end of the life cycle. In practice, however, certain packages go through additional post-publication stages.

If the package metadata is found to be wrong or out of date, it can be modified without rebuilding the package. The channel server can apply patches to the repodata files directly, via instructions published in conda-forge-repodata-patches that are processed weekly.
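Conceptually, applying such a patch is an override of index entries plus a removal pass. The sketch below is a simplified illustration under assumed data shapes (the instruction layout loosely mirrors the patch files that repository generates; the package names are hypothetical), not the actual server-side implementation:

```python
import copy

def apply_patch(repodata, instructions):
    """Apply simplified repodata patch instructions to a repodata dict.

    Overrides per-package metadata fields and drops filenames listed
    under "remove" from the index. The original dict is left untouched.
    """
    patched = copy.deepcopy(repodata)
    for key in ("packages", "packages.conda"):
        index = patched.get(key, {})
        # Override selected metadata fields (e.g. a bad "depends" pin).
        for fn, fields in instructions.get(key, {}).items():
            if fn in index:
                index[fn].update(fields)
        # Remove retired filenames from the index entirely.
        for fn in instructions.get("remove", []):
            index.pop(fn, None)
    return patched

# Hypothetical repodata and patch, for illustration only.
repodata = {
    "packages": {
        "mypkg-1.0.0-0.tar.bz2": {"depends": ["numpy"]},
        "oldpkg-0.1-0.tar.bz2": {"depends": []},
    }
}
instructions = {
    "packages": {"mypkg-1.0.0-0.tar.bz2": {"depends": ["numpy >=1.24,<2"]}},
    "remove": ["oldpkg-0.1-0.tar.bz2"],
}
patched = apply_patch(repodata, instructions)
print(patched["packages"]["mypkg-1.0.0-0.tar.bz2"]["depends"])  # pin tightened
print("oldpkg-0.1-0.tar.bz2" in patched["packages"])            # False: removed
```

Note that the patch changes only the index: the archives themselves, and the metadata inside them, stay exactly as built.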

Sometimes a published package has issues that cannot be amended with a repodata patch (e.g. a library was built incorrectly and segfaults). In these cases, the package can be retired by labeling it as broken, which is done through the admin-requests repository. As part of the CDN-driven metadata patching, packages labeled as broken are excluded from the final repodata index. However, they remain available via direct URL access. This allows retiring packages from normal, solver-driven installs without compromising the reproducibility offered by lock files.
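Direct URL access keeps working because lock files pin artifacts by URL rather than by a repodata lookup; on Anaconda.org-hosted channels, that URL is simply channel/subdir/filename. A small sketch (the package name is hypothetical):

```python
def artifact_url(channel: str, subdir: str, filename: str) -> str:
    """Direct download URL for a package on an Anaconda.org-hosted channel.

    A package marked as broken disappears from repodata.json, but this
    URL keeps resolving, so lock files that recorded it stay reproducible.
    """
    return f"https://conda.anaconda.org/{channel}/{subdir}/{filename}"

# Hypothetical package, for illustration only.
print(artifact_url("conda-forge", "linux-64", "mypkg-1.0.0-py311_0.conda"))
# -> https://conda.anaconda.org/conda-forge/linux-64/mypkg-1.0.0-py311_0.conda
```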

Finally, a project might have reached a status where no further updates are needed or expected (e.g. it has been superseded by a new project). If the maintainers want to, these feedstocks can be archived and marked as read-only, which is also done via admin-requests.

Summary of stages

These stages are key concepts in the conda-forge documentation. Feel free to refer back to this list at any time as you read the rest of the documentation.

  1. Initial submission to staged-recipes
  2. Feedstock changes:
    • A. Repository initialization
    • B. Automated maintenance updates
    • C. PRs submitted by users
  3. Package building
  4. Package validation
  5. Package publication
  6. Post-publication:
    • A. Repodata patch
    • B. Mark a package as broken
    • C. Archive the feedstock
info

If you want to read more about the infrastructure details behind these stages, consider reading our Infrastructure guide.