Build Reuse in nixbuild.net

Posted on August 13, 2020 by Rickard Nilsson

Performance and cost-effectiveness are core values for nixbuild.net. How do you make a Nix build as performant and cheap as possible? The answer is — by not running it at all!

This post goes into some detail about the different ways nixbuild.net is able to safely reuse build results. The post gets technical, but the main message is that nixbuild.net really tries to avoid building if it can, in order to save time and money for its users.

Binary Caches

The most obvious way of reusing build results is by utilising binary caches, and an earlier blog post described how this is supported by nixbuild.net. In short, if something has been built on cache.nixos.org, nixbuild.net can skip building it and just fetch it. It is also possible to configure other binary caches to use, and even treat the builds of specific nixbuild.net users in the same way as a trusted binary cache.

No Shared Uploads

As part of the Nix remote build protocol, inputs (dependencies) can be uploaded directly to nixbuild.net. Those inputs are not necessarily trustworty, because we don’t know how they were produced. Therefore, those inputs are only allowed to be used by the user who uploaded them. The exception is if the uploaded input had a signature from a binary cache key, then we allow it to be used by all accounts that trust that specific key. Also, if explicit trust has been setup between two accounts, uploaded paths will be shared.

Derivation Sharing

Another method of reuse, unique to nixbuild.net, is the sharing of build results between users that don’t necessarily trust each other. It works like this:

  1. When we receive a build request, we get a derivation from the user’s Nix client. In essence, this derivation describes what inputs (dependencies) the build needs, and what commands must be run to produce the build output.

  2. The inputs are described in the derivation simply as a list of store paths (/nix/store/abc, /nix/store/xyz). The way the Nix remote build protocol works, those store paths have already been provided to us, either because we already had trusted variants of them in our storage, or because we’ve downloaded them from binary caches, or because the client uploaded them to us.

    In order for us to be able to run the build, we need to map the input store paths to the actual file contents of the inputs. This mapping can actually vary even though store paths are the same. This is because a Nix store path does not depend on the contents of the path, but rather on the dependencies of the path. So we can very well have multiple versions of the same store path in our storage, because multiple users might have uploaded differing builds of the same paths.

    Anyhow, we will end up with a mapping that depends entirely on what paths the user is allowed to use. So, two users may build the exact same derivation but get different store-path-to-content mappings.

  3. At this stage, we store a representation of both the derivation itself, and the mapping described in previous step. Together, these two pieces represent a unique derivation in nixbuild.net’s database.

  4. Now, we can build the derivation. The build runs inside an isolated, virtualized sandbox that has no network access and nothing other than its inputs inside its filesystem.

    The sandbox is of course vital for keeping your builds secure, but it has another application, too: If we already have built a specific derivation (with a specific set of input content), this build result can be reused for any user that comes along and requests a build of the exact same derivation with the exact same set of input content.

We do not yet have any numbers on how big impact this type of build result sharing has in practice. The effectiveness will depend on how reproducible the builds are, and of course also on how many users that are likely to build the same derivations.

For an organization with a large set of custom packages that want to share binary builds with contributors and users, it could turn out useful. The benefit for users is that they don’t actually have to blindly trust a binary cache but instead can be sure that they get binaries that correspond to the nix derivations they have evaluated.