Research and Scientific Publications

Nix started as a research project by Eelco Dolstra and his collaborators at Utrecht University around 2003. Since then, scientists from multiple institutions have published their work on repeatable computation and reliable, secure software distribution.

This collection traces the continued exploration of theoretical foundations and practical applications of the ideas underlying Nix and its ecosystem.

Scientific publications about Nix and associated projects

2024

Extending Cloud Build Systems to Eliminate Transitive Trust

Martin Schwaighofer, Michael Roland, René Mayrhofer

Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses, Salt Lake City, Utah, USA

Trusting the output of a build process requires trusting the build process itself, and the build process of all inputs to that process, and so on. Cloud build systems, like Nix or Bazel, allow their users to precisely specify the build steps making up the intended software supply chain, build the desired outputs as specified, and on this basis delegate build steps to other builders or fill shared caches with their outputs. Delegating build steps or consuming artifacts from shared caches, however, requires trusting the executing builders, which makes cloud build systems better suited for centrally managed deployments than for use across distributed ecosystems. We propose two key extensions to make cloud build systems better suited for use in distributed ecosystems. Our approach attaches metadata to the existing cryptographically secured data structures and protocols, which already link build inputs and outputs for the purpose of caching. Firstly, we include builder provenance data, recording which builder executed the build, its software stack, and a remote attestation, making this information verifiable. Secondly, we include a record of the outcome of how the builder resolved each dependency. Together, these two measures eliminate transitive trust in software dependencies, by enabling users to perform verification of transitive dependencies independently, and against their own criteria, at time of use. Finally, we explain how our proposed extensions could theoretically be implemented in Nix in the future.

Source Code Archiving to the Rescue of Reproducible Deployment

Ludovic Courtès, Timothy Sample, Stefano Zacchiroli, Simon Tournier

2nd ACM Conference on Reproducibility and Replicability (ACM REP '24), Rennes, France

The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.

We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive. Our contribution is twofold: first, we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.

Reproducibility in Software Engineering

Pol Dellaiera

Master's thesis, University of Mons, Mons, Belgium

The concept of reproducibility has long been a cornerstone in scientific research, ensuring that results are robust, repeatable, and can be independently verified. This concept has been extended to computer science, focusing on the ability to recreate identical software artefacts. However, the importance of reproducibility in software engineering is often overlooked, leading to challenges in the validation, security, and reliability of software products.

This master's thesis aims to investigate the current state of reproducibility in software engineering, exploring both the barriers and potential solutions to making software more reproducible and raising awareness. It identifies key factors that impede reproducibility such as inconsistent environments, lack of standardisation, and incomplete documentation. To tackle these issues, I propose an empirical comparison of tools facilitating software reproducibility.

To provide a comprehensive assessment of reproducibility in software engineering, this study adopts a methodology that involves a hands-on evaluation of four different methods and tools. Through a systematic evaluation of these tools, this research seeks to determine their effectiveness in establishing and maintaining identical software environments and builds.

This study contributes to academic knowledge and offers practical insights that could influence future software development protocols and standards.

Reproducibility of Build Environments through Space and Time

Julien Malka, Stefano Zacchiroli, Théo Zimmermann

ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results, Lisbon, Portugal

Modern software engineering builds upon the composability of software components, which rely on more and more direct and transitive dependencies to build their functionalities. This principle of reusability, however, makes it harder to reproduce projects' build environments, even though reproducibility of build environments is essential for collaboration, maintenance and component lifetime. In this work, we argue that functional package managers provide the tooling to make build environments reproducible in space and time, and we produce a preliminary evaluation to justify this claim. Using historical data, we show that we are able to reproduce build environments of about 7 million Nix packages, and to rebuild 99.94% of the 14 thousand packages from a 6-year-old Nixpkgs revision.

Secure Nix Expression Updates

Finn Landweber

Bachelor's thesis, Leipzig University of Applied Sciences, Leipzig, Germany

Projects and individual users often struggle to keep track of their deployed software and update vulnerable versions quickly. Nix, an increasingly popular package manager, provides a rigorous approach to dependency management and transparency and could be used to improve this situation significantly. However, updates to its build instructions are not cryptographically secured and are thus open to machine-in-the-middle attacks. This thesis demonstrates that instruction updates can be protected from these kinds of attacks with minimal changes to the update mechanism. It takes Git as the basis for distributing versioned and signed Nix code and explores multiple ways in which the origins of downloaded instructions can be verified automatically. The work contributes a structured analysis of Nix instruction update security grounded in the literature. From there, it derives novel interfaces from Nix to two preexisting Git signature verification solutions and presents a new tool tailored to the needs of the Nix ecosystem. Although not all attacks on Nix expression updates can be mitigated by the suggested tools, they can provide a practical security gain for Nix users. The findings may help conceptualize a path towards a higher security standard for Nix deployments.

Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management

Julien Malka

IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, Lisbon, Portugal

Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain.

2022

Toward practical transparent verifiable and long-term reproducible research using Guix

Nicolas Vallet, David Michonneau, Simon Tournier

Scientific Data, vol. 9, no. 597

The reproducibility crisis urges scientists to promote transparency, which allows peers to draw the same conclusions after performing identical steps from hypothesis to results. Growing resources are being developed to open access to methods, data and source code. Still, the computational environment, an interface between data and the source code running analyses, is not addressed. Environments are usually described with software and library names associated with version labels or provided as an opaque container image. This is not enough to describe the complexity of the dependencies on which they rely to operate. We describe this issue and illustrate how open tools like Guix can be used by any scientist to share their environment and allow peers to reproduce it. Some steps of research might not be fully reproducible, but at least transparency for computation is technically addressable. These tools should be considered by scientists willing to promote transparency and open science.

2020

Build systems à la carte: Theory and practice

Andrey Mokhov, Neil Mitchell, Simon Peyton Jones

Journal of Functional Programming, vol. 30

Build systems are awesome, terrifying – and unloved. They are used by every developer around the world, but are rarely the object of study. In this paper, we offer a systematic, and executable, framework for developing and comparing build systems, viewing them as related points in a landscape rather than as isolated phenomena. By teasing apart existing build systems, we can recombine their components, allowing us to prototype new build systems with desired properties.

2018

Build systems à la carte

Andrey Mokhov, Neil Mitchell, Simon Peyton Jones

International Conference on Functional Programming 2018 (ACM ICFP '18), St. Louis, Missouri, USA

Build systems are awesome, terrifying – and unloved. They are used by every developer around the world, but are rarely the object of study. In this paper we offer a systematic, and executable, framework for developing and comparing build systems, viewing them as related points in a landscape rather than as isolated phenomena. By teasing apart existing build systems, we can recombine their components, allowing us to prototype new build systems with desired properties.

2011

Multi-Platform Software Package Management

Joachim Schiele

Diplomarbeit thesis, University of Tübingen, Tübingen, Germany

Today’s package management is a complex field. This diploma thesis analyses different package management systems in several regards: how the packaging is done; how the user interacts with the package manager; Apt (Debian Linux), Portage (Gentoo Linux) and Nix (NixOS) are compared in great detail. To get practical results, the author used the evopedia application, an open source offline Wikipedia reader, to experiment with several different package managers. This diploma thesis also tries to answer why there are no complex package managers for Microsoft Windows. Most package managers use their own terms which cannot be generalized to other package managers. This problem was solved by defining meaningful words which describe certain properties of a package manager.

2010

Automating System Tests Using Declarative Virtual Machines

Sander van der Burg, Eelco Dolstra

21st IEEE International Symposium on Software Reliability Engineering (ISSRE 2010), San Jose, California, USA

Automated regression test suites are an essential software engineering practice: they provide developers with rapid feedback on the impact of changes to a system's source code. The inclusion of a test case in an automated test suite requires that the system's build process can automatically provide all the environmental dependencies of the test. These are external elements necessary for a test to succeed, such as shared libraries, running programs, and so on. For some tests (e.g., a compiler's), these requirements are simple to meet.

However, many kinds of tests, especially at the integration or system level, have complex dependencies that are hard to provide automatically, such as running database servers, administrative privileges, services on external machines or specific network topologies. As such dependencies make tests difficult to script, they are often only performed manually, if at all. This particularly affects testing of distributed systems and system-level software.

This paper shows how we can automatically instantiate the complex environments necessary for tests by creating (networks of) virtual machines on the fly from declarative specifications. Building on NixOS, a Linux distribution with a declarative configuration model, these specifications concisely model the required environmental dependencies. We also describe techniques that allow efficient instantiation of VMs. As a result, complex system tests become as easy to specify and execute as unit tests. We evaluate our approach using a number of representative problems, including automated regression testing of a Linux distribution.

NixOS: A Purely Functional Linux Distribution

Eelco Dolstra, Andres Löh, Nicolas Pierron

Journal of Functional Programming, vol. 20, no. 5-6

Existing package and system configuration management tools suffer from an imperative model, where system administration actions such as upgrading packages or changes to system configuration files are stateful: they destructively update the state of the system. This leads to many problems, such as the inability to roll back changes easily, to deploy multiple versions of a package side-by-side, to reproduce a configuration deterministically on another machine, or to reliably upgrade a system. In this article we show that we can overcome these problems by moving to a purely functional system configuration model. This means that all static parts of a system (such as software packages, configuration files and system startup scripts) are built by pure functions and are immutable, stored in a way analogously to a heap in a purely functional language. We have implemented this model in NixOS, a non-trivial Linux distribution that uses the Nix package manager to build the entire system configuration from a modular, purely functional specification.

2009

Software Deployment in a Dynamic Cloud: From Device to Service Orientation in a Hospital Environment

Sander van der Burg, Eelco Dolstra, Merijn de Jonge, Eelco Visser

2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, Vancouver, British Columbia, Canada

Hospital environments are currently primarily device-oriented: software services are installed, often manually, on specific devices. For instance, an application to view MRI scans may only be available on a limited number of workstations. The medical world is changing to a service-oriented environment, which means that every software service should be available on every device. However, these devices have widely varying capabilities, ranging from powerful workstations to PDAs, and high-bandwidth local machines to low-bandwidth remote machines. To support running applications in such an environment, we need to treat the hospital machines as a cloud, where components of the application are automatically deployed to machines in the cloud with the required capabilities and connectivity. In this paper, we suggest an architecture for applications in such a cloud, in which components are reliably and automatically deployed on the basis of a declarative model of the application using the Nix package manager.

2008

Atomic Upgrading of Distributed Systems

Sander van der Burg, Eelco Dolstra, Merijn de Jonge

1st International Workshop on Hot Topics in Software Upgrade (HotSWUp), Nashville, Tennessee, USA

Upgrading distributed systems is a complex process. It requires installing the right services on the right machines, configuring them correctly, and so on, which is error-prone and tedious. Moreover, since services in a distributed system depend on each other and are updated separately, upgrades typically are not atomic: there is a time window during which some but not all services are updated, and a new version of one service might temporarily talk to an old version of another service. Previously we implemented the Nix package management system, which allows atomic upgrades and rollbacks on single systems. In this paper we show an extension to Nix that enables the deployment of distributed systems on the basis of a declarative deployment model, and supports atomic upgrades of such systems.

NixOS: A Purely Functional Linux Distribution

Eelco Dolstra, Andres Löh

ICFP 2008: 13th ACM SIGPLAN International Conference on Functional Programming, Victoria, British Columbia, Canada

Existing package and system configuration management tools suffer from an imperative model, where system administration actions such as upgrading packages or changes to system configuration files are stateful: they destructively update the state of the system. This leads to many problems, such as the inability to roll back changes easily, to run multiple versions of a package side-by-side, to reproduce a configuration deterministically on another machine, or to reliably upgrade a system. In this paper we show that we can overcome these problems by moving to a purely functional system configuration model. This means that all static parts of a system (such as software packages, configuration files and system startup scripts) are built by pure functions and are immutable, stored in a way analogously to a heap in a purely functional language. We have implemented this model in NixOS, a non-trivial Linux distribution that uses the Nix package manager to build the entire system configuration from a purely functional specification.

The Nix Build Farm: A Declarative Approach to Continuous Integration

Eelco Dolstra, Eelco Visser

International Workshop on Advanced Software Development Tools and Techniques (WASDeTT 2008), Paphos, Cyprus

There are many tools to support continuous integration (the process of automatically and continuously building a project from a version management repository). However, they do not have good support for variability in the build environment: dependencies such as compilers, libraries or testing tools must typically be installed manually on all machines on which automated builds are performed. The Nix package manager solves this problem: it has a purely functional language for describing package build actions and their dependencies, allowing the build environment for projects to be produced automatically and deterministically. We have used Nix to build a continuous integration tool, the Nix build farm, that is in use to continuously build and release a large set of projects.

Maximal Laziness — An Efficient Interpretation Technique for Purely Functional DSLs

Eelco Dolstra

Eighth Workshop on Language Descriptions, Tools and Applications (LDTA 2008), Budapest, Hungary

In lazy functional languages, any variable is evaluated at most once. This paper proposes the notion of maximal laziness, in which syntactically equal terms are evaluated at most once: if two terms e1 and e2 arising during the evaluation of a program have the same abstract syntax representation, then only one will be evaluated, while the other will reuse the former's evaluation result. Maximal laziness can be implemented easily in interpreters for purely functional languages based on term rewriting systems that have the property of maximal sharing: if two terms are equal, they have the same address. It makes it easier to write interpreters, as techniques such as closure updating, which would otherwise be required for efficiency, are not needed. Instead, a straightforward translation of call-by-name semantic rules yields a call-by-need interpreter, reducing the gap between the language specification and its implementation. Moreover, maximal laziness obviates the need for optimisations such as memoisation and let-floating.
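
The idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's interpreter: the toy term language (integer literals and additions encoded as tuples) and the names `lit`, `add` and `evaluate` are assumptions made for illustration only. Terms are interned so that structurally equal terms are the same object (maximal sharing), and results are cached per object, so each distinct term is reduced at most once.

```python
_pool = {}        # structural key -> canonical term object (maximal sharing)
_cache = {}       # id(canonical term) -> evaluation result
evaluations = 0   # number of "add" terms that were actually reduced


def _intern(term):
    # Structurally equal terms map to one canonical object. The pool keeps
    # every term alive, so object identities stay stable for the cache.
    return _pool.setdefault(term, term)


def lit(n):
    """Build (or reuse) the shared term for an integer literal."""
    return _intern(("lit", n))


def add(left, right):
    """Build (or reuse) the shared term for an addition."""
    return _intern(("add", left, right))


def evaluate(term):
    """Evaluate a term, reducing each distinct shared term at most once."""
    global evaluations
    key = id(term)
    if key in _cache:
        return _cache[key]
    if term[0] == "lit":
        result = term[1]
    else:
        evaluations += 1
        result = evaluate(term[1]) + evaluate(term[2])
    _cache[key] = result
    return result


e1 = add(lit(1), lit(2))
e2 = add(lit(1), lit(2))   # structurally equal to e1, hence the same object
total = evaluate(add(e1, e2))
```

In this sketch `e1` and `e2` are one object, so evaluating their sum reduces only two additions (the outer one and the shared inner one) instead of three.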

2007

Purely Functional System Configuration Management

Eelco Dolstra, Armijn Hemel

11th Workshop on Hot Topics in Operating Systems (HotOS XI), San Diego, California, USA

System configuration management is difficult because systems evolve in an undisciplined way: packages are upgraded, configuration files are edited, and so on. The management of existing operating systems is strongly imperative in nature, since software packages and configuration data (e.g., /bin and /etc in Unix) can be seen as imperative data structures: they are updated in-place by system administration actions. In this paper we present an alternative approach to system configuration management: a purely functional method, analogous to languages like Haskell. In this approach, the static parts of a configuration — software packages, configuration files, control scripts — are built from pure functions, i.e., the results depend solely on the specified inputs of the function and are immutable. As a result, realising a system configuration becomes deterministic and reproducible. Upgrading to a new configuration is mostly atomic and doesn't overwrite anything of the old configuration, thus enabling rollbacks. We have implemented the purely functional model in a small but realistic Linux-based operating system distribution called NixOS.

2006

NixOS: the Nix based operating system

Armijn Hemel

Master's thesis, Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands

The subject of this thesis is how the Nix package management system can be applied to manage a whole Linux distribution. Many conventional package management systems have drawbacks that Nix fixes. However, Nix had never before been used to deploy and manage a whole system.

In this thesis a proof of concept Linux distribution called NixOS is described. NixOS uses the Nix package management system to manage all software that is installed on the system, including the Linux kernel, all software and system services.

Using Nix to manage all software on a system, as is done on NixOS, has several advantages. Developers don't need to worry that unwanted dependencies are picked up during the build of a software package, since these are completely eliminated. System administrators can deploy services using Nix and immediately use all of its benefits, including atomic upgrades and rollbacks, without going through a painful cycle of rolling back a service together with all of its (possibly also updated) dependencies.

This thesis describes the implementation of NixOS, including pitfalls that were encountered and choices that were made. It also shows some concrete results of running NixOS and how NixOS can be improved.

The Purely Functional Software Deployment Model

Eelco Dolstra

PhD thesis, Faculty of Science, Utrecht University, Utrecht, The Netherlands

Software deployment is the set of activities related to getting software components to work on the machines of end users. It includes activities such as installation, upgrading, uninstallation, and so on. Many tools have been developed to support deployment, but they all have serious limitations with respect to correctness. For instance, the installation of a component can lead to the failure of previously installed components; a component might require other components that are not present; and it is generally difficult to undo deployment actions. The fundamental causes of these problems are a lack of isolation between components, the difficulty in identifying the dependencies between components, and incompatibilities between versions and variants of components.

This thesis describes a better approach based on a purely functional deployment model, implemented in a deployment system called Nix. Components are stored in isolation from each other in a Nix store. Each component has a name that contains a cryptographic hash of all inputs that contributed to its build process, and the content of a component never changes after it has been built. Hence the model is purely functional.

This storage scheme provides several important advantages. First, it ensures isolation between components: if two components differ in any way, they will be stored in different locations and will not overwrite each other. Second, it allows us to identify component dependencies. Undeclared build time dependencies are prevented due to the absence of “global” component directories used in other deployment systems. Runtime dependencies can be found by scanning for cryptographic hashes in the binary contents of components, a technique analogous to conservative garbage collection in programming language implementation. Since dependency information is complete, complete deployment can be performed by copying closures of components under the dependency relation.

Developers and users are not confronted with components' cryptographic hashes directly. Components are built automatically from Nix expressions, which describe how to build and compose arbitrary software components; hashes are computed as part of this process. Components are automatically made available to users through “user environments”, which are synthesised sets of activated components. User environments enable atomic upgrades and rollbacks, as well as different sets of activated components for different users.

Nix expressions provide a source-based deployment model. However, source-based deployment can be transparently optimised into binary deployment by making pre-built binaries (keyed on their cryptographic hashes) available in a shared location such as a network server. This is referred to as transparent source/binary deployment.

The purely functional deployment model has been validated by applying it to the deployment of more than 278 existing Unix packages. In addition, this thesis shows that the model can be applied naturally to the related activities of continuous integration using build farms, service deployment and build management.
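
Two steps described in this abstract, finding runtime dependencies by scanning for hashes and deploying by copying closures, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Nix's implementation: the in-memory `store` dictionary, the made-up eight-character hashes, and the function names `scan_references` and `closure` are all hypothetical.

```python
def scan_references(contents, candidate_hashes):
    """Return the candidate hashes that occur literally in a component's bytes."""
    return {h for h in candidate_hashes if h.encode() in contents}


def closure(root, references):
    """Collect every component reachable from root via the reference relation."""
    seen = set()
    pending = [root]
    while pending:
        component = pending.pop()
        if component in seen:
            continue
        seen.add(component)
        pending.extend(references.get(component, ()))
    return seen


# Hypothetical store: hash -> file contents (hashes and contents are made up).
store = {
    "aaaa1111": b"libc code",
    "bbbb2222": b"openssl, linked against /store/aaaa1111-libc/lib",
    "cccc3333": b"curl, rpath=/store/bbbb2222-openssl/lib",
    "dddd4444": b"unrelated tool",
}

# Scan each component for the hashes of all other components.
references = {
    h: scan_references(data, store.keys() - {h}) for h, data in store.items()
}

# Everything that must be copied to deploy the "curl" component.
deploy_set = closure("cccc3333", references)
```

Like conservative garbage collection, the scan may report a reference that is never used at run time, but it does not miss one that appears literally in the contents.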

2005

Secure Sharing Between Untrusted Users in a Transparent Source/Binary Deployment Model

Eelco Dolstra

20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005), Long Beach, California, USA

The Nix software deployment system is based on the paradigm of transparent source/binary deployment: distributors deploy descriptors that build components from source, while client machines can transparently optimise such source builds by downloading pre-built binaries from remote repositories. This model combines the simplicity and flexibility of source deployment with the efficiency of binary deployment. A desirable property is sharing of components: if multiple users install from the same source descriptors, ideally only one remotely built binary should be installed. The problem is that users must trust that remotely downloaded binaries were built from the sources they are claimed to have been built from, while users in general do not have a trust relation with each other or with the same remote repositories.

This paper presents three models that enable sharing: the extensional model that requires that all users on a system have the same remote trust relations, the intensional model that does not have this requirement but may be suboptimal in terms of space use, and the mixed model that merges the best properties of both. The latter two models are achieved through a novel technique of hash rewriting in content-addressable component stores, and were implemented in the context of the Nix system.

Service Configuration Management

Eelco Dolstra, Martin Bravenboer, Eelco Visser

12th International Workshop on Software Configuration Management (SCM-12), Long Beach, California, USA

The deployment of services — sets of running programs that provide some useful facility on a system or network — is typically implemented through a manual, time-consuming and error-prone process. For instance, system administrators must deploy the necessary software components, edit configuration files, start or stop processes, and so on. This is often done in an ad hoc style with no reproducibility, violating proper configuration management practices. In this paper we show that build management, software deployment and service deployment can be integrated into a single formalism. We do this in the context of the Nix software deployment system, and show that its advantages — co-existence of versions and variants, atomic upgrades and rollbacks, and component closure — extend naturally to service deployment. The approach also elegantly extends to distributed services. In addition, we show that the Nix expression language can simplify the implementation of crosscutting variation points in services.

Efficient Upgrading in a Purely Functional Component Deployment Model

Eelco Dolstra

Eighth International SIGSOFT Symposium on Component-based Software Engineering (CBSE 2005), St. Louis, Missouri, USA

Safe and efficient deployment of software components is an important aspect of CBSE. The Nix deployment system enables side-by-side deployment of different versions and variants of components, complete installation, safe upgrades, and safe uninstalls through garbage collection. It accomplishes this through a purely functional deployment model, meaning that the file system content of a component only depends on the inputs used to build it, and never changes afterwards. An apparent downside to this model is that upgrading "fundamental" components used as build inputs by many other components becomes expensive, since all of these must be rebuilt and redeployed. In this paper we show that binary patching between sets of components enables efficient deployment of upgrades in the purely functional model, transparently to users. Sequences of patches can be combined automatically to enable upgrading between arbitrary versions. The approach was empirically validated.
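
The statement that sequences of patches can be combined to upgrade between arbitrary versions can be pictured as a path search over a graph whose edges are the available patches. The Python sketch below is an assumption-laden illustration, not the paper's algorithm: the function name `find_patch_sequence` and the version labels are hypothetical, and choosing the chain with the fewest patches is a simplification (a real system might instead weigh patches by size).

```python
from collections import deque


def find_patch_sequence(patches, source, target):
    """Return a shortest list of (from, to) patches leading from source to target.

    patches is an iterable of (from_version, to_version) pairs.
    Returns None if no chain of patches connects the two versions.
    """
    edges = {}
    for old, new in patches:
        edges.setdefault(old, []).append(new)

    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            # Turn the version path into consecutive (from, to) patch steps.
            return list(zip(path, path[1:]))
        for nxt in edges.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


available = [("1.0", "1.1"), ("1.1", "1.2"), ("1.2", "2.0"), ("1.0", "1.2")]
steps = find_patch_sequence(available, "1.0", "2.0")
```

When no chain exists, the function returns `None`; a client could then fall back to downloading the full component.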

2004

Nix: A Safe and Policy-Free System for Software Deployment

Eelco Dolstra, Merijn de Jonge, Eelco Visser

18th Large Installation System Administration Conference (LISA '04), Atlanta, Georgia, USA

Existing systems for software deployment are neither safe nor sufficiently flexible. Primary safety issues are the inability to enforce reliable specification of component dependencies, and the lack of support for multiple versions or variants of a component. This renders deployment operations such as upgrading or deleting components dangerous and unpredictable. A deployment system must also be flexible (i.e., policy-free) enough to support both centralised and local package management, and to allow a variety of mechanisms for transferring components. In this paper we present Nix, a deployment system that addresses these issues through a simple technique of using cryptographic hashes to compute unique paths for component instances.
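
The central technique named in this abstract, computing a unique path for each component instance from a cryptographic hash, can be sketched as follows. This is only an illustration: the function `component_path`, its parameters, the choice of SHA-256, the truncation to 32 hex digits and the `/nix/store` prefix are assumptions for the example, and Nix's actual hashing scheme and path encoding differ.

```python
import hashlib

STORE_DIR = "/nix/store"  # assumed store location for this example


def component_path(name, builder, inputs):
    """Derive a store path from a hash over everything that influences the build."""
    h = hashlib.sha256()
    # A NUL separator keeps adjacent fields from running into each other.
    for field in [name, builder] + sorted(inputs):
        h.update(field.encode())
        h.update(b"\0")
    digest = h.hexdigest()[:32]
    return f"{STORE_DIR}/{digest}-{name}"


# Hypothetical component built with two different compiler flags.
libc = "/nix/store/aaaa-glibc"
path_o2 = component_path("hello-2.12", "gcc -O2", [libc])
path_o3 = component_path("hello-2.12", "gcc -O3", [libc])
```

Because any change to the builder or to an input changes the hash, the two variants get different paths and can be installed side by side without overwriting each other.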

Imposing a Memory Management Discipline on Software Deployment

Eelco Dolstra, Eelco Visser, Merijn de Jonge

26th International Conference on Software Engineering (ICSE 2004), Edinburgh, Scotland

The deployment of software components frequently fails because dependencies on other components are not declared explicitly or are declared imprecisely. This results in an incomplete reproduction of the environment necessary for proper operation, or in interference between incompatible variants. In this paper we show that these deployment hazards are similar to pointer hazards in memory models of programming languages and can be countered by imposing a memory management discipline on software deployment. Based on this analysis we have developed a generic, platform and language independent, discipline for deployment that allows precise dependency verification; exact identification of component variants; computation of complete closures containing all components on which a component depends; maximal sharing of components between such closures; and concurrent installation of revisions and variants of components. We have implemented the approach in the Nix deployment system, and used it for the deployment of a large number of existing Linux packages. We compare its effectiveness to other deployment systems.

2003

Integrating Software Construction and Software Deployment

Eelco Dolstra

11th International Workshop on Software Configuration Management (SCM-11), Portland, Oregon, USA

Classically, software deployment is a process consisting of building the software, packaging it for distribution, and installing it at the target site. This approach has two problems. First, a package must be annotated with dependency information and other meta-data. This to some extent overlaps with component dependencies used in the build process. Second, the same source system can be built into an often very large number of variants. The distributor must decide which element(s) of the variant space will be packaged, reducing the flexibility for the receiver of the package. In this paper we show how building and deployment can be integrated into a single formalism. We describe a build manager called Maak that can handle deployment through a sufficiently general module system. Through the sharing of generated files, a source distribution transparently turns into a binary distribution, removing the dichotomy between these two modes of deployment. In addition, the creation and deployment of variants becomes easy through the use of a simple functional language as the build formalism.