[Nix-dev] [RFC] Declarative Virtual Machines

Sun Apr 23 20:40:22 CEST 2017

I did not run benchmarks; I just noticed that booting with / on tmpfs
and /nix/store on 9pfs is slower. Performance was not critical.
Anyway, thank you for the msize suggestion, I will try it.
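
A minimal sketch of such a mount in a NixOS guest configuration. It
assumes the host exports the store copy over virtio-9p under the mount
tag "store". The tag name and the trans/version options are
illustrative assumptions, not taken from the RFC. Only the msize value
comes from the suggestion quoted below:

```nix
{ ... }:
{
  # Assumption: the host side exports the guest's store copy as a
  # virtio-9p share with mount tag "store" (hypothetical name).
  fileSystems."/nix/store" = {
    device = "store";
    fsType = "9p";
    options = [
      "trans=virtio"       # 9p over virtio rather than over a network
      "version=9p2000.L"   # Linux variant of the protocol
      "msize=262144"       # larger packet size, as suggested in the thread
    ];
  };
}
```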

There was more resentment than a point :)
If I need to do something just a bit different from what an existing
tool has been designed for, I cannot reuse the existing code.
That "a bit different" could be, for example, creating a NixOS .qcow2
on a remote Ubuntu server. I cannot use make-nix-disk, so I copy-paste
some code from it. It uses runInLinuxVM, which cannot be used on
Ubuntu either, so code from runInLinuxVM is copied with some
modifications (libguestfs cannot be used, because
"switch-to-configuration" does not work on its appliance). The result
is a new tool of 500-1000 lines, made of copy-pasted and slightly
modified snippets.
What you do is something similar - "same as nixos-containers, but for
qemu", which has some basic assumptions hardcoded, such as "shared nix
store", "host is nixos too" and "VM is to run on the same machine
where it was built". The next guy, whose task does not fit these
assumptions, ends up creating another big tool which also creates
qcow2/vdi/raw/whatever and launches qemu/virtualbox/docker in just a
slightly different way.
The point is that the existing guest-creation-and-control tools are not
flexible enough, which is why we have so many of them doing very
similar things, and why we keep planning and making new ones (besides
those which are already in nixos and nixops, I have seen some other
tools on github, and I believe many of us have our own).
Although all these tools are happy to use the NixOS module system, they
could just as well share and reuse something else: definitions of
machines and of networks, a sophisticated tool to work with VM images
(independent of runInLinuxVM), ...

On 4/23/17, Leo Gaspard <leo at gaspard.io> wrote:
> On 04/22/2017 11:07 PM, Volth wrote:
>> Hello.
>>
>> There are a few objections against qemu with shared /nix/store:
>>
>> 1. It is fast to create but slow to run. Boot time with shared
>> /nix/store is about twice as slow as with everything on qcow2.
>>
>> 2. 9P is unstable, every couple of months there is a new bug (real
>> bugs, not CVEs: wrong data read, the driver got stuck, etc)
>
> Hmm, I wasn't aware of these two first points (didn't test anything),
> intuitively virtfs was supposed to be faster in my mind, as it skipped
> one level of parsing the qcow2 image. Guess I should have actually
> tested instead of relying on gut feeling. That said, I'd like to ask
> whether you set the msize=262144 (or similarly high value) option for
> the 9p mount on the guest during these benchmarks? It greatly
> influences performance when 9p is not used over a network, which is
> what it was originally designed for.
>
> As for stability, in my test setup I haven't hit any non-permanent issue
> (things like being unable to chown / in a mapped-file mode have
> appeared, though, but it's not anything that would compromise the
> stability of a production system as it can be seen during development),
> so I assumed it was pretty stable.
>
>> 3. host GC cannot see the runtime roots inside the VM, so all the
>> guest system closures from its last boot should be preserved from host
>> GC. It may be tricky to debug.
>
> This is not really an issue, as the store is not shared with the guest,
> but rather a rsync of the part of the store that interests the guest (in
> order to avoid information leaks). So the guest never actually sees the
> host store.
>
> The reason for picking 9p instead of qcow2 to hold this copy of the
> store was to allow upgrades to the VM without rebooting it (as the VM
> doesn't have access to its configuration it can't just perform the
> upgrade from the inside), so I thought that future work may include the
> host rsync'ing the relevant files into the 9p export path, and then just
> push a bash script at a shared place the guest would have a cron to
> execute as root, that would trigger a call to the new profile's
> switch-to-configuration.
>
> This would also be possible with the store on a qcow2 image, but would
> entail also pushing all the store paths through this shared path and
> having the guest copy it to its nix-store. I guess it's possible and
> doesn't involve many drawbacks, except a noticeably longer
> time-to-upgrade due to making two copies instead of one.
>
> Among the downsides of using qcow2, I can see that with a CoW FS (such
> as btrfs) shared between /nix/store and /var/lib/vm/${vmname}/store,
> it's possible to have the store of the guest take 0 additional space,
> while using a qcow2 image makes this much harder (and I don't think any
> widely used FS performs block-level deduplication, but I may be wrong).
>
> So I'd love to hear other voices before switching from one to the other,
> as I'm pretty sure we're missing some other decision points.
>
>> Also, the whole idea could be split into simpler building blocks and
>> generalized for use with VirtualBox and different kinds of containers.
>> One of the blocks could be, say, "nix-slave" - a NixOS install which
>> is always configured on an external machine and then run inside a VM or
>> container or deployed to the cloud.
>> So it cannot do "nixos-rebuild" from inside and has a limited set of
>> features: no profiles (no need to "boot previous version" if the
>> previous version could be written to the .qcow2 of a powered-off VM),
>> no "nix-env", etc.
>> Then, a tool to make a container/VM out of a configuration.
>> Then, a VM-agnostic tool to configure the network of those slaves.
>>
>> Well, it sounds very familiar.
>> We indeed have this pattern in so many places: NixOS containers,
>> NixOps, test-driver, "nixos-rebuild build-vm", runInLinuxVM,
>> make-disk-image.nix, your proposal, etc
>> Each of them solves one narrow task and the code is not reusable. For
>> example, when I need to create a .qcow2 outside the nix store, or
>> install/repair nixos on an existing .qcow2, I end up writing my own set
>> of tools (or using RedHat's libguestfs, which is... another VM appliance).
>> Perhaps, there could be some common ground which unifies that kind of
>> tasks as an alternative to creating new bloated tools with many
>> options?
>
> I see you have already seen it, but just for the record, copumpkin has
> recently done great work in this domain, with nixos-prepare-root [1]
> (it's newly merged, so I didn't use it in my not-yet-PR'd changes, but
> it's on my todo-list before opening the PR related to this RFC)
>
> This looks like exactly what you're looking for, except that it still
> requires copying the generated root from a local directory to the right
> block device, which can anyway be done only in a way that heavily
> depends on which block device it is. It would be possible to do a
> make-disk-image as you talked about in comments to [2], but I don't
> think it would fit inside the scope of this RFC, rather in a nixpkgs
> refactoring (which AFAIU doesn't require an RFC).
>
> Or did I miss your point here?
>
>
> [1] https://github.com/NixOS/nixpkgs/pull/23026
>
> [2] https://github.com/NixOS/nixpkgs/pull/24964
>
>