[Nix-dev] [RFC] Declarative Virtual Machines

Sun Apr 23 14:40:42 CEST 2017

On 04/22/2017 11:07 PM, Volth wrote:
> Hello.
> 
> There are a few objections against qemu with shared /nix/store:
> 
> 1. It is fast to create but slow to run. Boot time with shared
> /nix/store is about twice as slow as with everything on qcow2.
> 
> 2. 9P is unstable, every couple of months there is a new bug (real
> bugs, not CVEs: wrong data read, the driver got stuck, etc)

Hmm, I wasn't aware of these first two points (I didn't test anything).
Intuitively, I expected virtfs to be faster, as it skips one level of
indirection (parsing the qcow2 image). I guess I should have actually
tested instead of relying on gut feeling. That said, did you set
msize=262144 (or a similarly high value) on the guest's 9p mount during
these benchmarks? It greatly influences performance when 9p is not
running over a network, which is what it was originally designed for.
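
For reference, a minimal sketch of the guest-side mount this question
is about. It only builds the command line; the mount tag "store" and
the mount point are assumptions for illustration, not taken from the
RFC code.

```python
def ninep_mount_argv(tag, mountpoint, msize=262144):
    """Build the argv for mounting a virtio-backed 9p export in the guest."""
    options = ",".join([
        "trans=virtio",      # virtio transport instead of a network socket
        "version=9p2000.L",  # Linux dialect of the 9p protocol
        f"msize={msize}",    # maximum 9p packet size, in bytes
    ])
    return ["mount", "-t", "9p", "-o", options, tag, mountpoint]


# "store" and "/nix/store" are hypothetical values for illustration.
print(" ".join(ninep_mount_argv("store", "/nix/store")))
# prints: mount -t 9p -o trans=virtio,version=9p2000.L,msize=262144 store /nix/store
```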

As for stability, in my test setup I haven't hit any intermittent issue.
Deterministic problems have appeared (things like being unable to chown /
in mapped-file mode), but they wouldn't compromise the stability of a
production system, as they show up during development. So I assumed it
was pretty stable.

> 3. host GC cannot see the runtime roots inside the VM, so all the
> guest system closures from its last boot should be preserved from host
> GC. It may be tricky to debug.

This is not really an issue, as the store is not shared with the guest.
Instead, the guest gets an rsync'ed copy of the part of the store it
needs (in order to avoid information leaks). So the guest never actually
sees the host store.
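
To make the idea concrete, here is a rough sketch of how the host could
decide what to copy. It assumes the guest's closure has already been
listed (for instance with `nix-store --query --requisites`); the
function name and the example store paths are hypothetical.

```python
import os


def plan_store_sync(closure, exported):
    """Return (to_copy, to_remove) as sorted lists of store path basenames.

    closure:  full store paths the guest system needs
    exported: basenames currently present in the guest's store copy
    """
    wanted = {os.path.basename(p.rstrip("/")) for p in closure}
    present = set(exported)
    return sorted(wanted - present), sorted(present - wanted)


# Hypothetical store paths, for illustration only.
to_copy, to_remove = plan_store_sync(
    ["/nix/store/aaa-system", "/nix/store/bbb-bash"],
    ["bbb-bash", "ccc-old-system"],
)
print(to_copy)    # ['aaa-system']
print(to_remove)  # ['ccc-old-system']
```

Anything in to_remove should only be deleted once the guest no longer
runs from it.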

The reason for picking 9p instead of qcow2 to hold this copy of the
store was to allow upgrades to the VM without rebooting it. As the VM
doesn't have access to its configuration, it can't just perform the
upgrade from the inside. So I thought that future work may include the
host rsync'ing the relevant files into the 9p export path, and then
pushing a bash script to a shared location. The guest would have a cron
job that executes this script as root, which would trigger a call to the
new profile's switch-to-configuration.
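
A minimal sketch of the host side of that last step, assuming the new
system's store paths are already in the 9p export. The shared directory,
the script name and the function name are all hypothetical; the only
NixOS-specific part is the bin/switch-to-configuration entry point of
the system closure.

```python
import os
import shlex
import tempfile


def push_upgrade_script(shared_dir, system_path, name="upgrade.sh"):
    """Write a script that activates system_path, then publish it atomically.

    The script is written to a temporary file in the same directory and
    renamed into place, so the guest's cron job never sees a partial file.
    """
    script = (
        "#!/bin/sh\n"
        "set -e\n"
        f"exec {shlex.quote(system_path)}/bin/switch-to-configuration switch\n"
    )
    fd, tmp = tempfile.mkstemp(dir=shared_dir, prefix=".upgrade-")
    with os.fdopen(fd, "w") as f:
        f.write(script)
    os.chmod(tmp, 0o755)  # the guest needs to be able to execute it
    final = os.path.join(shared_dir, name)
    os.replace(tmp, final)  # atomic rename within one filesystem
    return final
```

The guest would then run the published script from cron as root and
remove it afterwards; that part is not shown here.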

This would also be possible with the store on a qcow2 image, but it
would mean also pushing all the store paths through this shared path and
having the guest copy them into its own nix store. I guess it's possible
and doesn't involve many drawbacks, except a noticeably longer
time-to-upgrade due to the two copies instead of one.

Among the downsides of using qcow2, I can see that with a CoW FS (such
as btrfs) holding both /nix/store and /var/lib/vm/${vmname}/store,
it's possible to have the store of the guest take 0 additional space,
while using a qcow2 image makes it much harder (and I don't think any
widely used FS performs block-level deduplication, but I may be wrong).

So I'd love to hear other voices before switching from one to the other,
as I'm pretty sure we're missing some other decision points.

> Also, the whole idea could be split into simpler building blocks and
> generalized for use with VirtualBox and different kinds of containers.
> One of the blocks could be, say, "nix-slave" - a NixOS install which
> is always configured on an external machine and then run inside VM or
> container or deployed to the cloud.
> So it cannot do "nixos-rebuild" from inside and has limited set of
> features, no profiles (no need to "boot previous version" if the
> previous version could be written to the .qcow2 of a powered-off VM),
> no "nix-env", etc
> Then, a tool to make a container/VM out of a configuration.
> Then, a VM-agnostic tool to configure the network of those slaves.
> 
> Well, it sounds very familiar.
> We indeed have this pattern in so many places: NixOS containers,
> NixOps, test-driver, "nixos-install build-vm", runInLinuxVM,
> make-disk-image.nix, your proposal, etc
> Each of them solves one narrow task and the code is not reusable. For
> example, when I need to create a .qcow2 outside the nix store, or
> install/repair nixos on an existing .qcow2, I end up writing my own set
> of tools (or using RedHat's libguestfs, which is... another VM appliance)
> Perhaps, there could be some common ground which unifies that kind of
> tasks as an alternative to creating new bloated tools with many
> options?

I see you have already seen it, but just for the record, copumpkin has
recently done great work in this domain, with nixos-prepare-root [1]
(it's newly merged, so I didn't use it in my not-yet-PR'd changes, but
it's on my todo-list before opening the PR related to this RFC)

This looks like exactly what you're looking for, except that it still
requires copying the generated root from a local directory to the right
block device, which in any case can only be done in a way that depends
heavily on which block device it is. It would be possible to do a
make-disk-image as you suggested in the comments on [2], but I don't
think it fits within the scope of this RFC; it belongs rather in a
nixpkgs refactoring (which AFAIU doesn't require an RFC).

Or did I miss your point here?

[1] https://github.com/NixOS/nixpkgs/pull/23026

[2] https://github.com/NixOS/nixpkgs/pull/24964
