[Nix-dev] "Monitoring" NixOS?

Tue Feb 14 09:58:09 CET 2017

It would be useful to know how far behind a machine is relative to a specific
channel, especially if you care about security updates.
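A rough sketch of a "commits behind" metric. It assumes the version string printed by `nixos-version` has the form `<major>.<minor>.<revCount>.<shortRev>` (optionally followed by a code name), where revCount counts commits on the release branch. How you obtain the channel's latest version string is left open here; that input is hypothetical.

```python
import re
from typing import NamedTuple


class NixosVersion(NamedTuple):
    release: str    # e.g. "16.09"
    rev_count: int  # commit count on the release branch (assumed format)
    short_rev: str  # abbreviated git revision


# ASSUMPTION: "<major>.<minor>.<revCount>.<shortRev>", optionally followed
# by a code name such as " (Flounder)".
_VERSION_RE = re.compile(r"^(\d+\.\d+)\.(\d+)\.([0-9a-f]+)")


def parse_nixos_version(text: str) -> NixosVersion:
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised NixOS version string: {text!r}")
    release, rev_count, short_rev = match.groups()
    return NixosVersion(release, int(rev_count), short_rev)


def commits_behind(installed: str, channel_latest: str) -> int:
    """Approximate number of channel commits the machine has not picked up."""
    local = parse_nixos_version(installed)
    latest = parse_nixos_version(channel_latest)
    if local.release != latest.release:
        raise ValueError("versions come from different release branches")
    return max(0, latest.rev_count - local.rev_count)
```

A commit count is only a proxy: it does not say whether any of the missing commits are security fixes.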

A "needs reboot" flag for when the kernel is updated.
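One possible way to compute the flag: compare what `/run/booted-system` and `/run/current-system` point at. This assumes both system profiles expose `kernel`, `initrd` and `kernel-modules` entries that resolve into the store; treat it as a sketch, not a complete check.

```python
import os


def needs_reboot(booted: str = "/run/booted-system",
                 current: str = "/run/current-system") -> bool:
    """Return True if the activated system's kernel differs from the booted one."""
    # ASSUMPTION: each system profile has these entries as symlinks into the
    # store. Resolving them makes two generations that share the same kernel
    # store path compare equal.
    for component in ("kernel", "initrd", "kernel-modules"):
        booted_path = os.path.realpath(os.path.join(booted, component))
        current_path = os.path.realpath(os.path.join(current, component))
        if booted_path != current_path:
            return True
    return False
```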

GC state or just store size.
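For plain store size, a minimal sketch that walks the store and counts each inode once, so hard links (e.g. those created by `nix-store --optimise`) are not double-counted. It measures apparent file size, not allocated blocks, and says nothing about what is collectable; `nix-store --gc --print-dead` is the tool for listing dead paths.

```python
import os


def store_size_bytes(store_dir: str = "/nix/store") -> int:
    """Sum apparent file sizes under store_dir, counting each inode once."""
    seen = set()  # (st_dev, st_ino) pairs already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(store_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks out of the store
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total
```

Walking the whole store is slow on large machines, so this is better suited to an occasional job than a frequent metric.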

And the common Linux monitoring metrics also apply: vmm, CPU, net, IO,
entropy pool, ...

On Mon, 13 Feb 2017, 14:29 Daniel Peebles, <pumpkingod at gmail.com> wrote:

> Hi all,
>
> I just packaged up the AWS SSM agent [1], which is a cool system for
> automated management of fleets of machines both in AWS and outside of it,
> allowing you to run commands on all of them, check "inventory" across all
> of them automatically, set policies on disparate types of machines, and so
> on.
>
> NixOS seems to work fine with it and I can run commands on it and keep an
> eye on the current NixOS release by injecting a fake lsb_release into its
> path. But one of the features of SSM is the ability to take an inventory of
> "installed" packages on a system. Of course, that notion doesn't directly
> make sense in NixOS, but it got me wondering what sorts of metrics might
> make sense from a "keep an eye on your fleet of NixOS systems" perspective.
>
> Some possibilities:
>
>    1. Track runtime dependencies of the system root, and ideally maintain
>    an external mapping of all of those hashes to expressions that produce
>    them. The first part I know how to do, but the second part seems tricky.
>    2. Monitor "GC state" of your NixOS system: count how many
>    unreferenced derivations are in the store and how much disk space past
>    system generations retain (factoring in hard linking and such)
>    3. Dump current systemd unit state (broader than just NixOS, obviously)
>    4. Track total time spent building derivations and downloading
>    substitutes: could be helpful to understand that some of your machines
>    aren't accessing your binary cache properly. Perhaps also a "binary cache
>    hit rate" metric.
>
> Does anyone have others? If you manage a large fleet of NixOS machines
> (and possibly other types of OSes too, so NixOps might not be suitable),
> which metrics do you find useful? Even if you do use NixOps to manage the
> state of your machines, ongoing metrics can still be useful for assessing
> the health of your systems. You don't want to be surprised by a machine's
> drive filling up because its store is full of junk :)
>
> Thanks,
> Dan
>
> [1] https://aws.amazon.com/ec2/systems-manager/
> _______________________________________________
> nix-dev mailing list
> nix-dev at lists.science.uu.nl
> http://lists.science.uu.nl/mailman/listinfo/nix-dev
>