[Nix-dev] Making Hydra super fast (was: Security channel proposal)

Peter Simons simons at cryp.to
Thu Sep 25 20:19:00 CEST 2014


Hi Wout,

 > Another option would be to make Hydra super fast... What has been
 > explored to optimize compile speeds? Using distcc, ccache, SSD,
 > elastic scaling?

Hydra appears slow because "hydra-evaluator" is single-threaded. A
round-trip evaluating all jobsets on hydra.nixos.org takes almost a day. If
a commit comes in 10 minutes after 'master' was evaluated, then it takes ~24
hours before the first build with that commit even *starts*. Once the
evaluator has put pending builds into the queue, however, you'll find that
the build slaves are super fast, actually. The throughput is great, but the
latency sucks.
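
To make the latency arithmetic concrete, here is a rough sketch in Python. It
assumes the evaluator walks all jobsets strictly one after another, so each
jobset is revisited once per cycle; the function name and the rounding of
"almost a day" up to 24 hours are only for illustration and are not taken
from Hydra itself.

```python
from datetime import timedelta


def wait_until_next_evaluation(cycle: timedelta, arrived_after: timedelta) -> timedelta:
    """Time a commit waits before the evaluator revisits its jobset.

    Assumption: a strictly sequential round trip over all jobsets, so a
    jobset is evaluated exactly once per cycle.
    """
    if not timedelta(0) <= arrived_after <= cycle:
        raise ValueError("arrived_after must lie within one cycle")
    return cycle - arrived_after


# "Almost a day" per round trip, rounded up to 24 hours for illustration.
cycle = timedelta(hours=24)

# A commit that lands 10 minutes after 'master' was evaluated.
wait = wait_until_next_evaluation(cycle, timedelta(minutes=10))
print(wait)  # 23:50:00
```

Under this simple model the wait depends only on the evaluator's cycle time,
not on how many build slaves are available, which is why adding slaves alone
would not shorten it.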

The other factor is that only a handful of people have administrator
privileges on the machines that run those builds, and those people are
volunteers with jobs, families, and a life. We've seen that 'master' can be
stuck for several days because of trivial system issues (disk space,
anyone?), but none of the people who could do something about it have time
to actually do it.


 > What if we had a security build fund that we could use to briefly run 500
 > machines to complete security builds? Would that allow 2-hour security
 > rollouts?

If we're looking only at Linux/x86_64, then 3-5 fast machines could easily
build all important packages in 'master' in approximately two hours. In
fact, hydra.nixos.org could probably do that, too. The problem isn't the
build slaves -- the problem is the server that drives them.

To improve the situation, we could:

 (1) Hack hydra-evaluator to be super efficient.

 (2) Rent a machine with lots of disk space, lots of memory, and a good
     Internet connection to build Nixpkgs 'master' and 'release-14.04', but
     nothing else. Then have a team of 5-10 volunteers administer that
     machine to guarantee responsiveness.

If we had some kind of "emergency response fund" that would allow us to
spawn build slaves in EC2, it would help matters greatly, too, but only if
there is someone who can quickly re-configure the main server to take
advantage of those slaves. Personally, I could easily spawn half a dozen
build slaves in EC2 and donate them to nixos.org for a day or two, but then
I'd have no way to configure hydra.nixos.org to use them!

Best regards,
Peter


