[Nix-dev] Making Hydra super fast (was: Security channel proposal)

Thu Sep 25 21:11:37 CEST 2014

On Sep 25, 2014 8:19 PM, "Peter Simons" <simons at cryp.to> wrote:
>
> Hi Wout,
>
>  > Another option would be to make Hydra super fast... What has been
>  > explored to optimize compile speeds? Using distcc, ccache, SSD,
>  > elastic scaling?
>
> Hydra appears slow because "hydra-evaluator" is single-threaded. A
> round-trip evaluating all jobsets on hydra.nixos.org takes almost a day. If
> a commit comes in 10 minutes after 'master' was evaluated, then it takes ~24
> hours before the first build with that commit even *starts*. Once the
> evaluator has put pending builds into the queue, however, you'll find that
> the build slaves are super fast, actually. The throughput is great, but the
> latency sucks.

Hmmm, I'm not sure I understand. Are you saying that build slaves are going
idle because of build scheduling?
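
For reference, the latency arithmetic in the quoted paragraph can be sketched
roughly as below. This is only a toy model: the 24-hour cycle is "almost a day"
rounded up, and the function name and the assumption that the evaluator visits
each jobset exactly once per cycle are mine, not taken from Hydra itself.

```python
from datetime import timedelta


def wait_until_next_evaluation(cycle: timedelta, since_last_eval: timedelta) -> timedelta:
    """Time a commit waits before the evaluator visits its jobset again.

    Hypothetical model: the evaluator visits each jobset once per `cycle`,
    so a commit landing `since_last_eval` after the previous visit has to
    wait for the remainder of the cycle.
    """
    return cycle - (since_last_eval % cycle)


cycle = timedelta(hours=24)  # assumption: "almost a day", rounded up
wait = wait_until_next_evaluation(cycle, timedelta(minutes=10))
print(wait)  # 23:50:00
```

In this model the wait depends only on the evaluator's cycle time, not on how
many build slaves there are, which would be the latency-versus-throughput
distinction being made.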

> The other factor is that only a handful of people have administrator
> privileges on the machines that run those builds, and those people are
> volunteers with jobs, families, and a life. We've seen that 'master' can be
> stuck for several days because of trivial system issues (disk space,
> anyone?), but none of the people who could do something about it have time
> to actually do it.

I hereby volunteer.

>  > What if we had a security build fund that we could use to briefly run 500
>  > machines to complete security builds? Would that allow 2-hour security
>  > rollouts?
>
> If we're looking only at Linux/x86_64, then 3-5 fast machines could easily
> build all important packages in 'master' in approximately two hours. In
> fact, hydra.nixos.org could probably do that, too. The problem isn't the
> build slaves -- the problem is the server that drives them.
>
> To improve the situation, we could:
>
>  (1) Hack hydra-evaluator to be super efficient.
>
>  (2) Rent some machine with lots of disk space, lots of memory, and a good
>      Internet connection to build Nixpkgs 'master', 'release-14.04', but
>      nothing else. Then have a team of 5-10 volunteers administer that
>      machine to guarantee responsiveness.

Sounds good...

> If we had some kind of "emergency response fund" that would allow us to
> spawn build slaves in EC2, it would help matters greatly, too, but only if
> there is someone who can quickly re-configure the main server to take
> advantage of those slaves. Personally, I could easily spawn half a dozen
> build slaves in EC2 and donate them to nixos.org for a day or two, but then
> I'd have no way to configure hydra.nixos.org to use them!

Sounds like a great problem to solve with NixOps!

Wout.