[Nix-dev] hydra admins: please take care of i686 GHC out of memory issues

Gergely Risko gergely at risko.hu
Wed Jan 22 23:44:15 CET 2014


Hi,

I'm just guessing here, but:
  - there are packages that depend on GHC, e.g. git-annex:
      http://hydra.nixos.org/build/8421542
  - this is the same evaluation, but the build number is smaller.

So what I think is happening:
  - there is a package X that has a build dependency on GHC,
  - when package X is built, GHC is built first (and fails),
  - every later attempt to build GHC is a cached failure (git-annex,
    ghc-wrapper, etc.).

I don't know how to find X via the Hydra web interface, but maybe it's
easy for admins with SQL access; I really have no idea.  My guess is
that once we find package X, it will contain the real (non-cached)
build of GHC in this evaluation.  If you check the hash of the GHC that
Hydra is trying to build, you will see that it actually changes across
the success-to-failure transition, so this really is a new build
compared to the ones from 21st January.

And about the failures: I've been thinking about this for the last 2-3
days, and probably we're just being fooled by randomness.  Before the
stdenv merge, we thought that trunk was failing and stdenv was not, but
if you review the history, we have 3 different GHC build attempts on
trunk (with different hashes, so not just cached), and all of them
failed.  But there are also failed ones on the stdenv trunk, in circa
1/2 of the cases.  So if the failure rate is 1/2 everywhere, having 3
failures in a row on trunk has probability p=1/8, which is not that
unlikely.

Now that stdenv is merged, 2 builds are good and then the third one is
broken.  This can easily be pure chance from a random source with a
1/2 failure rate.  I wanted to get some statistically significant
evidence for this by setting up my local build machine, where I can
build 10-20 times, but I haven't gotten to it yet.  Maybe with different
settings: a lot of parallelism (-j16), then none (-j1), etc.
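
To make the arithmetic above concrete, here is a small sketch.  It
assumes every build fails independently with probability 1/2, which is
only my rough estimate from the history and not a measured value; the
function name is made up for illustration.

```python
from fractions import Fraction

def sequence_probability(outcomes, p_fail=Fraction(1, 2)):
    """Probability of seeing exactly this sequence of independent builds.

    outcomes: string of 'F' (failed) and 'S' (succeeded).
    p_fail:   assumed per-build failure probability (a guess, not measured).
    """
    prob = Fraction(1)
    for outcome in outcomes:
        prob *= p_fail if outcome == "F" else 1 - p_fail
    return prob

print(sequence_probability("FFF"))  # three failures in a row on trunk: 1/8
print(sequence_probability("SSF"))  # two good builds, then a broken one: 1/8
```

Both observed sequences come out at 1/8 under this assumption, so
neither of them says much on its own.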

In the meantime, can we please disable all parallelism during the build
for i686 (so no -j passed to any stage of the compiler) and try it like
that?  Maybe we're somehow hitting the 4G per-process address space
limit with threads or something.

Also, would it be too much of a waste of hydra resources to try 10
consecutive builds of _absolutely the same derivation_ and gather
failure/success ratios?
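
As a rough idea of what 10 builds could tell us, the sketch below
computes how likely each failure count would be if the builds really
were independent coin flips.  The 1/2 failure rate and the independence
are assumptions on my side, and the helper names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k failures in n independent builds,
    each failing with (assumed) probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def tail_at_least(k, n, p):
    """Probability of k or more failures in n builds."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

n = 10  # repeated builds of absolutely the same derivation
for k in range(n + 1):
    print(f"{k:2d} failures: P(>= {k}) = {tail_at_least(k, n, 0.5):.4f}")
```

Under the coin-flip assumption, 9 or more failures out of 10 would
happen only about 1% of the time (11/1024), and the same holds for 9 or
more successes.  So a lopsided result would suggest the failures are
not purely random, while something like 4-6 failures would fit the 1/2
guess.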

Thanks,
Gergely

On Wed, 22 Jan 2014 13:59:17 +0100, Peter Simons <simons at cryp.to> writes:

> Hi guys,
>
> I am confused. According to [1] we've had a working GHC build for Linux/i686
> in evaluations 8394199 and 8396207 -- immediately after the stdenv-updates
> merge. Then Hydra ran a third build, [2], and that ended up being a "cached
> failure". Yet, there were no failed builds since the stdenv merger! How is
> that possible?
>
> Am I missing something?
>
> Take care,
> Peter
>
> [1] http://hydra.nixos.org/job/nixpkgs/trunk/haskellPackages.ghc.i686-linux
> [2] http://hydra.nixos.org/build/8427824


