[Nix-dev] Help needed: patching Hydra to retry failed builds after a while

Gergely Risko gergely at risko.hu
Wed Jan 8 16:53:18 CET 2014


Hi,

(Hungarian version below.)

Yes, you're right, at least in part.

I actually checked
http://hydra.nixos.org/job/nixpkgs/trunk/haskellPackages_ghc763_profiling.pipesParse.i686-linux/all
There you can see that a lot of the failures are cached failures, but
you're right, the hash of GHC does change sometimes.  Your GHC Hydra
link shows this perfectly, because all of those entries are separate
tries, not cached failures of the same try.

If you check the dates carefully:
Dependency failed 	7819290 	ghc-7.6.3-wrapper 	i686-linux 	2014-01-07 22:44:31
Dependency failed 	7634985 	ghc-7.6.3-wrapper 	i686-linux 	2014-01-04 14:13:33
Dependency failed 	7395570 	ghc-7.6.3-wrapper 	i686-linux 	2013-12-26 19:41:49
Dependency failed 	6662085 	ghc-7.6.3-wrapper 	i686-linux 	2013-10-27 15:36:05

From 2013-10-27 to 2013-12-26 is two months without a retry.  That seems
unacceptable to me for an out-of-memory error.  In this case you're
right that retrying didn't help and the Hydra admins have to fix the
machine somehow.  (I will write a separate mail about that.)  But in
general, no retry happens at the moment: we have to wait for a change in
the nixpkgs git repository to trigger a new build.  That's what I want
to change.
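
To make the idea concrete, here is a minimal sketch of what a TTL on
cached failures could look like.  It is plain Python and not Hydra code:
the names `FAILURE_TTL` and `should_retry` are made up for illustration,
and the one-week value is only the example figure from the ticket.  It
also computes the gap between two of the tries listed above.

```python
from datetime import datetime, timedelta

# Hypothetical TTL for cached failures; one week is just the example
# value suggested in the ticket, not an existing Hydra setting.
FAILURE_TTL = timedelta(weeks=1)


def should_retry(last_failure: datetime, now: datetime,
                 ttl: timedelta = FAILURE_TTL) -> bool:
    """Return True once a cached failure is at least `ttl` old."""
    return now - last_failure >= ttl


# Timestamps of two consecutive tries from the list above.
fmt = "%Y-%m-%d %H:%M:%S"
first = datetime.strptime("2013-10-27 15:36:05", fmt)
second = datetime.strptime("2013-12-26 19:41:49", fmt)

gap = second - first
print(gap.days)                                         # 60
print(should_retry(first, first + timedelta(days=3)))   # False
print(should_retry(first, first + timedelta(days=8)))   # True
```

With a rule like this, the failure from 2013-10-27 would have become
eligible for a retry after one week instead of waiting 60 days for a
nixpkgs change.  Where such a check would actually live in Hydra is
exactly the part I need help with.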

About building GHC for i686 locally: I haven't tried it yet.  I will try
to make some time for it, but I'm pretty sure it's a memory limitation
on the build machine, so let's ping the admins! :)

Thanks,
Gergely

-=-

Helló,

Igen, igazad van, legalábbis részben.

Én a
http://hydra.nixos.org/job/nixpkgs/trunk/haskellPackages_ghc763_profiling.pipesParse.i686-linux/all
URL-t néztem, itt látható, hogy a hibák jó része cachelt hiba;
ugyanakkor igazad van, a hashe ghcnek változik néha.  A te GHC hydra
linked nagyon jó ilyen szempontból, mert az csak az igazi próbákat
mutatja, a cachelt hibákat nem.

Ha megnézed a dátumokat pontosan:
Dependency failed 	7819290 	ghc-7.6.3-wrapper 	i686-linux 	2014-01-07 22:44:31
Dependency failed 	7634985 	ghc-7.6.3-wrapper 	i686-linux 	2014-01-04 14:13:33
Dependency failed 	7395570 	ghc-7.6.3-wrapper 	i686-linux 	2013-12-26 19:41:49
Dependency failed 	6662085 	ghc-7.6.3-wrapper 	i686-linux 	2013-10-27 15:36:05

2013-10-27-től 2013-12-26-ig az 2 hónap újrapróbálkozás nélkül.  Ez az
ami szerintem nem elfogadható egy átmeneti memória elfogyás esetén.
Ebben az esetben igazad van abban, hogy a retry sem segít és a hydra
adminoknak kell a gépet megjavítania.  (Írok mindjárt egy külön
levelet.)  De úgy általában, jelenleg újrapróbálkozás csak akkor
történik, ha kivárjuk, hogy valaki megváltoztassa a nixpkgs gitet.  Ezt
szeretném megváltoztatni.

A helyi GHC i686 fordítással kapcsolatban: nem próbáltam, megpróbálom
majd, ha lesz egy kis időm.  De elég biztos vagyok benne, hogy a
hibaüzenet valódi és egyszerűen tényleg elfogyott a gépen a memória,
szólok az adminoknak!

Üdv,
Gergő

On Wed, 8 Jan 2014 10:34:50 -0500, Thomas Bereknyei <tomberek at gmail.com> writes:

> Jo napot,
>
> I'm willing to help you with working on Hydra, it's something I want
> to get involved with. I'm not sure who is in charge of Hydra
> administration.
>
> From just a cursory look, it seems that the error is not transient, or
> at least it consistently breaks [1] with:
>
>> ghc-stage1: out of memory (requested 1048576 bytes)
>
> [1] http://hydra.nixos.org/job/nixpkgs/trunk/haskellPackages.ghc.i686-linux
>
> Have you tried building it locally? I do not have an i686 machine at the moment.
>
> Es szertnem gyakorolni a magyar irast.
>
> -Tom
>
> On Wed, Jan 8, 2014 at 9:50 AM, Gergely Risko <gergely at risko.hu> wrote:
>> Hi,
>>
>> Happy new year to all the Nixers around here!
>>
>> In https://github.com/NixOS/hydra/issues/139 I reported the following issue:
>>
>>> It happens quite frequently that some build breaks with a transient
>>> failure on some Hydra machine. The most recent example is GHC on
>>> i686. The only solution in these situations is to whine on the mailing
>>> list and hope that some hydra admin will restart the failed build.
>>>
>>> It'd be much better to have a TTL for negative build caching and retry
>>> failed builds e.g. every week at least once even if the derivation
>>> didn't change. That would ensure that transient errors get fixed even
>>> without manual intervention.
>>
>> Since I received no comments on the ticket, may I ask for opinions here?
>> Is this a good idea to do?  If yes, can someone with actual coding and
>> design experience with hydra help me please?
>>
>> Are there any design decisions to make?  Can someone point me to the
>> relevant parts of the codebase and give a little bit of an overview what
>> I have to do to achieve this goal?  I'd be happy to figure out the
>> details and prepare a patch of course.
>>
>> Currently I can't update my haskell machines for the last 4 months on
>> i686 because of this and always pinging the hell out of hydra admins
>> seems to be a waste of everybody's time if this can be automated.
>>
>> Thanks,
>> Gergely
>>
>> _______________________________________________
>> nix-dev mailing list
>> nix-dev at lists.science.uu.nl
>> http://lists.science.uu.nl/mailman/listinfo/nix-dev


