[Nix-dev] Locking breaks distributed builds with a common /nix/store

David Soergel dev at davidsoergel.com
Thu Apr 1 12:31:50 CEST 2010


Oops, sorry for the double post.

Anyhow, I solved my problem by releasing the output lock on the master node as soon as the build hook accepts the job. The compute node then acquires a new lock and performs the build.

I just added "outputLocks.unlock();" at build.cc line 1036. It's possible that another goal will lock the path in the window between the master releasing the lock and the compute node acquiring it. In that case, I think, the original builder will remain blocked and we'll generate a new (redundant) build hook request. Eventually one of the build processes should actually get the lock and build the derivation. When it finishes, the redundant builders should find the path already valid (the check at build.cc line 997) and just use it.

I realize that this is a hack that only makes sense with a common /nix/store. In the standard model, where each node has its own /nix/store, this patch could easily cause the same derivation to be built many times on different nodes. Perhaps something like this could be a configuration option, though?

Also, there may be some performance issues (e.g., lots of redundant processes waiting around for the same lock, then each acquiring and releasing it in turn). But in practice, my jobs seem to be running well now.

-ds


_______________________________________________________
David Soergel                            (650) 303-5324
dev at davidsoergel.com        http://www.davidsoergel.com
_______________________________________________________



