[Nix-dev] Locking breaks distributed builds with a common /nix/store

David Soergel dev at davidsoergel.com
Thu Apr 1 11:49:12 CEST 2010


Hi all,

I'm trying to get distributed builds working on a cluster, where my /nix/store is on a fast scratch filesystem that is mounted on all of the compute nodes.

The standard build-remote.pl copies inputs and outputs back and forth on the assumption that each machine has its own /nix/store. In my case that would produce a lot of superfluous I/O (and the compute nodes have no suitable local disk anyway), so I simply commented out those lines of the script.

Things seem to be almost working, except that locks get in the way. For each derivation, the output directory is locked on the master node before the remote build hook is called, which makes sense. But because the store is shared, nix-store on the remote node then tries to take that same lock, which the master is still holding, so it sits waiting forever.
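A minimal sketch of that contention, assuming the lock is a POSIX record lock (fcntl/lockf) on a lock file in the shared store. The file name and the helper `other_process_can_lock` are hypothetical, this is not Nix's own locking code, and a forked child stands in for nix-store on the remote node:

```python
import fcntl
import os
import tempfile


def other_process_can_lock(lock_path):
    """Fork a child standing in for nix-store on the remote node. It opens the
    same lock file and requests an exclusive lock without waiting (LOCK_NB);
    a real build would wait instead, i.e. forever while the master holds it."""
    pid = os.fork()
    if pid == 0:  # child process
        fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(0)  # lock was free
        except OSError:
            os._exit(1)  # lock is held by another process
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "hypothetical-output.lock")
        master_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
        results = [other_process_can_lock(lock_path)]  # nothing locked yet
        fcntl.lockf(master_fd, fcntl.LOCK_EX)          # master locks the output
        results.append(other_process_can_lock(lock_path))
        os.close(master_fd)                            # closing the fd releases it
        results.append(other_process_can_lock(lock_path))
        return results


if __name__ == "__main__":
    print(demo())  # [True, False, True]
```

The middle result is the situation described above: while the master holds the lock, any other process asking for it is refused (or, with a blocking request, waits).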

Is there any reasonable way around this? My understanding is that the locks are held through open file handles, so there is no straightforward way to hand the active lock off to the remote node.
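To illustrate why handing off is awkward, here is a sketch, again assuming POSIX record locks via `fcntl.lockf` (the helper and file name are hypothetical). Even giving another process the master's own descriptor does not give it the lock, because a POSIX record lock is owned by the process that took it:

```python
import fcntl
import os
import tempfile


def child_can_lock_inherited_fd(fd):
    """Fork; the child keeps the parent's own descriptor (about as close as one
    can get to 'handing over' the lock) and requests the same exclusive lock
    without waiting."""
    pid = os.fork()
    if pid == 0:  # child process, sharing the inherited descriptor
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(0)
        except OSError:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "hypothetical-output.lock"),
                     os.O_RDWR | os.O_CREAT, 0o600)
        fcntl.lockf(fd, fcntl.LOCK_EX)  # the "master" holds the lock
        try:
            return child_can_lock_inherited_fd(fd)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())  # False: POSIX record locks belong to the process, not the fd
```

With flock(2)-style locks the inherited descriptor would share the parent's lock, but neither kind of lock can be passed to a process on another machine.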

Perhaps I could hack in a command-line option that tells nix-store to ignore locks entirely, and use that option in the build hook. Since the master process holds the lock for the whole duration of the remote build, that ought to be safe, right?
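A rough sketch of the proposed behaviour, under the same POSIX-lock assumption. `ignore_locks` is a hypothetical parameter standing in for the suggested option; it is not an existing nix-store flag, and the forked child plays the remote build:

```python
import fcntl
import os
import tempfile


def remote_build(lock_path, ignore_locks):
    """Hypothetical remote-side build step. With ignore_locks=True it trusts
    that the caller (the master) already holds the output lock; otherwise it
    tries to lock first. LOCK_NB only keeps the demo from hanging."""
    pid = os.fork()
    if pid == 0:  # child process standing in for the remote nix-store
        if not ignore_locks:
            fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                os._exit(1)  # a real blocking lock request would hang here
        # ... the actual build would run here ...
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return "built" if os.WEXITSTATUS(status) == 0 else "blocked"


def demo():
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "hypothetical-output.lock")
        master_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
        fcntl.lockf(master_fd, fcntl.LOCK_EX)  # master locks before the hook runs
        try:
            return [remote_build(lock_path, ignore_locks=False),
                    remote_build(lock_path, ignore_locks=True)]
        finally:
            os.close(master_fd)  # released only after the remote builds return


if __name__ == "__main__":
    print(demo())  # ['blocked', 'built']
```

The safety argument rests on the master keeping the lock until the remote build has finished, which here corresponds to closing `master_fd` only after `remote_build` returns. Any other process that writes the same output with locking disabled would not be excluded.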

Thanks for any insights,

-ds


_______________________________________________________
David Soergel                            (650) 303-5324
dev at davidsoergel.com        http://www.davidsoergel.com
_______________________________________________________
