SPIDER's Pubsub System for Distributed Processing

This page illustrates techniques for running SPIDER using the Pubsub system on a cluster.


With Pubsub, SPIDER procedures can be run in parallel across a distributed set of computers or within a single cluster. The user places SPIDER jobs in a shared queue, and each subscriber machine can take jobs from that queue. Each subscriber machine can specify when it will accept jobs and how many jobs it can run at a time. If the machines vary greatly in processing power, it is best to partition the SPIDER jobs so that each one takes a reasonable length of time (e.g. 20 to 100 minutes); this keeps the subscription process efficient. I have provided several master SPIDER procedures that handle the preparation of the parallel SPIDER procedures.
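The partitioning advice above can be illustrated with a small sketch. It is not part of SPIDER or PubSub: the function name `partition_jobs`, its parameters, and the per-item timing are hypothetical, and it assumes the work can be split into contiguous numbered ranges (for example, image numbers) of roughly equal cost.

```python
import math


def partition_jobs(n_items, minutes_per_item, target_minutes=60.0):
    """Split items 1..n_items into contiguous (first, last) ranges.

    Hypothetical helper: each range is sized so that one job takes
    at most about `target_minutes`, given an estimated cost per item.
    """
    if n_items <= 0 or minutes_per_item <= 0 or target_minutes <= 0:
        raise ValueError("all arguments must be positive")

    # Largest number of items that still fits in the target time.
    per_job = max(1, math.floor(target_minutes / minutes_per_item))
    n_jobs = math.ceil(n_items / per_job)

    # Spread items evenly so the last job is not much shorter than the rest.
    base, extra = divmod(n_items, n_jobs)
    ranges = []
    first = 1
    for job in range(n_jobs):
        size = base + (1 if job < extra else 0)
        ranges.append((first, first + size - 1))
        first += size
    return ranges


if __name__ == "__main__":
    # Assumed figures: 1000 images at about 0.5 minutes each.
    for first, last in partition_jobs(1000, 0.5, target_minutes=60.0):
        print(first, last)
```

Each (first, last) pair would then become the arguments of one published SPIDER job. The target time is a parameter so it can be tuned within the 20 to 100 minute range suggested above.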

Running SPIDER jobs using Pubsub

  1. You must have installed Pubsub and started the subscriber process on the master PubSub machine, as described elsewhere.
  2. You can run the SPIDER job on the master PubSub machine or on any other machine that can use 'ssh' to connect to the master without a password. The master PubSub machine and all of the compute nodes must be able to access YOUR_SPIDER_WORKING_DIR using the same path.

    e.g. cd $HOME/spider/data
  3. Submit your SPIDER job to the publisher using 'publish', e.g.
    publish "./spider pam/dat @pub_refine 17 iter=17"
  4. Instructions are available for use of PubSub in:
  5. To write SPIDER batch procedures that run under PubSub, see the above examples and the information in: parallel.html
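The working-directory and submission steps above could be scripted along the following lines. This is only a sketch: the Python wrapper and the names `build_publish_command` and `submit` are hypothetical, and the only thing taken from this page is the shape of the 'publish' command line shown in step 3. It defaults to a dry run that returns the shell command instead of executing it.

```python
import shlex
import subprocess


def build_publish_command(spider_exe, extensions, procedure, args=()):
    """Build the argument list for a 'publish' call (hypothetical helper).

    The whole SPIDER command line is passed to 'publish' as one quoted
    string, mirroring the example in step 3.
    """
    job = " ".join([spider_exe, extensions, "@" + procedure, *map(str, args)])
    return ["publish", job]


def submit(workdir, command, dry_run=True):
    """Run `command` from `workdir`, or return the equivalent shell line."""
    if dry_run:
        return "cd {} && {}".format(shlex.quote(workdir), shlex.join(command))
    # Assumes 'publish' is on PATH and workdir is visible to all nodes.
    subprocess.run(command, cwd=workdir, check=True)
    return None


if __name__ == "__main__":
    cmd = build_publish_command("./spider", "pam/dat", "pub_refine",
                                [17, "iter=17"])
    print(submit("/home/user/spider/data", cmd))
```

The directory `/home/user/spider/data` is a placeholder for YOUR_SPIDER_WORKING_DIR. Passing `dry_run=False` would invoke 'publish' directly, which only works on a machine set up as described in steps 1 and 2.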

Source: spider/pubsub/pubsub.html     Page updated: 10 Aug. 2010     ArDean Leith