qsub is a command used for submission to the SGE cluster. In the section 2.1 Quickstart and basics, we showed that you can submit an example job using qsub as follows:
username@login1:~$ qsub -V -b n -cwd runJob.sh Your job 1 ("runJob.sh") has been submitted
The general syntax of how to use qsub is below.
qsub -q <queue> -w e -N <job_name> -l h_vmem=<memory, e.g. 4G> -l h_rt=<time> -l s_rt=<time> -pe smp <num_slots> -o <outputlogfile> -e <errorlogfile> <pathtoScript> <arg1> <arg2>
Here are the some of the options and their explanations.
-q <queue> set the queue. Often you will use the standard queue, so no need to set this up.
-V will pass all environment variables to the job
-v var[=value]will specifically pass environment variable 'var' to the job
-b yallow command to be a binary file instead of a script.
-w everify options and abort if there is an error
-N <jobname>name of the job. This you will see when you use
qstat, to check status of your jobs.
-l h_vmem=sizespecify the amount of maximum memory required (e.g. 3G or 3500M) (NOTE: This is memory per processor slot. So if you ask for 2 processors total memory will be 2 * hvmem_value)
-l h_rt=<hh:mm:ss>specify the maximum run time (hours, minutes and seconds)
-l s_rt=hh:mm:ssspecify the soft run time limit (hours, minutes and seconds) - Remember to set both s_rt and h_rt.
-pe smp <n_slots>This specifies the parallel environment.
smpruns a parallel job using shared-memory and n_processors amount of cores.
-cwdrun in current working directory
-wd <dir>Set working directory for this job as
-o <output_logfile>name of the output log file
-e <error_logfile>name of the error log file
-m eaWill send email when job ends or aborts
-P <projectName>set the job's project
-M <emailaddress>Email address to send email to
-t <start>-<end>:<incr>submit a job array with start index
You can see full list of arguments and explanations here.
Now let's see some example bash scripts and qsub commands that submit those jobs to the cluster.
Bowtie is a short-read aligner used to align reads from next-gen sequencing machines. The script includes the commandline options within it, so no need to pass those options on the commandline. In a shell script, you can set the qsub options in lines that begin with #$. This an alternative to passing them with the qsub command.
#!/bin/bash #$ -N run_bowtie2 #$ -cwd #$ -pe smp 6 #$ -l h_vmem=6G infile=/data/bioinfo/READS2/R1_001.fastq.gz outfile=/data/bioinfo/READS2/aln/R1_001.sam btindex=/data/bioinfo/genome_data/Caenorhabditis_elegans/UCSC/ce10/Sequence/BowtieIndex/genome gzip -dc $infile | bowtie --chunkmbs 300 --best -m 1 -p 6 --phred33 -q $btindex - -S $outfile
#$are the qsub options described above. They customize the job and ask for resources such as CPU and memory.
The input/output files for the bowtie command are stored in
btindex variables. Then, passed to the bowtie command. This is not really necessary but it is easier to read later on what is going on.
The execution of bowtie starts with the
gzip -dc .... Since fastq files are compressed we first need to decompress them and feed the output to bowtie via
--chunkmbs 300 --best -m 1 -p 6 --phred33 -qare bowtie specific options. Look them up in the Bowtie manual.
And, you can run this as follows (assuming this file is saved as 'runBowtie.sh').
$ qsub runBowtie.sh