Job Manifests
Job manifests are YAML files that allow you to organize instructions for submitting any number of jobs or scripts, written in various languages, to SLURM for scheduling within the cluster.
Structure of a Job Manifest
At the top of the manifest we specify the schema version and a cluster profile. The cluster profile provides the name of the SLURM cluster to which the jobs should be submitted. The configuration is specified in the user's HOME directory in $HOME/.catena/conf.yml. The schema for this can be seen here.
After that, a manifest consists of two sections:
- job_options (Optional): Section under which job option blocks can be added. Each of these named configuration sets defines SLURM sbatch options and environment modules that should be loaded for a given job. A single option block can be applied to a single job, or to multiple jobs that run different scripts but have similar requirements. These named configuration sets are referenced using the built-in & (anchor) and * (alias) constructs in YAML.
- jobs (Required): Section under which the jobs to run are defined.
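As a rough illustration of the required/optional split described above, the following Python sketch checks a manifest that has already been parsed into plain dictionaries and lists. The helper name check_manifest_sections is hypothetical and is not part of catena; treating an empty jobs section as a problem is likewise an assumption.

```python
# Hypothetical helper, not part of catena: it checks only the two sections
# described above (job_options is optional, jobs is required).
def check_manifest_sections(manifest):
    """Return a list of human-readable problems found in a parsed manifest."""
    problems = []
    if "jobs" not in manifest:
        problems.append("missing required section: jobs")
    elif not manifest["jobs"]:
        # Assumption: a manifest with no jobs is not useful to submit.
        problems.append("section 'jobs' is empty")
    return problems


# A manifest as it might look after YAML parsing (plain dicts and lists).
parsed = {
    "jobs": [
        {
            "matlab_test": {
                "job_script": "/path/to/matlab/script/to/run.m",
                "job": {"tasks": 1},
            }
        }
    ]
}

ok_problems = check_manifest_sections(parsed)                 # job_options omitted
bad_problems = check_manifest_sections({"job_options": {}})   # jobs missing
print(ok_problems)
print(bad_problems)
```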
The most basic job manifest is for a single job, with the optional job_options section excluded:
---
jobs:
  - matlab_test:
      job_script: "/path/to/matlab/script/to/run.m"
      job:
        env_modules:
          - matlab/96
        standard_out: '/path/to/stdout/file.out'
        standard_error: '/path/to/stderr/file.err'
        cpus_per_task: 2
        tasks: 1
        memory_per_node: '2GB'
This manifest will submit a job with the name matlab_test to the SLURM scheduler, requesting a single node with 2 cores and 2GB of RAM. The job will run a MATLAB script located at the absolute path given by job_script. In this instance a version of MATLAB is required to run the job_script provided and, therefore, we add matlab/96 to the list of environment modules to be loaded under env_modules.
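To make the request more concrete, here is a minimal Python sketch of how these job options might correspond to sbatch arguments and module load commands. The mapping from manifest keys to flags is an assumption made for illustration, and to_sbatch_args is a hypothetical name, not a catena function. The flags themselves (--job-name, --ntasks, --cpus-per-task, --mem, --output, --error) are standard sbatch options.

```python
# Illustration only: catena's actual translation may differ.
# Assumed mapping from manifest keys to standard sbatch long options.
ASSUMED_FLAGS = {
    "tasks": "--ntasks",
    "cpus_per_task": "--cpus-per-task",
    "memory_per_node": "--mem",
    "standard_out": "--output",
    "standard_error": "--error",
}


def to_sbatch_args(job_name, options):
    """Return (sbatch arguments, module commands) for one job definition."""
    args = [f"--job-name={job_name}"]
    for key, flag in ASSUMED_FLAGS.items():
        if key not in options:
            continue
        value = str(options[key])
        if key == "memory_per_node" and value.upper().endswith("B"):
            # sbatch documents the unit suffixes K, M, G and T, so "2GB" -> "2G".
            value = value[:-1]
        args.append(f"{flag}={value}")
    # Environment modules are loaded with the usual "module load" command.
    modules = [f"module load {name}" for name in options.get("env_modules", [])]
    return args, modules


matlab_options = {
    "env_modules": ["matlab/96"],
    "standard_out": "/path/to/stdout/file.out",
    "standard_error": "/path/to/stderr/file.err",
    "cpus_per_task": 2,
    "tasks": 1,
    "memory_per_node": "2GB",
}

sbatch_args, module_cmds = to_sbatch_args("matlab_test", matlab_options)
print(sbatch_args)
print(module_cmds)
```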
Each job follows the general structure:
---
jobs:
  - job_name1:
      job_script: "path to script to be run"
      job:
        ...job options...
  - job_name2:
      job_script: "path to script to be run"
      job:
        ...job options...
Any number of jobs can be defined under the jobs section:
---
version: 1.0
cluster_profile: 'my_slurm_cluster1'
jobs:
  - job_name1:
      job_script: "path to script to be run"
      job:
        ...job1 options...
  - job_name2:
      job_script: "path to script to be run"
      job:
        ...job2 options...
  ...
  ...
🛎️ Important: The configuration properties that may be defined within a job option block are controlled by the SlurmSubmit schema (models/slurm_job_schemas.md#slurm_submit).
In addition to SLURM sbatch options, a given job definition may include a number of so-called Extra Options. These are not SLURM options, but they are also used when initializing a Job object to be submitted to a SLURM cluster.
Here are some key points about defining job options for a job definition:
- Job options can be defined globally under job_options and referenced within a given job definition using anchors/aliases under the job key.
- They can also be defined locally within a given job definition; the job key within a given job definition should be reserved for global job options defined under job_options.
- All job options that can be defined globally can also be defined locally.
- Job options defined locally take precedence over global job options defined under the job key.
- This is useful for having a shared set of job settings across multiple jobs while still being able to define local job settings that differ from job to job.
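The precedence rule in the points above can be sketched in a few lines of Python. This is only an illustration of "local overrides global" using plain dictionaries; resolve_job_options is a hypothetical name, and the way catena actually combines the two sets of options is not shown in this document.

```python
# Hypothetical sketch of the precedence rule: local options win on conflicts.
def resolve_job_options(global_options, local_options):
    """Combine a shared option block with per-job overrides."""
    resolved = dict(global_options)  # copy so the shared block is not mutated
    resolved.update(local_options)   # local values replace global ones
    return resolved


shared = {"cpus_per_task": 2, "tasks": 1, "memory_per_node": "2GB"}
local = {"memory_per_node": "8GB"}  # this job needs more memory than the rest

resolved = resolve_job_options(shared, local)
print(resolved)
```

Copying the shared block before updating it means that other jobs referencing the same global options are unaffected by one job's overrides.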
Defining Reusable (Global) Job Options: job_options
The basic manifest provided for a MATLAB job could also be rewritten using the job_options section as follows:
---
job_options:
  matlab: &matlab
    env_modules:
      - matlab/96
    standard_out: '/path/to/stdout/file.out'
    standard_error: '/path/to/stderr/file.err'
    cpus_per_task: 2
    tasks: 1
    memory_per_node: '2GB'
jobs:
  - matlab_test:
      job_script: "/path/to/matlab/script/to/run.m"
      job: *matlab
Here, the matlab option block is tagged with the anchor &matlab. Using anchors provides a means of freely referencing this block elsewhere within your manifest by calling it with the corresponding alias, *matlab, as shown.
With this in mind, consider a case where we'd like to run a series of jobs in either MATLAB or Python. In this instance the job_options section provides a means of better organizing our manifest:
---
job_options:
  matlab: &matlab
    env_modules:
      - matlab/96
    standard_out: '/path/to/stdout/file.out'
    standard_error: '/path/to/stderr/file.err'
    cpus_per_task: 2
    tasks: 1
    memory_per_node: '2GB'
  python: &python
    env_modules:
      - anaconda3/2021.05
    standard_out: '/path/to/stdout/file.out'
    standard_error: '/path/to/stderr/file.err'
    cpus_per_task: 2
    tasks: 1
    memory_per_node: '2GB'
jobs:
  - matlab_job1:
      job_script: "/path/to/matlab/script/to/run.m"
      job: *matlab
  - python_job1:
      job_script: "/path/to/python/script/to/run1.py"
      job: *python
  - python_job2:
      job_script: "/path/to/python/script/to/run2.py"
      job: *python
This is equivalent to:
---
version: 1.0
cluster_profile: 'my_slurm_cluster'
jobs:
  - matlab_job1:
      job_script: "/path/to/matlab/script/to/run.m"
      job:
        env_modules:
          - matlab/96
        standard_out: '/path/to/stdout/file.out'
        standard_error: '/path/to/stderr/file.err'
        cpus_per_task: 2
        tasks: 1
        memory_per_node: '2GB'
  - python_job1:
      job_script: "/path/to/python/script/to/run1.py"
      job:
        env_modules:
          - anaconda3/2021.05
        standard_out: '/path/to/stdout/file.out'
        standard_error: '/path/to/stderr/file.err'
        cpus_per_task: 2
        tasks: 1
        memory_per_node: '2GB'
  - python_job2:
      job_script: "/path/to/python/script/to/run2.py"
      job:
        env_modules:
          - anaconda3/2021.05
        standard_out: '/path/to/stdout/file.out'
        standard_error: '/path/to/stderr/file.err'
        cpus_per_task: 2
        tasks: 1
        memory_per_node: '2GB'
Clearly, when the number of jobs and options begins to grow, this could become cumbersome to read.
- The following is an example of a simple job manifest
Using the !include Constructor
The !include constructor allows you to include options or blocks from other external YML files, given the path. For instance, given three YAML files in the same directory:
- manifest.yml: the main manifest
- job_opts1.yml: manifest containing global job options
- jobs.yml: manifest containing job definitions
The !include constructor can be used for defining either job_options or jobs or both. For example, given two YML files in the same directory, manifest.yaml and jobs.yaml:
manifest.yaml
---
job_options:
  - matlab: &matlab
      env_modules:
        - matlab/96
      standard_out: '~/man_matlab.out'
      standard_error: '~/man_matlab.err'
      cpus_per_task: 2
      tasks: 1
      memory_per_node: '2GB'
    python: &python
      env_modules:
        - anaconda3/2021.05
      standard_out: '~/man_python.out'
      standard_error: '~/man_python.err'
      cpus_per_task: 2
      tasks: 1
      memory_per_node: '2GB'
jobs:
  - !include jobs.yaml
jobs.yaml
---
jobs:
  - matlab_job1:
      job_script: "/path/to/matlab/script/to/run.m"
      job: *matlab
  - python_job1:
      job_script: "/path/to/python/script/to/run1.py"
      job: *python
  - python_job2:
      job_script: "/path/to/python/script/to/run2.py"
      job: *python
- The following example looks at this exact type of application.
🛎️ Important: Any number of include statements can be listed for referencing job_options and/or jobs from external YAML files. For instance, if the Python and MATLAB job definitions were split into two files, matlab_jobs.yaml and python_jobs.yaml, they could be included like this:
jobs:
  - !include matlab_jobs.yaml
  - !include python_jobs.yaml
🛎️ Important: The !include constructor and global job_options of a manifest can be used separately or combined to better organize more complex workloads.
🛎️ Important: When global job_options are included from external YML files and the global options are tagged with anchors (&), these options can be referenced by their corresponding alias (*). For instance, given the following 2 external YML files and a main manifest:
One external YML file defining global job options for MATLAB jobs:
matlab_opts.yml
---
job_options:
  - matlab: &matlab
      env_modules:
        - matlab/96
      standard_out: '~/man_matlab.out'
      standard_error: '~/man_matlab.err'
      cpus_per_task: 2
      tasks: 1
      memory_per_node: '2GB'
Another listing job definitions that alias the &matlab anchor:
matlab_jobs.yml
---
jobs:
  - matlab_job1:
      job_script: "/path/to/matlab/script/to/run.m"
      job: *matlab
Then, using the !include constructor, the main manifest would be:
manifest.yml
---
job_options:
  - !include matlab_opts.yml
jobs:
  - !include matlab_jobs.yml
Or, without using the !include constructor for the jobs section, the following main manifest would be valid as well:
manifest.yml
---
job_options:
  - !include matlab_opts.yml
jobs:
  - matlab_job1:
      job_script: "/path/to/matlab/script/to/run.m"
      job: *matlab
Job Manifests Best Practices
Using the !include constructor, complex manifests can be organized into multiple YML files. For advanced job manifests involving multiple YAML files, it is good practice to keep things organized and make your manifests easy to navigate. Here is a sample manifest directory.
📝Note: Any number of directories can be created and referenced within the main manifest root directory (i.e., the directory containing the main manifest to be run).
- The following example demonstrates how this setup can be used in practice. It also serves to demonstrate how relative paths can be used to define job_options that correspond to paths, which is the subject of the following section.
Understanding the Context in Which the Manifest is Run
So far in the toy examples shown, not much attention has been given to the path definitions within job manifests and how they are resolved. In particular, it is generally useful to use relative paths to clean up a manifest and reduce the verbosity.
🛎️ Important: When defining paths within any manifest, all paths should be taken relative to the main manifest root directory.
In addition to relative paths, the ~ expression is also valid and is expanded to the full absolute path of the home directory for the user calling SlurmJob or running a manifest.
Ultimately, all paths defined within a manifest or SlurmJob are expanded to their absolute form.
📝Note: Expansion of paths defined within a given job definition (including any job options) is performed when validating the inputs to SlurmJob (i.e., using Pydantic validators in both the SlurmSubmit and the JobOptions schemas).
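The expansion rules described in this section can be sketched with the Python standard library. This is not the Pydantic validator used by catena; expand_manifest_path is a hypothetical helper that only illustrates the two rules stated above (relative paths are resolved against the main manifest root directory, and ~ is expanded to the user's home directory), assuming POSIX-style paths.

```python
import os
from pathlib import Path


# Hypothetical helper illustrating the path rules; not catena's validator.
def expand_manifest_path(path, manifest_dir):
    """Return the absolute form of a path written inside a manifest."""
    expanded = Path(path).expanduser()           # "~" -> home directory
    if not expanded.is_absolute():
        expanded = Path(manifest_dir) / expanded  # relative to the manifest root
    # abspath also collapses ".." segments without touching the filesystem.
    return Path(os.path.abspath(expanded))


root = "/project/manifests"  # assumed location of the main manifest
print(expand_manifest_path("scripts/run.m", root))
print(expand_manifest_path("../logs/job.out", root))
print(expand_manifest_path("~/man_matlab.out", root))
```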
Submitting a Manifest
Currently, the Manifest object can be invoked by calling the catena module directly and providing the path to a manifest to be run.
$ python3 -m catena /path/to/my/manifest.yaml
💻 Submitted job: 300205
💻 Submitted job: 300206
💻 Submitted job: 300207
💻 Submitted job: 300208
📝 Note: Currently not packaged.
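If several manifests need to be submitted from a script, the same command can be driven from Python. The sketch below relies only on the invocation shown above (python3 -m catena followed by a manifest path); the function names are hypothetical, and using sys.executable in place of python3 assumes that catena is importable by the interpreter running the script.

```python
import subprocess
import sys


def build_catena_command(manifest_path):
    """Build the argument list equivalent to: python3 -m catena <manifest>."""
    return [sys.executable, "-m", "catena", str(manifest_path)]


def submit_manifest(manifest_path):
    """Run the command and return whatever it prints (e.g. submitted job lines)."""
    completed = subprocess.run(
        build_catena_command(manifest_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the submission fails
    )
    return completed.stdout


# Only the command is built here; nothing is submitted at import time.
command = build_catena_command("/path/to/my/manifest.yaml")
print(command)
```

Calling submit_manifest for each manifest path in a loop would then submit them one after another.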