SLURM Job Schemas
SlurmModel
Bases: BaseModel
General SLURM job model, including the job options and the path to the job script, used for submitting a job through the REST API: POST job/{jobid}. A usage sketch follows the attribute table below.
Attributes:

Name | Type | Description |
---|---|---|
script | str | full absolute path to the script to be submitted as a job to the SLURM cluster |
job | SlurmSubmit | SLURM sbatch options for the associated job |
Source code in catena/models/slurm_submit.py
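For illustration, a minimal sketch of building a SlurmModel payload and submitting it to the endpoint above. The import path is inferred from the source file shown; the base URL, job id, and script path are placeholders, and `.dict()` assumes pydantic v1 (use `.model_dump()` on v2).

```python
# Hypothetical usage sketch: build a SlurmModel and POST it to the REST API.
# Import path inferred from the source file shown above; adjust as needed.
import requests

from catena.models.slurm_submit import SlurmModel, SlurmSubmit

payload = SlurmModel(
    script="/home/alice/jobs/train.sh",   # full absolute path to the job script
    job=SlurmSubmit(name="train-model"),  # sbatch options; see the SlurmSubmit table below
)

# Placeholder base URL and job id; the real values depend on the deployment.
base_url = "https://catena.example.com/api"
jobid = "1234"

response = requests.post(
    f"{base_url}/job/{jobid}",
    json=payload.dict(),  # pydantic v1; use payload.model_dump() on v2
    timeout=30,
)
response.raise_for_status()
```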
SlurmSubmit
Bases: ExtendedBaseModel
SLURM sbatch options: see the SLURM documentation for more details. A usage sketch follows below.
Attributes:

Name | Type | Description |
---|---|---|
name | str | SLURM job name |
delay_boot | Optional[int] | do not reboot nodes in order to satisfy this job's feature specification if the job has been eligible to run for less than this time period, defaults to 0 (suggested to leave as default) |
dependency | Optional[str] | defer the start of this job until the specified dependencies have been satisfied. All dependencies must be satisfied if the ',' separator is used. Dependencies are given in the sbatch --dependency format (e.g. 'afterok:job_id') |
distribution | Optional[str] | specify alternate distribution methods for remote processes. In sbatch, this only sets environment variables that will be used by subsequent srun requests, defaults to 'arbitrary' |
environment | Optional[dict] | map of system paths to be set within the user's environment when running the SLURM job, defaults to None |
exclusive | Optional[str] | the job allocation cannot share nodes with other running jobs (or just with other users with the '=user' option, or with the '=mcs' option), defaults to 'user' |
get_user_environment | Optional[str] | this option will tell sbatch to retrieve the login environment variables for the user specified in the --uid option |
gres | Optional[str] | specifies a comma-delimited list of generic consumable resources, defaults to None |
gres_flags | Optional[str] | specify generic resource task binding options (e.g. 'disable-binding' or 'enforce-binding') |
gpu_binding | Optional[str] | bind tasks to specific GPUs. By default every spawned task can access every GPU allocated to the step, defaults to 'closest' |
gpu_frequency | Optional[str] | request that GPUs allocated to the job are configured with specific frequency values. This option can be used to independently configure the GPU and its memory frequencies, defaults to 'medium' |
gpus | Optional[str] | specify the total number of GPUs required for the job. An optional GPU type specification can be supplied |
gpus_per_node | Optional[str] | specify the number of GPUs required for the job on each node included in the job's resource allocation, defaults to None |
gpus_per_socket | Optional[str] | specify the number of GPUs required for the job on each socket included in the job's resource allocation. An optional GPU type specification can be supplied, defaults to None |
gpus_per_task | Optional[str] | specify the number of GPUs required for the job on each task to be spawned in the job's resource allocation. An optional GPU type specification can be supplied |
hold | Optional[str] | specify that the job is to be submitted in a held state (priority of zero). A held job can be released using scontrol to reset its priority (e.g. 'scontrol release job_id') |
licenses | Optional[str] | specification of licenses (or other resources available on all nodes of the cluster) which must be allocated to this job. License names can be followed by a colon and count (the default count is one). Multiple license names should be comma separated (e.g. '--licenses=foo:4,bar') |
mail_type | Optional[str] | notify user by email when certain event types occur, defaults to 'NONE' (see the SLURM documentation for the full list of options) |
mail_user | Optional[str] | user to receive e-mail notification of the state changes defined by mail_type |
memory_binding | Optional[str] | bind tasks to memory. Used only when the task/affinity plugin is enabled and the NUMA memory functions are available, defaults to None |
memory_per_cpu | Optional[str] | minimum memory required per allocated CPU (default units are megabytes, different units can be specified using the suffix [K\|M\|G\|T]), defaults to 0 |
memory_per_gpu | Optional[str] | minimum memory required per allocated GPU (default units are megabytes, different units can be specified using the suffix [K\|M\|G\|T]), defaults to 0 |
memory_per_node | Optional[str] | specify the real memory required per node (default units are megabytes, different units can be specified using the suffix [K\|M\|G\|T]), defaults to 0 |
cpus_per_task | Optional[int] | advise the SLURM controller that ensuing job steps will require ncpus processors per task. Without this option, the controller will just try to allocate one processor per task, defaults to 0 |
minimum_cpus_per_node | Optional[str] | specify a minimum number of logical CPUs/processors per node, defaults to 0 |
minimum_nodes | Optional[str] | if a range of node counts is given, prefer the smaller count, defaults to 'true' |
nice | Optional[str] | run the job with an adjusted scheduling priority within SLURM. With no adjustment value the scheduling priority is decreased by 100. A negative nice value increases the priority, otherwise it decreases it, defaults to None |
no_kill | Optional[str] | do not automatically terminate a job if one of the nodes it has been allocated fails. The user will assume the responsibilities for fault tolerance should a node fail. When there is a node failure, any active job steps (usually MPI jobs) on that node will almost certainly suffer a fatal error, but with no_kill the job allocation will not be revoked so the user may launch new job steps on the remaining nodes in the allocation |
nodes | Optional[int] | request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count. The partition's node limits supersede those of the job. If a job's node limits are outside of the range permitted for its associated partition, the job will be left in a PENDING state, defaults to 1 |
open_mode | Optional[str] | (append\|truncate) open the output and error files using append or truncate mode as specified. The default value is specified by the system configuration parameter JobFileAppend, defaults to 'append' |
partition | Optional[str] | request a specific partition for the resource allocation, defaults to 'normal' |
qos | Optional[str] | request a quality of service for the job, defaults to 'user' |
requeue | Optional[str] | specifies that the batch job should be eligible for requeuing, defaults to 'true' |
reservation | Optional[str] | allocate resources for the job from the named reservation, defaults to None |
sockets_per_node | Optional[int] | restrict node selection to nodes with at least the specified number of sockets, defaults to 0 |
spread_job | Optional[str] | spread the job allocation over as many nodes as possible and attempt to evenly distribute tasks across the allocated nodes (this option disables the topology/tree plugin), defaults to 'true' |
standard_error | Optional[str] | instruct SLURM to connect the batch script's standard error directly to the file name at the specified path, defaults to None |
standard_in | Optional[str] | instruct SLURM to connect the batch script's standard input directly to the file name at the specified path, defaults to None |
standard_out | Optional[str] | instruct SLURM to connect the batch script's standard output directly to the file name at the specified path, defaults to None |
tasks | Optional[int] | sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the SLURM controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources, defaults to 1 task per node (note that the cpus_per_task option will change this default) |
tasks_per_core | Optional[int] | request the maximum ntasks be invoked on each core (meant to be used with the tasks option) |
tasks_per_node | Optional[int] | request that ntasks be invoked on each node (if used with the tasks option, the tasks option takes precedence and tasks_per_node is treated as a maximum count of tasks per node) |
tasks_per_socket | Optional[int] | request the maximum ntasks be invoked on each socket (meant to be used with the tasks option) |
threads_per_core | Optional[int] | restrict node selection to nodes with at least the specified number of threads per core. In task layout, use the specified maximum number of threads per core, defaults to 0 |
time_limit | Optional[Union[int, str]] | set a limit on the total run time of the job allocation, defaults to None |
wait_all_nodes | Optional[str] | (0\|1) controls when the execution of the command begins, defaults to 0 (the job will begin execution as soon as the allocation is made) |
wckey | Optional[str] | specify wckey to be used with the job, defaults to None |
cores_per_socket | Optional[int] | restrict node selection to nodes with at least the specified number of cores per socket, defaults to None |
core_specifications | Optional[int] | count of specialized cores per node reserved by the job for system operations and not used by the application, defaults to None |
Source code in catena/models/slurm_submit.py
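As a rough illustration of how these fields combine, the sketch below constructs a SlurmSubmit for a one-node GPU job. The field names follow the attribute table above; the values, paths, and e-mail address are placeholders, and whether a given combination is accepted depends on the cluster configuration.

```python
# Hypothetical example: sbatch options for a 1-node, 4-GPU job.
# Field names are taken from the attribute table above; values are illustrative only.
from catena.models.slurm_submit import SlurmSubmit

options = SlurmSubmit(
    name="train-model",
    partition="normal",            # request a specific partition
    qos="user",                    # quality of service
    nodes=1,                       # minimum (and here also maximum) node count
    tasks=4,                       # maximum number of tasks launched in the allocation
    cpus_per_task=8,               # processors required per task
    gpus_per_node="4",             # GPUs required on each allocated node
    memory_per_gpu="16G",          # minimum memory per allocated GPU
    time_limit="04:00:00",         # limit on the total run time
    standard_out="/home/alice/logs/%j.out",   # %j is SLURM's job-id filename pattern
    standard_error="/home/alice/logs/%j.err",
    environment={"OMP_NUM_THREADS": "8"},     # extra environment for the job
    mail_type="END,FAIL",
    mail_user="alice@example.com",
)
```

An instance like this would be passed as the job field of a SlurmModel, as shown in the sketch under SlurmModel above.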