wait_notify

WHAT IT DOES

A Python script that runs as a cron job and notifies HPC users about 'non-runnable' jobs, that is, pending jobs that will never run.

HOW IT WORKS

The script evaluates the 'reason' code for each pending job.

Currently, it notifies only users whose jobs have the reason 'PartitionTimeLimit', which means that the time limit requested for the job exceeds the maximum allowed for the partition.
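
The reason-code check can be sketched as follows. This is a minimal illustration, not the script's actual code: the pipe-delimited input format and the names `NON_RUNNABLE_REASONS` and `find_stuck_jobs` are assumptions made for the example.

```python
from collections import defaultdict

# Reason codes treated as "will never run". The README names only this one.
NON_RUNNABLE_REASONS = {"PartitionTimeLimit"}


def find_stuck_jobs(listing):
    """Group non-runnable pending jobs by user.

    `listing` is assumed to be text with one pending job per line in the
    hypothetical form "jobid|user|partition|reason" (squeue can emit this
    with --format="%i|%u|%P|%r"; the real script may gather it differently).
    """
    stuck = defaultdict(list)
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        job_id, user, partition, reason = line.split("|", 3)
        if reason.strip() in NON_RUNNABLE_REASONS:
            stuck[user].append((job_id, partition))
    return dict(stuck)


# Made-up sample data for illustration only.
sample = """\
1001|alice|short|PartitionTimeLimit
1002|bob|long|Priority
1003|alice|short|PartitionTimeLimit
1004|carol|gpu|Resources
"""

print(find_stuck_jobs(sample))
# {'alice': [('1001', 'short'), ('1003', 'short')]}
```

Keeping the reason codes in a set means that further codes could be added later without touching the filtering logic.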

Any user with one or more 'non-runnable' jobs is sent a single email that lists all of those jobs, and a record of the email is stored in an SQLite3 database. Even though the cron job runs daily, only one email is sent per week.
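
The once-per-week limit could be implemented along these lines. This is a sketch under stated assumptions: the table name `emails`, its columns, and the function names are hypothetical, and the limit is assumed to apply per user.

```python
import sqlite3
from datetime import datetime, timedelta


def init_db(conn):
    """Create the (hypothetical) table that records sent emails."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        "user TEXT NOT NULL, sent_at TEXT NOT NULL)"
    )
    conn.commit()


def emailed_recently(conn, user, now, window=timedelta(days=7)):
    """Return True if `user` was emailed within `window` before `now`."""
    cutoff = (now - window).isoformat(timespec="seconds")
    row = conn.execute(
        "SELECT 1 FROM emails WHERE user = ? AND sent_at > ? LIMIT 1",
        (user, cutoff),
    ).fetchone()
    return row is not None


def record_email(conn, user, now):
    """Store a record that `user` was emailed at `now`."""
    conn.execute(
        "INSERT INTO emails (user, sent_at) VALUES (?, ?)",
        (user, now.isoformat(timespec="seconds")),
    )
    conn.commit()


# In-memory database and made-up users, for illustration only.
conn = sqlite3.connect(":memory:")
init_db(conn)
now = datetime(2024, 1, 8, 9, 0)
record_email(conn, "alice", now - timedelta(days=3))
record_email(conn, "bob", now - timedelta(days=10))

print(emailed_recently(conn, "alice", now))  # True
print(emailed_recently(conn, "bob", now))    # False
```

Timestamps are stored as ISO 8601 strings at a fixed precision, so a plain string comparison in SQL orders them correctly.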

USAGE

Usage: wait_notify [-Icx] [-n N] [-t ADMIN_EMAIL]

Read Slurm's sinfo output to determine which pending jobs are stuck in a state in which they will never run, and email each job's owner so they can cancel the jobs and re-run them if desired.

Each email is logged in an SQLite3 database, and no email is sent if one has already gone out in the previous week.

OPTIONS

    -I              Run the first time to initialize the SQLite3 database that records emails.
    -x              Send email to users with stuck jobs.

OPTIONS USEFUL FOR TESTING

    -c              Check only. List jobs that are in a cancelled state.
    -t ADMIN_EMAIL  Useful for testing. Use only with -x. Emails will be sent not to users but to ADMIN_EMAIL instead.
    -n N            Only email the first N users.
    -f              Force email, even if one was sent in the past week.
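
For reference, the options above map onto a command-line parser roughly as follows. This is an illustrative sketch using `argparse`; the real script may parse its arguments differently, and the attribute names (`init_db`, `send`, `check_only`, `admin_email`, `max_users`, `force`) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="wait_notify",
        description="Notify users about pending jobs that will never run.",
    )
    parser.add_argument("-I", dest="init_db", action="store_true",
                        help="initialize the SQLite3 database that records emails")
    parser.add_argument("-x", dest="send", action="store_true",
                        help="send email to users with stuck jobs")
    parser.add_argument("-c", dest="check_only", action="store_true",
                        help="check only; list jobs without sending email")
    parser.add_argument("-t", dest="admin_email", metavar="ADMIN_EMAIL",
                        help="send emails to ADMIN_EMAIL instead of users")
    parser.add_argument("-n", dest="max_users", metavar="N", type=int,
                        help="only email the first N users")
    parser.add_argument("-f", dest="force", action="store_true",
                        help="force email, even if one was sent in the past week")
    return parser


def parse_args(argv):
    """Parse argv and enforce the documented rule that -t requires -x."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.admin_email and not args.send:
        parser.error("-t ADMIN_EMAIL may only be used together with -x")
    return args


# Example: a test run that redirects mail for the first five users.
args = parse_args(["-x", "-t", "admin@example.org", "-n", "5"])
print(args.send, args.admin_email, args.max_users)
# True admin@example.org 5
```

A test invocation of this kind corresponds to running `wait_notify -x -t admin@example.org -n 5`, where the address is a placeholder.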

CONFIGURATION

See the files user_notify_config.py and wait_notify_config.py.