# wait_notify
# WHAT IT DOES
A Python script that runs as a cron job and notifies HPC users about
'non-runnable' jobs, that is, pending jobs that will never run.
# HOW IT WORKS
The script evaluates the 'reason' code for each pending job.
Currently, it only notifies users whose jobs have a reason of
'PartitionTimeLimit', which means that the user set a time limit for their job
which exceeds that allowed for the partition.
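
The reason-code check can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: it assumes the pending jobs have already been listed as `jobid|user|reason` lines (for example with `squeue --states=PENDING --noheader --format="%i|%u|%r"`), and the names `STUCK_REASONS` and `find_stuck_jobs` are hypothetical.

```python
from collections import defaultdict

# Assumption: input lines look like "jobid|user|reason", e.g. as produced by
#   squeue --states=PENDING --noheader --format="%i|%u|%r"
# The real script may gather and parse this information differently.
STUCK_REASONS = {"PartitionTimeLimit"}  # the only reason handled per the README


def find_stuck_jobs(pending_text):
    """Return {user: [job_id, ...]} for pending jobs whose reason is 'stuck'."""
    stuck = defaultdict(list)
    for line in pending_text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        job_id, user, reason = parts
        if reason in STUCK_REASONS:
            stuck[user].append(job_id)
    return dict(stuck)


# Made-up sample data for illustration only.
sample = """\
101|alice|PartitionTimeLimit
102|bob|Priority
103|alice|PartitionTimeLimit
104|carol|Resources
"""

stuck_jobs = find_stuck_jobs(sample)
print(stuck_jobs)
```

Grouping by user at this stage makes it straightforward to send one email per user that lists all of that user's affected jobs.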
Each user with one or more 'non-runnable' jobs is sent a single email that lists
all of those jobs, and a record of the email is stored in an SQLite3
database. Even though the cron job runs daily, only one email is sent per week.
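
The once-per-week throttle could look something like the sketch below. The table name `emails`, its columns, and the helper functions are assumptions made for illustration; the script's real schema is not shown in this README.

```python
import sqlite3
from datetime import datetime, timedelta


def init_db(conn):
    # Hypothetical schema: one row per email sent.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        "username TEXT NOT NULL, sent_at TEXT NOT NULL)"
    )
    conn.commit()


def emailed_recently(conn, user, now, window=timedelta(days=7)):
    """True if an email to `user` was recorded within `window` before `now`."""
    cutoff = (now - window).isoformat(timespec="seconds")
    row = conn.execute(
        "SELECT 1 FROM emails WHERE username = ? AND sent_at > ? LIMIT 1",
        (user, cutoff),
    ).fetchone()
    return row is not None


def record_email(conn, user, now):
    # ISO 8601 timestamps at fixed precision sort correctly as plain strings.
    conn.execute(
        "INSERT INTO emails (username, sent_at) VALUES (?, ?)",
        (user, now.isoformat(timespec="seconds")),
    )
    conn.commit()


# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
init_db(conn)
first_run = datetime(2024, 1, 15, 8, 0, 0)
record_email(conn, "alice", first_run)

# A daily cron run the next day would skip alice; a run 8 days later would not.
print(emailed_recently(conn, "alice", first_run + timedelta(days=1)))
print(emailed_recently(conn, "alice", first_run + timedelta(days=8)))
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the helpers, keeps the logic easy to test.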
# USAGE
Usage: wait_notify [-Icx] [-n N] [-t ADMIN_EMAIL]
Read Slurm's sinfo output to determine which pending jobs are in a stuck
state that will not run, and email each job's owner so they can
cancel the jobs and re-run them if desired.
Each email is logged in an SQLite3 database, and no email is sent
if one has already gone out in the previous week.
OPTIONS
-I Run the first time to initialize the SQLite3 database that records emails.
-x Send email to users with stuck jobs.
OPTIONS USEFUL FOR TESTING
-c Check only. List jobs that are in a cancelled state.
-t ADMIN_EMAIL Use only with -x. Emails are sent to ADMIN_EMAIL instead of to users.
-n N Only email the first N users.
-f Force email, even if one was sent in the past week.
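
For readers who want to see how these flags fit together, here is a hedged sketch of an equivalent `argparse` setup. The flag letters come from the usage text above; the destination names (`init_db`, `send_email`, and so on) and the `parse_args` helper are hypothetical, and the actual script may parse its options differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="wait_notify",
        description="Notify users about pending Slurm jobs that will never run.",
    )
    parser.add_argument("-I", dest="init_db", action="store_true",
                        help="initialize the SQLite3 database that records emails")
    parser.add_argument("-x", dest="send_email", action="store_true",
                        help="send email to users with stuck jobs")
    parser.add_argument("-c", dest="check_only", action="store_true",
                        help="check only; list jobs without emailing")
    parser.add_argument("-t", dest="admin_email", metavar="ADMIN_EMAIL",
                        help="send emails to ADMIN_EMAIL instead of users")
    parser.add_argument("-n", dest="max_users", type=int, metavar="N",
                        help="only email the first N users")
    parser.add_argument("-f", dest="force", action="store_true",
                        help="email even if one was sent in the past week")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The usage text says -t should be used only together with -x.
    if args.admin_email and not args.send_email:
        parser.error("-t ADMIN_EMAIL requires -x")
    return args


# Example: a test run that emails at most 5 users' notices to an admin address.
args = parse_args(["-x", "-t", "admin@example.org", "-n", "5"])
print(args.send_email, args.admin_email, args.max_users)
```

A test invocation along the lines of `wait_notify -x -t admin@example.org -n 5` would then exercise the email path without contacting real users.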
# CONFIGURATION
See files
user_notify_config.py
wait_notify_config.py