Python script that runs as cron job and notifies HPC users about pending jobs that will never run.

HPC/wait_notify

wait_notify

WHAT IT DOES

Python script that runs as a cron job and notifies HPC users about 'non-runnable' jobs, that is, pending jobs that will never run.

HOW IT WORKS

The script evaluates the 'reason' code for each pending job.

Currently, it only notifies users whose jobs have a reason of 'PartitionTimeLimit', which means that the user requested a time limit for the job that exceeds the maximum allowed for the partition.
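The filtering step described above can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the per-job reason codes are read from Slurm's `squeue` command with a `|`-separated format string, and the names `PendingJob`, `parse_squeue`, `non_runnable` and `pending_jobs_from_slurm` are hypothetical.

```python
import subprocess
from typing import NamedTuple


class PendingJob(NamedTuple):
    """One pending job (hypothetical structure, for illustration only)."""
    job_id: str
    user: str
    partition: str
    reason: str


# Assumption: pending jobs are listed with squeue.
# %i = job id, %u = user name, %P = partition, %r = reason code.
SQUEUE_CMD = ["squeue", "--noheader", "--states=PENDING",
              "--format=%i|%u|%P|%r"]


def parse_squeue(text):
    """Turn '|'-separated squeue output into a list of PendingJob tuples."""
    jobs = []
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        jobs.append(PendingJob(*fields))
    return jobs


def non_runnable(jobs, reasons=("PartitionTimeLimit",)):
    """Keep only jobs whose reason code means they will never start."""
    return [job for job in jobs if job.reason in reasons]


def pending_jobs_from_slurm():
    """Run squeue and parse its output (requires a Slurm installation)."""
    result = subprocess.run(SQUEUE_CMD, capture_output=True,
                            text=True, check=True)
    return parse_squeue(result.stdout)
```

Keeping the parser separate from the `subprocess` call makes it possible to check the filter against canned text on a machine without Slurm. Supporting further reason codes would only mean extending the `reasons` tuple.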

Any user with one or more 'non-runnable' jobs is sent a single email that lists all of those jobs, and a record of the email is stored in an SQLite3 database. Even though the cron job runs daily, each user receives at most one email per week.
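A minimal sketch of the once-per-week rule, assuming a single table that stores the recipient and the send time. The table name `emails`, its columns, and all function names here are hypothetical; the real schema may differ.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime, timedelta


def init_db(conn):
    """Create the (hypothetical) table that records each email sent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails "
        "(username TEXT NOT NULL, sent_at TEXT NOT NULL)"
    )
    conn.commit()


def emailed_recently(conn, username, now, window=timedelta(days=7)):
    """Return True if this user was emailed within the last `window`."""
    cutoff = (now - window).isoformat(timespec="seconds")
    row = conn.execute(
        "SELECT 1 FROM emails WHERE username = ? AND sent_at > ? LIMIT 1",
        (username, cutoff),
    ).fetchone()
    return row is not None


def record_email(conn, username, now):
    """Log that an email went out to this user at time `now`."""
    conn.execute(
        "INSERT INTO emails (username, sent_at) VALUES (?, ?)",
        (username, now.isoformat(timespec="seconds")),
    )
    conn.commit()


def users_to_notify(conn, stuck_jobs, now):
    """Group (username, job_id) pairs per user, so each user gets one
    email, and drop users who were already emailed in the past week."""
    by_user = defaultdict(list)
    for username, job_id in stuck_jobs:
        by_user[username].append(job_id)
    return {user: ids for user, ids in by_user.items()
            if not emailed_recently(conn, user, now)}
```

Timestamps are stored as ISO 8601 strings with a fixed precision, so plain string comparison in SQL orders them correctly. With this layout a daily cron run simply finds nobody left to notify until seven days have passed since the last recorded email.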

USAGE

Usage: wait_notify [-Icx] [-n N] [-t ADMIN_EMAIL]

Read Slurm's sinfo output to determine which pending jobs are stuck in a state in which they will never run, and email each job's owner so they can cancel the jobs and resubmit them if desired.

Each email is logged in an SQLite3 database, and no email is sent to a user who has already received one in the previous week.

OPTIONS
    -I              Run the first time to initialize SQLite3 database that records emails.
    -x              Send email to users with stuck jobs.

OPTIONS USEFUL FOR TESTING
    -c              Check only. List jobs that are in a cancelled state.
    -t ADMIN_EMAIL  Useful for testing. Use only with -x. Emails will be sent not to users but to the ADMIN_EMAIL instead.
    -n N            Only email the first N users.
    -f              Force email, even if one was sent in past week.
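For readers who want to see how these flags fit together, here is a sketch of an equivalent command-line parser built with `argparse`. It is an illustration only: the script may parse its options differently, and the attribute names (`init_db`, `send`, `check_only`, `admin_email`, `max_users`, `force`) are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented wait_notify options."""
    parser = argparse.ArgumentParser(
        prog="wait_notify",
        description="Email owners of pending Slurm jobs that will never run.",
    )
    parser.add_argument("-I", dest="init_db", action="store_true",
                        help="initialize the SQLite3 database (first run)")
    parser.add_argument("-x", dest="send", action="store_true",
                        help="send email to users with stuck jobs")
    parser.add_argument("-c", dest="check_only", action="store_true",
                        help="check only; list jobs without emailing")
    parser.add_argument("-t", dest="admin_email", metavar="ADMIN_EMAIL",
                        help="send all email to this address instead")
    parser.add_argument("-n", dest="max_users", type=int, metavar="N",
                        help="only email the first N users")
    parser.add_argument("-f", dest="force", action="store_true",
                        help="email even if one was sent in the past week")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce the documented rule that -t requires -x."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.admin_email and not args.send:
        parser.error("-t ADMIN_EMAIL is only meaningful together with -x")
    return args
```

A typical dry run before going live might combine `-x -t ADMIN_EMAIL -n N`, so that a handful of messages land in the administrator's mailbox rather than in users' inboxes.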

CONFIGURATION

See the files user_notify_config.py and wait_notify_config.py.
