Skip to content

zhs04001/cleanup-roster

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
December 24, 2023 07:08

A Python script that cleans up rosters downloaded from HuskyCT

This script reads downloaded roster files and writes selected columns/fields into a CSV file. If the output filename is not specified, the lines are printed to the standard output, which can be redirected into a file.

Usage

Assume all files are under the current directory.

The following command prints info on the screen.

python cleanup-roster.py downloaded.xls 

The -o option specifies an output file.

python cleanup-roster.py downloaded.xls -o section1.csv

Multiple xls files can be specified.

The script does not output all fields. The -f option specifies the additonal fields, one or more, to be included. For example, the following command adds enrollment date and program fields to the output.

python cleanup-roster.py downloaded.xls -f enrollment program

--fields option specifies a full list of fields to be printed. all means all fields.

python cleanup-roster.py downloaded.xls --fields name netid
python cleanup-roster.py downloaded.xls --fields all

In Powershell, the following commands process multiple xls files.

# write all students in multiple files to single output file all.csv
py cleanup-roster.py $(dir *.xls) -o all.csv

# save student records in separate csv files
dir *.xls | foreach-object { py cleanup-roster.py $_ -o ($_.name -replace '.xls','.csv') }

Notes

Dec 2023

The downloaded files are actually HTML files and information is in an HTML table. Excel can open it, after a warning. The format of the file changed several times. Currently, the "Program and Plan" field is in a single table cell. For example,

<td  >Engineering -
^MComputer Science</td>

Change the script to read downloaded file directly, instead of reading XLSX files.

Nov 2022

The format of downloaded files keeps changing, mainly in "Program and Plan" field. The format in Nov 2022 is:

<school> - <\r><major>
<school> - <\r><major>/<minor>
<school> - <\r><major>/<minor1/>/<minor2>
<school> - <\r><major>/<major2> as Second
<school> - <\r><major>/<major2> Second

The school can be :

Engineering
Liberal Arts & Sciences
CCS Non-Degree

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Languages