A Python script that cleans up rosters downloaded from HuskyCT
This script reads downloaded roster files and writes selected columns/fields into a CSV file. If the output filename is not specified, the lines are printed to the standard output, which can be redirected into a file.
Usage
Assume all files are under the current directory.
The following command prints info on the screen.
python cleanup-roster.py downloaded.xls
The -o
option specifies an output file.
python cleanup-roster.py downloaded.xls -o section1.csv
Multiple xls
files can be specified.
The script does not output all fields. The -f
option specifies the additonal
fields, one or more, to be included. For example, the following command adds
enrollment date and program fields to the output.
python cleanup-roster.py downloaded.xls -f enrollment program
--fields
option specifies a full list of fields to be printed. all
means all fields.
python cleanup-roster.py downloaded.xls --fields name netid
python cleanup-roster.py downloaded.xls --fields all
In Powershell, the following commands process multiple xls files.
# write all students in multiple files to single output file all.csv
py cleanup-roster.py $(dir *.xls) -o all.csv
# save student records in separate csv files
dir *.xls | foreach-object { py cleanup-roster.py $_ -o ($_.name -replace '.xls','.csv') }
Notes
Dec 2023
The downloaded files are actually HTML files and information is in an HTML table. Excel can open it, after a warning. The format of the file changed several times. Currently, the "Program and Plan" field is in a single table cell. For example,
<td >Engineering -
^MComputer Science</td>
Change the script to read downloaded file directly, instead of reading XLSX files.
Nov 2022
The format of downloaded files keeps changing, mainly in "Program and Plan" field. The format in Nov 2022 is:
<school> - <\r><major>
<school> - <\r><major>/<minor>
<school> - <\r><major>/<minor1/>/<minor2>
<school> - <\r><major>/<major2> as Second
<school> - <\r><major>/<major2> Second
The school can be :
Engineering
Liberal Arts & Sciences
CCS Non-Degree