Skip to content
Permalink
master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
## A Python script that cleans up rosters downloaded from HuskyCT
This script reads downloaded roster files and writes selected columns/fields
into a CSV file. If the output filename is not specified, the lines are
printed to the standard output, which can be redirected into a file.
### Usage
Assume all files are under the current directory.
The following command prints info on the screen.
```
python cleanup-roster.py downloaded.xls
```
The `-o` option specifies an output file.
```
python cleanup-roster.py downloaded.xls -o section1.csv
```
Multiple `xls` files can be specified.
The script does not output all fields. The `-f` option specifies the additonal
fields, one or more, to be included. For example, the following command adds
enrollment date and program fields to the output.
python cleanup-roster.py downloaded.xls -f enrollment program
`--fields` option specifies a full list of fields to be printed. `all`
means all fields.
```
python cleanup-roster.py downloaded.xls --fields name netid
python cleanup-roster.py downloaded.xls --fields all
```
In Powershell, the following commands process multiple xls files.
```
# write all students in multiple files to single output file all.csv
py cleanup-roster.py $(dir *.xls) -o all.csv
# save student records in separate csv files
dir *.xls | foreach-object { py cleanup-roster.py $_ -o ($_.name -replace '.xls','.csv') }
```
## Notes
### Dec 2023
The downloaded files are actually HTML files and information is in an HTML
table. Excel can open it, after a warning. The format of the file changed
several times. Currently, the "Program and Plan" field is in a single table
cell. For example,
<td >Engineering -
^MComputer Science</td>
Change the script to read downloaded file directly, instead of reading XLSX
files.
### Nov 2022
The format of downloaded files keeps changing, mainly in "Program and Plan" field.
The format in Nov 2022 is:
<school> - <\r><major>
<school> - <\r><major>/<minor>
<school> - <\r><major>/<minor1/>/<minor2>
<school> - <\r><major>/<major2> as Second
<school> - <\r><major>/<major2> Second
The school can be :
Engineering
Liberal Arts & Sciences
CCS Non-Degree