Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
cleanup-roster/readme.MD
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
79 lines (54 sloc)
2.18 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## A Python script that cleans up rosters downloaded from HuskyCT | |
This script reads downloaded roster files and writes selected columns/fields | |
into a CSV file. If the output filename is not specified, the lines are | |
printed to the standard output, which can be redirected into a file. | |
### Usage | |
Assume all files are under the current directory. | |
The following command prints info on the screen. | |
``` | |
python cleanup-roster.py downloaded.xls | |
``` | |
The `-o` option specifies an output file. | |
``` | |
python cleanup-roster.py downloaded.xls -o section1.csv | |
``` | |
Multiple `xls` files can be specified. | |
The script does not output all fields. The `-f` option specifies the additonal | |
fields, one or more, to be included. For example, the following command adds | |
enrollment date and program fields to the output. | |
python cleanup-roster.py downloaded.xls -f enrollment program | |
`--fields` option specifies a full list of fields to be printed. `all` | |
means all fields. | |
``` | |
python cleanup-roster.py downloaded.xls --fields name netid | |
python cleanup-roster.py downloaded.xls --fields all | |
``` | |
In Powershell, the following commands process multiple xls files. | |
``` | |
# write all students in multiple files to single output file all.csv | |
py cleanup-roster.py $(dir *.xls) -o all.csv | |
# save student records in separate csv files | |
dir *.xls | foreach-object { py cleanup-roster.py $_ -o ($_.name -replace '.xls','.csv') } | |
``` | |
## Notes | |
### Dec 2023 | |
The downloaded files are actually HTML files and information is in an HTML | |
table. Excel can open it, after a warning. The format of the file changed | |
several times. Currently, the "Program and Plan" field is in a single table | |
cell. For example, | |
<td >Engineering - | |
^MComputer Science</td> | |
Change the script to read downloaded file directly, instead of reading XLSX | |
files. | |
### Nov 2022 | |
The format of downloaded files keeps changing, mainly in "Program and Plan" field. | |
The format in Nov 2022 is: | |
<school> - <\r><major> | |
<school> - <\r><major>/<minor> | |
<school> - <\r><major>/<minor1/>/<minor2> | |
<school> - <\r><major>/<major2> as Second | |
<school> - <\r><major>/<major2> Second | |
The school can be : | |
Engineering | |
Liberal Arts & Sciences | |
CCS Non-Degree |