Tyler Bennett's Honors Conversion
This repository contains my UConn Honors conversion project for CSE 3504: Probabilistic Performance Analysis of Computer Systems. The sections below explain how to set up and run the project and how to view its results.
BUILD INSTRUCTIONS:
- Download the CDC's PLACES dataset from here
- Download the CDC's SVI County dataset from here
- Place both of these files into the same directory. Ensure the PLACES dataset is named "PLACES.csv" and the SVI dataset is named "SVICounty.csv"
- Download the Python file named "Tyler Bennett Honors Conversion Project.py" from this repo and place it in the same directory as the two CSV files
- In the Python file, change the "DIRECTORY" string to the path of the directory that contains all three files
- Install any needed dependencies with pip (dependencies are listed at the top of the .py file)
- Run the Python file
RESULTS:
A summarized version of the results has been included in a graph titled "Feature Importance.png". This graph shows how much each of the 16 explanatory variables (from SVI) contributes to the prediction of depression among adults (using the "mean decrease in impurity" metric). With some minor changes, the script could also be adapted to track other mental health outcomes, but this is outside the scope of this project.
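To illustrate how mean decrease in impurity values are typically obtained, here is a small sketch using scikit-learn on synthetic data. It is not the project's code: the data is randomly generated rather than taken from SVI or PLACES, the column count of 16 only mirrors the number of explanatory variables mentioned above, and the choice of `RandomForestRegressor` with these settings is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 200 rows, 16 numeric features.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 16
X = rng.normal(size=(n_samples, n_features))

# The synthetic target depends mostly on feature 0 and slightly on feature 1.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# feature_importances_ holds the impurity-based (MDI) importances,
# normalized so that they sum to 1 across all features.
importances = forest.feature_importances_

# Rank features from most to least important and show the top three.
ranking = np.argsort(importances)[::-1]
for idx in ranking[:3]:
    print(f"feature_{idx}: {importances[idx]:.3f}")
```

Because the values are normalized, each bar in a plot of these importances can be read as a share of the total impurity reduction attributed to that variable.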
The random forest model trained on this data had a model score of ~0.6258 and a mean squared error of 4.133.
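For readers unfamiliar with these two numbers, the sketch below computes a mean squared error and a coefficient of determination (R^2) by hand on a tiny made-up example. It assumes the "model score" above is R^2, which is what the `score` method of scikit-learn regressors returns; the README itself does not say which score was used. The sample values are invented for illustration and are not project data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented example values (assumption), not taken from PLACES or SVI.
y_true = [18.0, 20.0, 22.0, 24.0]
y_pred = [19.0, 20.0, 21.0, 24.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE: {mse:.3f}")
print(f"R^2: {r2:.3f}")
```

Under this reading, an R^2 near 0.63 would mean the model explains roughly 63% of the variance in the outcome on the data it was scored on, while the mean squared error is expressed in squared units of the outcome.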
REFERENCES:
- https://pandas.pydata.org/pandas-docs/version/1.5/index.html
- https://realpython.com/pandas-dataframe/
- https://github.com/afrozchakure/Internity-Summer-Internship-Work/blob/master/Blogs/Random_Forest_Classification/Random%20Forest%20Classifcation.ipynb
- https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html