THIS REPOSITORY HAS MOVED
Please visit the updated location of this repository for all future updates: https://github.com/tylernh10/tyler-hinrichs-ucsas-2024
tyler-hinrichs-ucsas-2024
If you want to start analyzing sports, you need data. Nowadays there are many sources of pre-built datasets, but at times you may need to build a custom dataset from data found online. Web scraping addresses this problem: you write automated scripts that quickly and efficiently gather data from webpages, producing datasets specific to the questions you want answered. During this workshop you will learn 1) what web scraping is, 2) how static web scraping works using the Python packages pandas, requests, and BeautifulSoup, and 3) how dynamic web scraping works using the Python package Selenium in conjunction with the previously learned packages.
Important information about each notebook:
static_soccer_data.ipynb:
- We use 3 Python libraries, Requests, BeautifulSoup4, and Pandas, which can be installed with commands in the notebook
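As a rough illustration of how these three libraries fit together, here is a minimal sketch that is not taken from the notebook. The helper names `fetch_html` and `table_to_dataframe` and the sample HTML are hypothetical, and the sketch assumes the page holds a simple table whose first row contains `<th>` header cells.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with Requests and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in the HTML into a pandas DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    rows = table.find_all("tr")
    # Assumption: the first row holds <th> headers, later rows hold <td> data.
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=header)


# Hypothetical sample standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Goals</th></tr>
  <tr><td>Team A</td><td>3</td></tr>
  <tr><td>Team B</td><td>1</td></tr>
</table>
"""

df = table_to_dataframe(SAMPLE_HTML)
df["Goals"] = df["Goals"].astype(int)  # scraped cell values start out as strings
print(df)
```

For a live page, the same parser would be fed `fetch_html(url)` instead of the sample string. Real sites often need extra handling, for example multiple tables or header cells inside data rows.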
dynamic_soccer_data.ipynb:
- We use 3 Python libraries, Selenium, BeautifulSoup4, and Pandas, which can be installed with commands in the notebook
- We must use a ChromeDriver for full functionality
  - The ChromeDriver version must match your installed Chrome version
  - Chrome version <=114: https://chromedriver.chromium.org/downloads
  - Chrome version >114: https://googlechromelabs.github.io/chrome-for-testing/
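The sketch below shows one way a dynamic scrape might combine these libraries with a downloaded ChromeDriver. It is an assumption-laden illustration rather than the notebook's code: `rendered_html`, `parse_table`, the `chromedriver_path` argument, and the sample HTML are all hypothetical, and it assumes the data of interest appears in a `<table>` once the page has rendered.

```python
import pandas as pd
from bs4 import BeautifulSoup


def rendered_html(url: str, chromedriver_path: str, timeout: int = 10) -> str:
    """Open the page in Chrome, wait for a <table>, and return the rendered HTML."""
    # Imported here so the parsing helper below also works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # chromedriver_path points at the ChromeDriver binary matching your Chrome.
    driver = webdriver.Chrome(service=Service(executable_path=chromedriver_path))
    try:
        driver.get(url)
        # Block until JavaScript has inserted at least one <table> element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def parse_table(html: str) -> pd.DataFrame:
    """Turn the first <table> of the rendered HTML into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    cells = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows
    ]
    # Assumption: the first row is the header row.
    return pd.DataFrame(cells[1:], columns=cells[0])


# Hypothetical stand-in for driver.page_source after rendering.
RENDERED_SAMPLE = (
    "<table><tr><th>Player</th><th>Assists</th></tr>"
    "<tr><td>Player A</td><td>7</td></tr></table>"
)
stats = parse_table(RENDERED_SAMPLE)
print(stats)
```

With Chrome and a matching ChromeDriver installed, `parse_table(rendered_html(url, chromedriver_path))` would run the full pipeline. Selenium only handles rendering here, and the parsing step is the same as in static scraping.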
Slides:
- Were created using R Markdown
- Access them through the .rmd file (requires R to run) or through the HTML file