THIS REPOSITORY HAS MOVED

Please visit the updated location of this repository for all future updates: https://github.com/tylernh10/tyler-hinrichs-ucsas-2024

tyler-hinrichs-ucsas-2024

If you want to start analyzing sports, you need data. Nowadays there are many sources of pre-built datasets, but at times you may need a custom dataset built from data found online. Web scraping is an effective solution to this problem: you can write automated scripts that quickly and efficiently gather data from webpages, and in doing so build datasets tailored to the questions you want answered. During this workshop you will learn 1) what web scraping is, 2) how static web scraping works using the Python packages pandas, requests, and BeautifulSoup, and 3) how dynamic web scraping works using the Python package Selenium together with the previously learned packages.

Important information about each notebook:

static_soccer_data.ipynb:

We use three Python libraries (Requests, BeautifulSoup4, and Pandas), which can be installed with the commands included in the notebook.
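The static workflow (download the HTML with Requests, parse it with BeautifulSoup, tabulate it with Pandas) can be sketched as below. This is a minimal sketch, not the notebook's code: it assumes the target page serves its data in a plain HTML `<table>` whose first row holds the column names and whose other rows have the same number of cells. The function names `parse_first_table` and `scrape_table` are hypothetical.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def parse_first_table(html: str) -> pd.DataFrame:
    """Turn the first <table> in an HTML string into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> found in the page")

    # Collect the text of every header (<th>) and data (<td>) cell, row by row.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    rows = [row for row in rows if row]  # drop empty rows
    if not rows:
        raise ValueError("the table has no rows")

    # Assumption: the first row holds the column names.
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


def scrape_table(url: str) -> pd.DataFrame:
    """Download a static page and parse its first table (hypothetical helper)."""
    # A browser-like User-Agent is sent because some sites reject the default one.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return parse_first_table(response.text)
```

Usage would look like `scrape_table("https://example.com/stats")`, where the URL is a placeholder. Keeping the parsing step separate from the download makes it possible to test the parser on saved HTML without hitting the site repeatedly. All cell values come back as strings, so numeric columns still need converting (for example with `pd.to_numeric`).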

dynamic_soccer_data.ipynb:

We use three Python libraries (Selenium, BeautifulSoup4, and Pandas), which can be installed with the commands included in the notebook.
A ChromeDriver is required for full functionality.
The ChromeDriver version must match your installed version of Chrome:
- Chrome version <= 114: https://chromedriver.chromium.org/downloads
- Chrome version >114: https://googlechromelabs.github.io/chrome-for-testing/
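The dynamic setup can be sketched as below: pick the ChromeDriver download page for your Chrome version using the two links above, then drive Chrome with Selenium and return the HTML after the page's JavaScript has run. This is a rough sketch under stated assumptions, not the notebook's code: it assumes Selenium 4, a ChromeDriver already downloaded to `driver_path`, that matching the major version number is sufficient, and that the page eventually renders a `<table>`. All function names are hypothetical.

```python
import re

# Download pages listed in this README for each Chrome release line.
LEGACY_DRIVERS = "https://chromedriver.chromium.org/downloads"  # Chrome <= 114
CFT_DRIVERS = "https://googlechromelabs.github.io/chrome-for-testing/"  # Chrome > 114


def major_version(version: str) -> int:
    """Extract the major number from strings like 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"\d+", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group())


def driver_download_page(chrome_version: str) -> str:
    """Return the ChromeDriver download page for a given Chrome version."""
    return LEGACY_DRIVERS if major_version(chrome_version) <= 114 else CFT_DRIVERS


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Assumption: Chrome and ChromeDriver are compatible when major versions agree."""
    return major_version(chrome_version) == major_version(driver_version)


def fetch_rendered_html(url: str, driver_path: str, wait_seconds: int = 10) -> str:
    """Load a JavaScript-rendered page in Chrome and return its final HTML."""
    # Imported inside the function so the version helpers above work without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    try:
        driver.get(url)
        # Block until the page's scripts have inserted at least one <table>.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

The returned HTML string can then be parsed with BeautifulSoup and Pandas exactly as a static page would be. Waiting explicitly for an element, rather than sleeping for a fixed time, returns as soon as the content appears and raises a `TimeoutException` if it never does.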

Slides:

The slides were created using R Markdown.
Access them through the .rmd file (requires R to run) or through the HTML file.
