THIS REPOSITORY HAS MOVED

Please visit the updated location of this repository for all future updates: https://github.com/tylernh10/tyler-hinrichs-ucsas-2024

tyler-hinrichs-ucsas-2024

If you want to start analyzing sports, you need data. Nowadays there are many sources of pre-built datasets, but at times you may need to build a custom dataset from data found online. Web scraping is an effective solution to this problem: you write automated scripts that gather data from webpages quickly and efficiently, producing datasets specific to the questions you want answered. During this workshop you will learn 1) what web scraping is, 2) how static web scraping works using the Python packages pandas, requests, and BeautifulSoup, and 3) how dynamic web scraping works using the Python package Selenium in conjunction with the previously learned packages.

Important information about each notebook:

static_soccer_data.ipynb:

  • Uses three Python libraries (Requests, BeautifulSoup4, and pandas), which can be installed with the commands in the notebook
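
As a rough illustration of how these three libraries typically fit together in static scraping (this is a minimal sketch, not the notebook's actual code), the example below fetches a page with Requests, parses the first HTML table with BeautifulSoup, and loads it into a pandas DataFrame. The function names `fetch_html` and `first_table_to_dataframe` and the sample table are hypothetical and chosen only for illustration.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def first_table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in the HTML into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    rows = table.find_all("tr")
    # Treat the first row as the header, the rest as records.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=header)


# Made-up sample data so the sketch runs without network access.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Goals</th></tr>
  <tr><td>Team A</td><td>3</td></tr>
  <tr><td>Team B</td><td>1</td></tr>
</table>
"""

df = first_table_to_dataframe(SAMPLE_HTML)
df["Goals"] = df["Goals"].astype(int)  # scraped cells arrive as text
```

For a live page, `first_table_to_dataframe(fetch_html(url))` would chain the two steps; check a site's terms of use before scraping it.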

dynamic_soccer_data.ipynb:
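
The notebook's contents are not described here. As a hedged sketch of the dynamic approach named in the workshop description (Selenium combined with the earlier packages), the example below lets a headless browser render a page and then hands the resulting HTML to BeautifulSoup. It assumes Chrome and a compatible driver are available; `render_page`, `extract_cells`, and the sample HTML are hypothetical names and data, not taken from the notebook.

```python
from bs4 import BeautifulSoup


def render_page(url: str, wait_css: str = "table", timeout: int = 10) -> str:
    """Load a JavaScript-driven page in headless Chrome and return its HTML.

    Assumption: Chrome is installed. Selenium is imported here so the
    parsing helper below can be used without a browser.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the element we care about has been added to the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser


def extract_cells(html: str, css: str = "td") -> list[str]:
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css)]


# Made-up stand-in for HTML that a browser would produce after rendering.
RENDERED_HTML = "<table><tr><td> Team A </td><td>3</td></tr></table>"
cells = extract_cells(RENDERED_HTML)
```

The design point is that Selenium is only needed to obtain the rendered HTML; once `page_source` is in hand, parsing works the same way as in the static case.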

Slides:

  • Created using R Markdown
  • Access them through the .rmd file (requires R to run) or through the HTML file