Fall 2023 Seminars

December 4, 2023

Zoe Macris (3:35 pm @ AUST 247)

Risk Analysis for Cyclists at Roundabouts

Cycling is becoming an increasingly common choice of transport for commuting and for completing daily tasks. This form of transportation is important to reduce urban traffic, increase public health, decrease greenhouse gas emissions, and improve air quality. As a result, the safety of cyclists travelling through urban areas must be evaluated. Fatalities of cyclists can often be attributed to crashes with automobiles at urban intersections. We will explore the risk analysis process and results for a study on cycling in roundabouts. This involves combining a probabilistic model based on Poisson’s law with a damage model that focuses on the reaction time of road users, in order to analyze collisions between motor vehicles and bicycles. I will show how this research can impact the field of road safety for cyclists, and share my own thoughts and potential improvements on it.
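As a rough illustration of what a probabilistic model based on Poisson’s law can look like, the sketch below treats vehicle arrivals at a conflict point as a Poisson process. This is an assumption made for illustration only and is not the model from the cited paper; the function names, the arrival rate, and the exposure window are all hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the expected count is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def prob_at_least_one_arrival(rate_per_second: float, window_seconds: float) -> float:
    """Probability that one or more vehicles arrive during the window,
    assuming arrivals follow a Poisson process with a constant rate."""
    return 1.0 - math.exp(-rate_per_second * window_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: 0.1 vehicles per second, cyclist exposed for 5 seconds.
    print(prob_at_least_one_arrival(0.1, 5.0))
```

Under these assumptions, the exposure probability grows with both traffic flow and the time a cyclist spends in the conflict area, which is the kind of quantity a damage model could then be combined with.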

Cantisani, G., Durastanti, C., & Moretti, L. (2021). Cyclists at Roundabouts: Risk Analysis and Rational Criteria for Choosing Safer Layouts. In Infrastructures (Vol. 6, Issue 3, p. 34). MDPI AG. https://doi.org/10.3390/infrastructures6030034

Aven, T. (2016). Risk assessment and risk management: Review of recent advances on their foundation. In European Journal of Operational Research (Vol. 253, Issue 1, pp. 1–13). Elsevier BV. https://doi.org/10.1016/j.ejor.2015.12.023

Sean Murphy (4:00 pm @ AUST 247)

How does Video Assistant Referee (VAR) modify the game in elite soccer?

Any sports fan knows and likely has personal experience of how influential officiating can be on the outcome of a game. Biased or poor referees can sway the flow of a game in negative ways, ways that have nothing to do with the skill level or performance of either participating team. As a result, there have been many efforts to improve officiating quality with the use of modern technological advances, and some of these efforts have made their way to the most popular professional leagues. The Video Assistant Referee (VAR) is one such technology that was implemented in professional soccer leagues broadly in the 2017-2018 season with the hope of reducing officiating bias and minimizing erroneous calls. Since its implementation and propagation, there have been conflicting responses, with some arguing that it improves the quality of the game while others argue that it is unhelpful and unnecessarily interrupts the game tempo. In my presentation, I will present a paper that seeks to statistically analyze the effects of VAR on professional soccer. The paper analyzes data from two prominent leagues: the Italian Serie A and the German Bundesliga, from both before and after the dawn of VAR. For my part, I will give background on VAR, discuss the analytical methodology used in the paper, outline the findings, and give some discussion of my own. I will also recommend further areas of study branching off from the findings of this analysis.

Source: Lago-Peñas Carlos, Rey Ezequiel & Kalén Anton (2019) How does Video Assistant Referee (VAR) modify the game in elite soccer?, International Journal of Performance Analysis in Sport, 19:4, 646-653, DOI: 10.1080/24748668.2019.1646521 https://doi.org/10.1080/24748668.2019.1646521

Hao Ding (4:40 pm @ AUST 247)

Electronic Sports: Winner Prediction

This presentation explores factors influencing player success in the popular online multiplayer battle royale game, PlayerUnknown’s Battlegrounds (PUBG). Utilizing data from the PUBG Developer API, which includes information from over 65,000 recorded games, we employ correlation analysis, feature importance from tree-based models, and recursive feature elimination to identify key elements impacting player performance. The dataset encompasses various metrics such as assists, boosts, revives, damage dealt, team kills, kill place, distance traveled, weapons acquired, match duration, and win place percentile. Our analysis reveals correlations between different factors and highlights the significance of kills and walk distance in achieving better win placements. The methods section details the approach, including the use of ensemble models like Random Forest and Gradient Boosting, as well as deep learning models for predictive tasks. The discussion emphasizes the need for a large and diverse dataset, periodic model updates due to game dynamics, collaboration with PUBG experts, and the inherent unpredictability of player strategies. While machine learning models offer valuable insights, they should be interpreted cautiously, recognizing the game’s dynamic nature and unpredictable elements.

November 27, 2023

Ge Li (3:35 pm @ AUST 247)

Unveiling the Mechanics of GPTs and the GPT Store

This presentation delves into the technical intricacies of Generative Pre-trained Transformers (GPTs), particularly the latest GPT-4 model, and the GPT Store. GPTs, at their core, employ advanced neural network architectures capable of processing and generating human-like text. We will explore key elements such as transformer architecture and language modeling, which enable these AI models to understand context and generate responses. Additionally, the session will introduce the GPT Store, highlighting how it allows users to access and employ GPT-based applications with ease. This platform exemplifies the practical application of GPTs in various domains, democratizing access to sophisticated AI tools. Attendees will gain insights into the foundational technologies behind GPTs and the operational dynamics of the GPT Store, offering a glimpse into the future of AI applications.

Garrick Ho (4:00 pm @ AUST 247)

An energy matching method for battery electric vehicle and hydrogen fuel cell vehicle based on source energy consumption rate

In this study on smart city development, the focus is on enhancing energy efficiency and promoting renewable energy, particularly in the context of new energy vehicles (NEVs). The proposed Source to Range (STR) model is introduced to analyze the energy system configuration and efficiency balance of NEVs, including battery electric vehicles (BEVs) and hydrogen fuel cell vehicles (HFCEVs). The study utilizes an energy efficiency analysis chart to visually represent the conversion, delivery, and consumption of energy throughout the vehicle's life cycle. The Source Energy Consumption Rate (SECR) is introduced as a metric to evaluate vehicle energy efficiency. The results indicate that the STR model is an effective tool for energy matching and analysis of NEVs, providing valuable insights for the development of these vehicles and offering constraints for energy system design in comparison to equivalent fuel vehicles. In my presentation, I will go through how the STR model was built and the results it produced. I will conclude with some future work that could build on this study.

Huiyuan Xiong, Huan Liu, Ronghui Zhang, Limin Yu, Zhijian Zong, Minghui Zhang, Zhu Li, An energy matching method for battery electric vehicle and hydrogen fuel cell vehicle based on source energy consumption rate, International Journal of Hydrogen Energy, Volume 44, Issue 56, 2019, Pages 29733-29742, ISSN 0360-3199, https://doi.org/10.1016/j.ijhydene.2019.02.169.

Thomas Lin (4:40 pm @ AUST 247)

Essays on education and income

Examine the intricate correlation between education and income through rigorous statistical methods, extracting valuable insights from meticulously curated datasets. Systematically categorize education levels and scrutinize income distributions to unveil discernible patterns and correlations, prioritizing a professional and precise approach. Commence with examining data sources to lay the groundwork for a systematic exploration, offering a succinct yet thorough perspective on the interplay between education and income.

-Tinbergen, J. "THE IMPACT OF EDUCATION ON INCOME DISTRIBUTION." Review of Income and Wealth, vol. 18, 1972, pp. 255-265. Wiley, https://doi.org/10.1111/j.1475-4991.1972.tb00865.x.

-Bobbitt-Zeher, D. "The Gender Income Gap and the Role of Education." Sociology of Education, vol. 80, no. 1, 2007, pp. 1-22. SAGE, https://doi.org/10.1177/003804070708000101.

-Stryzhak, O. "The relationship between education, income, economic freedom and happiness." The International Conference on History, Theory and Methodology of Learning (ICHTML 2020), vol. 75, 2020, https://doi.org/10.1051/shsconf/20207503004.

Sneha Shetty (5:10 pm @ AUST 247)

Social Media Analytics and Its Impact on Small Businesses

Social media has become a prominent part of society, especially with its user base growing exponentially over the past decade. According to Forbes, there are currently around 4.9 billion social media users globally, with the average person spending a little under 2.5 hours on social media platforms per day. While people can argue whether its effects are positive or negative, one thing is for sure: it has given exposure to different brands, locations, restaurants, knowledge about niche subjects, and so much more. For my presentation, I would like to explore the subject of social media analytics, what type of data is collected, and how that data allows one's social media presence to grow. After doing so, I will tie in the usage of social media analytics for the growth of small businesses.

“Complete Guide to Social Media Analytics and Why It’s Important.” Sprout Social, 26 Oct. 2023, https://sproutsocial.com/insights/social-media-analytics/#:~:text=Social%20media%20analytics%20refers%20to,and%20adapt%20their%20strategy%20accordingly.

Shepard, Maddie. “Small-Business Marketing Statistics and Trends.” NerdWallet, 17 Mar. 2021, https://www.nerdwallet.com/article/small-business/marketing-statistics-for-small-business.

Wong, Bella. “Top Social Media Statistics and Trends of 2023.” Edited by Cassie Bottorff, Forbes, Forbes Magazine, 7 Aug. 2023, https://www.forbes.com/advisor/business/social-media-statistics/#source.

November 13, 2023

Richa Patel (3:35 pm @ AUST 247)

The Effect of Airbnb on Hotel Performance: Comparing Single- and Multi-Unit Host Listings in the United States

Airbnb has evolved into a global platform with over 7 million listings, highlighting its significant impact on the accommodations sector. With the increasing number of professional hosts handling numerous listings and their contribution to 69% of total revenue, Airbnb is undergoing a substantial transition. In this presentation, I will examine the disparities between single-unit and multi-unit hosts on Airbnb, revealing that while single-unit listings have grown exponentially, multi-unit listings have outpaced them, challenging traditional perceptions of Airbnb's impact on the lodging industry and its performance.

References:

Dogru, T., Mody, M., Line, N., Hanks, L., Suess, C., & Bonn, M. (2022). The Effect of Airbnb on Hotel Performance: Comparing Single- and Multi-Unit Host Listings in the United States. Cornell Hospitality Quarterly, 63(3), 297-312. https://doi.org/10.1177/1938965521993083

Alexander Rice (4:00 pm @ AUST 247)

Health Impact of Tafamidis in Transthyretin Amyloid Cardiomyopathy Patients

Transthyretin amyloid cardiomyopathy (ATTR-CM) is an underdiagnosed and severe variety of heart disease that makes it harder for the heart to pump blood due to excess protein buildup in the chambers of the heart. Since detection of this disease has become more accurate in recent years due to advancing medical technology, treatment has been developed in the form of tafamidis, which slows down or stops the protein buildup inside the heart. In this presentation, I will review the paper “Health impact of tafamidis in transthyretin amyloid cardiomyopathy patients: an analysis from the Tafamidis in Transthyretin Cardiomyopathy Clinical Trial (ATTR-ACT) and the open-label long-term extension studies”. In this analysis, a multi-state cohort Markov model was constructed to simulate the disease course of ATTR-CM. I will explore this model, as well as key findings from this study, including how the new treatment affected patients' survival years and quality of life in this time frame. Lastly, I will share my thoughts on where potential research opportunities lie in the future on this topic.

Reference: Mark H Rozenbaum, Andrea Garcia, Daniel Grima, Diana Tran, Rahul Bhambri, Michelle Stewart, Benjamin Li, Bart Heeg, Maarten Postma, Ahmad Masri, Health impact of tafamidis in transthyretin amyloid cardiomyopathy patients: an analysis from the Tafamidis in Transthyretin Cardiomyopathy Clinical Trial (ATTR-ACT) and the open-label long-term extension studies, European Heart Journal - Quality of Care and Clinical Outcomes, Volume 8, Issue 5, September 2022, Pages 529–538, https://doi.org/10.1093/ehjqcco/qcab031

Miles Kee (4:40 pm @ AUST 247)

The Development and Current State of Basketball Analytics

While the beginning of the relatively recent sports analytics revolution was seen with the Oakland Athletics’ “Moneyball” teams, basketball has been the sport of analytical focus as of late. The single best example of the modern basketball analytics revolution was the 2018 Houston Rockets, a team that relied almost entirely on the three-point shot and shots close to the hoop. People around the game of basketball have always had a fondness for the art of the midrange jump shot, yet the Rockets and many other modern teams have had great success by essentially eliminating that shot from their arsenal. The “bible” of the baseball analytics movement is Bill James’ “Baseball Abstract,” first published in 1985. The basketball equivalent of these publications is Dean Oliver’s book titled “Basketball on Paper,” first released in 2004. This book impacted the way players, fans, and coaches look at the game of basketball and laid the groundwork for teams like the 2018 Rockets. In my presentation, I will discuss the content of Oliver’s book and its impact on the NBA. I will also show how it has impacted college basketball and discuss the current state of college basketball analytics, supplemented by some of my own analytical work I have done for UConn basketball.

November 6, 2023

Luke Noel (3:35 pm @ AUST 247)

Unsupervised Methods for Identifying Pass Coverage Using NFL Player Tracking Data

The National Football League's ever-evolving landscape demands cutting-edge data analytics techniques for evaluating player performance. I will analyze the paper, “Unsupervised Methods for Identifying Pass Coverage Among Defensive Backs with NFL Player Tracking Data,” where the authors explore statistical techniques to predict whether a cornerback was playing man or zone coverage on a play. They do this with the help of detailed player and ball tracking data, which marks the locations and trajectories (speed, angle) of all 22 players on the field (and the ball) at a rate of 10 Hz. I will discuss some background on the topic, the methods/findings, and my thoughts on how this process can be implemented in the future.

Reference: Dutta, Rishav, Ronald Yurko, and Samuel L. Ventura. "Unsupervised methods for identifying pass coverage among defensive backs with NFL player tracking data." Journal of Quantitative Analysis in Sports 16, no. 2 (2020): 143-161.

Lai Jiang (4:00 pm @ AUST 247)

Machine Learning for Handwriting Recognition: Algorithms and Techniques

Machine learning is closely connected to computer science and statistics, and it contributes enormously to our lives. In this presentation, I will talk about handwriting recognition in the machine learning field or, more specifically, the deep learning field. The presentation will be divided into four parts. First, I will introduce the concept: what handwriting recognition is and what it can be used for. Second, I will cover the fundamental concepts behind how it works, using a single-number example to illustrate the dimensions, models, probabilities, and activation functions involved. Third, I will show how handwriting recognition can be implemented, introducing packages and libraries and how to write the code in Python. Finally, I may also discuss some faults and disadvantages of the technology we currently enjoy. Three articles were cited to support my essay.

References:

Plamondon, R., & Srihari, S. (2000). Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 63–84. https://doi.org/10.1109/34.824821

Cortes, Corinna, et al. "Advances in neural information processing systems 28." Proceedings of the 29th Annual Conference on Neural Information Processing Systems. 2015.

Huyen Nguyen (4:35 pm @ AUST 247)

Hidden Markov Models in Biological Sequence Analysis

A hidden Markov model (HMM) is a statistical model that can be used to describe the evolution of observable events that depend on a Markov process with unobservable states. It has applications in many fields such as signal processing, pattern recognition, economics, finance, and bioinformatics. In this talk, I will discuss the basic problems and algorithms for HMMs, including the scoring problem and the forward algorithm, the decoding problem and the Viterbi algorithm, and the training problem and the Baum-Welch algorithm. I will also present an example of an HMM in the context of biological sequence analysis.
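The scoring and decoding problems can be sketched in a few lines. The example below is a minimal illustration, assuming a toy two-state model over DNA letters: the state names ("H" for GC-rich, "L" for GC-poor) and all probabilities are made up for illustration and are not taken from the talk or its references. It multiplies raw probabilities for readability, whereas practical implementations usually work with logarithms or scaling to avoid numerical underflow on long sequences.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Scoring problem: total probability of the observations,
    summed over every possible hidden state path."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Decoding problem: the single most probable hidden state path.
    Returns (probability of that path, list of states)."""
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for symbol in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor state r for arriving in state s.
            prob, path = max(
                ((best[r][0] * trans_p[r][s], best[r][1]) for r in states),
                key=lambda pair: pair[0],
            )
            new_best[s] = (prob * emit_p[s][symbol], path + [s])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])


# Hypothetical toy parameters (illustrative only).
STATES = ["H", "L"]
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

if __name__ == "__main__":
    sequence = "GGCACTGAA"
    print(forward(sequence, STATES, START, TRANS, EMIT))
    print(viterbi(sequence, STATES, START, TRANS, EMIT))
```

The two functions differ only in replacing the sum over predecessor states with a maximum, which is why the Viterbi path probability can never exceed the forward probability.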

References:

Borodovsky, M., & Ekisheva, S. (2006). Problems and solutions in biological sequence analysis. Cambridge University Press.

Compeau, P., & Pevzner, P. (2018). Bioinformatics algorithms: an active learning approach. Active Learning Publishers.

Yoon, B. J. (2009). Hidden Markov models and their applications in biological sequence analysis. Current genomics, 10(6), 402-415.

October 30, 2023

Shuiyi Hu (3:35 pm @ AUST 247)

The Fundamental Structure of Statistical Papers

Statistical papers play a pivotal role in disseminating research findings, facilitating data-driven decision-making, and advancing our understanding of complex phenomena. This presentation explores the essential sections that constitute a statistical paper. I will introduce the key components, from the title and abstract to the discussion and conclusion, and provide the structure of a paper as an example.

Reference: Elizabeth Schifano, and Jun Yan. 2023. Scientific Writing. https://statds.github.io/stat-writing/index.html.

Ashley Merritt (4:00 pm @ AUST 247)

Tracking the Flu Pandemic by Monitoring the Social Web

Every year, the arrival of “Flu Season” brings forth concerns about the spread of influenza, prompting public health authorities to devise different methods for early detection and containment. In this presentation I will be reviewing a study called, “Tracking the Flu Pandemic by Monitoring the Social Web,” written by Lampos and Cristianini in 2010. This research uses Twitter in order to proactively monitor and track the spread of the flu. In this study, tweets are identified by symptom-related statements which are then transformed into a flu-score. This score will serve as a real-time indicator of the prevalence of flu symptoms in a given population. During this presentation, I will discuss the statistical correlations found between the generated flu-score and data from the Health Protection Agency, in which we will discover the value of using social media for pandemic surveillance. I will finally explore the key conclusions drawn from the study and consider potential avenues for further research.

Reference: V. Lampos and N. Cristianini, "Tracking the flu pandemic by monitoring the social web," 2010 2nd International Workshop on Cognitive Information Processing, Elba, Italy, 2010, pp. 411-416, https://ieeexplore.ieee.org/abstract/document/5604088.

Okem Chime (4:40 pm @ AUST 247)

Analyzing Performance Data in Sport: A Statistical Perspective

Everyone knows that professional athletes and sports teams are always trying to find a way to gain the edge on their opponents. The role of performance data in sport has become increasingly important as teams and players use data to gain insights into sporting performance and target areas for improvement. With the rise of technology such as wearable motion tracking, we now have more access to performance data than ever before. This presentation will look at how sporting data can be used from a statistical standpoint as a tool to inform decisions and potentially help athletes perform at their best.

References:

  • Barris, Sian, and Chris Button. "A review of vision-based motion analysis in sport." Sports Medicine 38 (2008): 1025-1043.
  • Adesida, Y., Papi, E., & McGregor, A. H. (2019). Exploring the role of wearable technology in sport kinematics and kinetics: A systematic review. Sensors, 19(7), 1597.
  • Gryko, K., Mikołajec, K., Maszczyk, A., Cao, R., & Adamczyk, J. G. (2018). Structural analysis of shooting performance in elite basketball players during FIBA EuroBasket 2015. International Journal of Performance Analysis in Sport, 18(2), 380-392.
  • Russell, M., Rees, G., & Kingsley, M. I. (2013). Technical demands of soccer match play in the English championship. The Journal of Strength & Conditioning Research, 27(10), 2869-2873.
  • Perin, C., Vuillemot, R., Stolper, C. D., Stasko, J. T., Wood, J., & Carpendale, S. (2018, June). State of the art of sports data visualization. In Computer Graphics Forum (Vol. 37, No. 3, pp. 663-686).
  • https://www.premierleague.com
  • Arnold, J. F., & Sade, R. M. (2017). Wearable technologies in collegiate sports: the ethics of collecting biometric data from student-athletes. The American Journal of Bioethics, 17(1), 67-70.
  • Ahsan, M., Ahmed, M. D., & Azeem, K. (2023). Role of predictive modeling and personalized modeling in the enhancement of athletic performance. Saudi Journal of Sports Medicine, 23(1), 7.
  • Blanchfield, J. E., Hargroves, M. T., Keith, P. J., Lansing, M. C., Nordin, L. H., Palmer, R. C., … & Napoli, N. J. (2019, April). Developing Predictive Athletic Performance Models for Informative Training Regimens. In 2019 Systems and Information Engineering Design Symposium (SIEDS) (pp. 1-6). IEEE.

October 23, 2023

Delia Lin (3:35 pm @ AUST 247)

How Critical are Critical Reviews? Film Critics' Influence on Box Office Performance.

Critics have a major influence in consumers’ decisions in many industries, but their impact is especially noticeable in the film industry. Over a third of Americans actively look for film critics' opinions (The Wall Street Journal, 2001), and about one in three filmgoers pick movies based on positive reviews. In this presentation, I will introduce three issues related to the effects of film critics on box office success. The first issue is critics’ role in affecting box office performances either as an influencer, if they actively influence the decisions of consumers in the early weeks of a run, or predictor, if they merely predict consumers’ decisions. The second issue is whether positive and negative reviews have comparable effects on box office performance, and the third section of this investigation examines how the influence of critical reviews on box office success is moderated by factors of star power and movie budgets.

Reference: Basuroy, Suman, et al. “How critical are critical reviews? the box office effects of film critics, Star Power, and budgets.” Journal of Marketing, vol. 67, no. 4, 2003, pp. 103–117, https://doi.org/10.1509/jmkg.67.4.103.18692.

Chris Truedson (4:00 pm @ AUST 247)

Transformers in Data Science: A High-Level Overview (The T in ChatGPT)

Transformers are a revolutionary technology pioneered by Google in 2017 and brought into the mainstream through the public release of ChatGPT in 2022. They have reshaped the landscape of data science and revolutionized AI, ushering us into the age of generative AI. Transformers are a class of machine learning models that excel at handling sequential data, which makes them invaluable in areas such as natural language processing, image recognition, and many more. In attempting to understand the ideas behind transformers, which are characterized by their attention mechanisms, we'll see how allowing data scientists to process information in parallel has led to remarkable breakthroughs in tasks such as language translation and sentiment analysis. Finally, we'll also come to appreciate the many practical applications of transformers in improving chatbots, search engines, recommendation systems, and more.
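The attention mechanism mentioned above can be illustrated with scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in "Attention is all you need". The sketch below is a minimal pure-Python version for a single attention head, using plain lists instead of tensors; the function names are chosen here for illustration, and real models add learned projections, multiple heads, and masking.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Each query attends to every key; the output for a query is the
    weighted average of the value vectors, weighted by softmax scores."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    print(scaled_dot_product_attention(q, k, v))
```

Because the output for each query depends only on that query and the shared keys and values, all positions in a sequence can be computed independently, which is what makes parallel processing possible.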

References:

  1. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017).
  2. Doshi, Ketan. 2021. “Transformers Explained Visually (Part 1): Overview of Functionality.” Medium. June 3, 2021. https://towardsdatascience.com/transformers-explained-visually-part-1-overview-of-functionality-95a6dd460452#:~:text=The%20Transformer.
  3. Hugging Face. 2022. Review of The Hugging Face NLP Course, 2022. Hugging Face. 2022. https://huggingface.co/learn/nlp-course/chapter1/1?fw=pt.
  4. Khan, Salman, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, and Mubarak Shah. 2022. “Transformers in Vision: A Survey.” ACM Computing Surveys, January. https://doi.org/10.1145/3505244.

Kathleen Houlihan (4:40 pm @ AUST 247)

Large Scale Machine Learning System behind X

One of the most popular social media platforms to date is X (formerly known as Twitter), which has roughly 556 million monthly active users. X is a social media platform where users can post short text messages, videos, images, animated GIFs, polls, and more for the world to see. The platform consists of a “following” timeline where you can view a stream of posts from accounts you have chosen to follow and a “for you” timeline which displays suggested content that X thinks you may like. Machine learning is what enables X to surface content most relevant to each user, drive engagement, and promote healthy conversations on the social media platform. In my presentation I will be discussing the large scale machine learning systems that drive X’s performance, how X’s algorithmic amplification plays a role in what individuals see on the app, and policies X has instituted to protect its users and advance AI for Twitter in an ethical way.

References:

Huszár, F., Ktena, S. I., O’Brien, C., Belli, L., Schlaikjer, A., & Hardt, M. (2021). Algorithmic amplification of politics on Twitter. Proceedings of the National Academy of Sciences, 119(1). https://doi.org/10.1073/pnas.2025334119

Lin, J., & Kolcz, A. (2012). Large-scale machine learning at Twitter. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. https://doi.org/10.1145/2213836.2213958

Pennacchiotti, M., & Popescu, A.-M. (2021). A machine learning approach to Twitter User Classification. Proceedings of the International AAAI Conference on Web and Social Media, 5(1), 281–288. https://doi.org/10.1609/icwsm.v5i1.14139

Twitter. (n.d.). Twitter meets TensorFlow. Twitter. https://blog.twitter.com/engineering/en_us/topics/insights/2018/twittertensorflow

October 16, 2023

Boyoun Chung (3:35 pm @ AUST 247)

Analysis of Binary Outcomes with Missing Data Based on Deterministic Imputation Methods

"Analysis of binary outcomes with missing data: missing = smoking, last observation carried forward, and a little multiple imputation," written by Donald Hedeker, Robin J. Mermelstein, and Hakan Demirtas, discusses a challenging problem inherent in binary outcomes with missing data within substance abuse studies. This paper addresses the problem in a simple two-group design where the interest centers on comparing the groups in terms of the binary outcome at a single time point. It describes how the deterministic assumptions of missing = smoking and last observation carried forward (LOCF) can be relaxed by allowing missingness to be related imperfectly to the binary outcome, either stratified on past values of the outcome or not. In addition, the paper suggests a sensitivity analysis to investigate how the analysis result is affected by the underlying assumption of associations between nonresponse and smoking behavior among subjects who did not respond. As an illustrative example, this study analyzes a data set from a published smoking cessation study evaluating the effectiveness of adding group-based treatment adjuncts to an intervention comprised of a television program and self-help materials.

References:

Hedeker, D., Mermelstein, R. J., & Demirtas, H. (2007). Analysis of binary outcomes with missing data: missing= smoking, last observation carried forward, and a little multiple imputation. Addiction, 102(10), 1564-1573.

Gruder, C. L., Mermelstein, R. J., Kirkendol, S., Hedeker, D., Wong, S. C., Schreckengost, J., … & Miller, T. Q. (1993). Effects of social support and relapse prevention training as adjuncts to a televised smoking-cessation intervention. Journal of consulting and clinical psychology, 61(1), 113.
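The two deterministic assumptions named in the abstract, missing = smoking and last observation carried forward (LOCF), can be sketched as follows. This is a minimal illustration with made-up data, assuming outcomes are coded 1 for smoking, 0 for abstinent, and None for a missing observation; the function names are hypothetical, and the sketch does not reproduce the relaxed or multiple-imputation approaches developed in the paper.

```python
from typing import List, Optional

Outcome = Optional[int]  # 1 = smoking, 0 = abstinent, None = missing


def missing_equals_smoking(outcomes: List[Outcome]) -> List[int]:
    """Treat every missing observation as smoking."""
    return [1 if y is None else y for y in outcomes]


def locf(outcomes: List[Outcome]) -> List[Outcome]:
    """Carry the last observed value forward over missing observations.
    Leading missing values stay missing because nothing precedes them."""
    filled: List[Outcome] = []
    last: Outcome = None
    for y in outcomes:
        if y is not None:
            last = y
        filled.append(last)
    return filled


def final_smoking_rate(subjects: List[List[Outcome]]) -> float:
    """Share of subjects coded as smoking at the last time point,
    ignoring subjects whose final value is still missing."""
    finals = [s[-1] for s in subjects if s[-1] is not None]
    return sum(finals) / len(finals)


if __name__ == "__main__":
    # Hypothetical follow-up data for four subjects over three time points.
    group = [[1, 0, None], [0, 0, 0], [1, None, None], [0, None, None]]
    print(final_smoking_rate([missing_equals_smoking(s) for s in group]))
    print(final_smoking_rate([locf(s) for s in group]))
```

Running both imputations on the same data shows how strongly the estimated smoking rate can depend on the assumption made about dropouts, which is the motivation for a sensitivity analysis.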

Shiyi Peng (4:00 pm @ AUST 247)

Fairness in Recommender Systems

The recommendation system (RS) is one of the most visible and successful applications of artificial intelligence technology in practice. Personalized recommendations on shopping websites affect the interests of consumers and enterprises. Personalized recommendations on social media (such as TikTok) also determine what information people can obtain from the Internet, thereby affecting the entire online environment. Due to the increasing potential impact of recommendation systems on individuals, organizations, and society, the issue of fairness has received increasing attention in recent years. I will introduce the basic concepts of RS and the concept of fairness, and then introduce several unfair phenomena and the reasons for unfairness, for example, data bias and bias in the machine learning model itself (protected characteristics).

Based on: Deldjoo, Yashar, et al. "A survey of research on fair recommender systems." arXiv preprint arXiv:2205.11127 (2022).

Justin Chan (4:40 pm @ AUST 247)

Where AI is today, its limitations and future implications

By now we’ve all heard of ChatGPT and how amazing it is or can be. But is it really that amazing? What can it actually do? Maybe you've seen former President Trump and current President Biden play Minecraft with each other but is that all there is to it? Where does it lack? And where is it going in the future? In my presentation next week, I’ll do my best to answer all of those questions in a clear and concise way and by the end of it I hope to give you all a solid understanding of what AI is, what AI isn’t (at least for now), and some tips on how to use it better.

October 2, 2023

Ajay Natarajan (3:35 pm @ AUST 247)

nflWAR and Its Potential Applications Within the National Football League

Data analytics is a burgeoning field within the sports industry, where teams have transitioned from more traditional methods to using data to guide decision making, roster construction, recruiting, player evaluation, and more. Particularly on the player evaluation front, some sports have made further advances than others, especially baseball, where a common and publicly available stat, Wins Above Replacement (WAR), has revolutionized analysis. However, in American football, analysts have not made similar strides as quickly, and many existing methods are often proprietary, not reproducible, or lag behind other sports. I will be analyzing a study called “nflWAR: a Reproducible Method for Offensive Player Evaluation in Football”, published in the Journal of Quantitative Analysis in Sports by Yurko, Ventura, and Horowitz (2019), which presents the authors' methodology for “nflWAR” as a player evaluation statistic. I will first introduce the statistic and its relevance as well as previous methods, and then delve into its methodology and its applications to players, before bringing up several case studies from prior seasons. I will look at how well the model fits and its uncertainty. Finally, I will go over the statistic’s potential extensions to the game of football, and offer my own conclusions and criticisms of the statistic and the study overall.

Xiaoshu Wang (4:40 pm @ AUST 247)

Monte Carlo Simulation and Its Application in the Math and Finance World

The Monte Carlo method is a very powerful and common approach in the math and statistics field. It is a broad class of computational algorithms that use repeated random sampling to solve complex problems and obtain numerical results. 'The underlying concept is to use randomness to solve problems that might be deterministic in principle' (Wikipedia). In my presentation, I will introduce the basic idea of Monte Carlo simulation and give two examples of its application in the math and finance field. I will also discuss the pros and cons of using it in different scenarios.
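The idea of repeated random sampling can be sketched with one math example and one finance example. Both are chosen here for illustration and are not necessarily the examples used in the talk: the first estimates pi from random points in the unit square, and the second prices a European call option, assuming the stock price follows geometric Brownian motion under a constant risk-free rate. All function and parameter names are hypothetical.

```python
import math
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi as 4 times the share of random points in the unit
    square that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def price_european_call(s0, strike, rate, sigma, maturity, n_samples, seed=0):
    """Average the discounted payoff max(S_T - K, 0) over simulated
    terminal prices S_T drawn from geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total_payoff = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        terminal_price = s0 * math.exp(drift + vol * z)
        total_payoff += max(terminal_price - strike, 0.0)
    return math.exp(-rate * maturity) * total_payoff / n_samples


if __name__ == "__main__":
    print(estimate_pi(200_000))
    print(price_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 200_000))
```

In both cases the error shrinks roughly in proportion to one over the square root of the number of samples, which is one of the main trade-offs to weigh against the method's simplicity.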

September 25, 2023

Mathew Chandy (3:35 pm @ AUST 247)

ChatGPT for Data Science

ChatGPT is a popular chatbot developed by OpenAI. Users can hold conversations with ChatGPT and ask for answers to complex questions. ChatGPT is particularly useful for implementing algorithms. In this presentation, I will discuss what queries can be submitted to ChatGPT for the purposes of a Data Science project. I will introduce a dataset, and for each step of a Data Science project - such as data cleaning, exploratory data analysis, and modeling - I will present a question that one can ask and the resulting code in an RMarkdown file. I will also show how ChatGPT can be used to check for mistakes in already written code. I will emphasize the importance of being specific when writing queries, and I will reinforce the understanding that ChatGPT may not always produce accurate responses, and it is the responsibility of the user to ensure the correctness of the code used.