Machine Learning: Linear Regression

2021 USA Women's Volleyball - Linear regression

  • In the context of the 2021 Summer Season, data from the 2020 Tokyo Olympics and the 2021 Volleyball Nations League were used to fit linear regression models that evaluate the importance of different features for a team's attack quality; the models are therefore aimed at interpreting the observed data rather than at prediction

  • The work includes a data exploration section that uses both descriptive statistics and visualization techniques, together with a correlation analysis

  • This is followed by data engineering and data preparation steps, implemented as a pipeline that feeds the linear regression models

  • Results from four linear models (ordinary linear regression, Ridge, Lasso, and ElasticNet) are compared, including versions that use polynomial features

  • An 80-20 training-test split is used, with 10-fold cross-validation and GridSearchCV to tune the regularization hyperparameter(s) where applicable

  • This work was developed with Python in a Jupyter Notebook, using Pandas and scikit-learn, with Plotly and Seaborn as visualization libraries

  • Scroll down to see the process and results
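
The workflow summarized in the bullets above (pipeline, 80-20 split, 10-fold cross-validated grid search over the regularized models) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data are synthetic stand-ins rather than the volleyball data set, and the helper name `build_pipeline`, the step order, the polynomial degree, and the hyperparameter grids are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the volleyball data): 200 rows, 4 features,
# with a target that depends on two features and one interaction term.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 2]
     + rng.normal(scale=0.1, size=200))

# 80-20 training-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def build_pipeline(model, degree=2):
    """Hypothetical helper: polynomial expansion, scaling, then the model."""
    return Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("scale", StandardScaler()),
        ("model", model),
    ])


# Illustrative grids only; the values used in the original work are not stated.
alphas = [0.001, 0.01, 0.1, 1.0]
candidates = {
    "linear": (LinearRegression(), {}),  # nothing to tune
    "ridge": (Ridge(), {"model__alpha": alphas}),
    "lasso": (Lasso(max_iter=10_000), {"model__alpha": alphas}),
    "elasticnet": (
        ElasticNet(max_iter=10_000),
        {"model__alpha": alphas, "model__l1_ratio": [0.2, 0.5, 0.8]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 10-fold cross-validation on the training set only.
    search = GridSearchCV(build_pipeline(model), grid, cv=10, scoring="r2")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_r2": search.best_score_,
        "test_r2": search.score(X_test, y_test),
    }

for name, res in results.items():
    print(f"{name}: CV R2={res['cv_r2']:.3f}, test R2={res['test_r2']:.3f}, "
          f"params={res['best_params']}")
```

Placing the scaler and polynomial expansion inside the pipeline means they are refit on each cross-validation fold, which keeps information from the validation folds out of the preprocessing. For interpretation, the fitted coefficients can be read from `search.best_estimator_.named_steps["model"].coef_`.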