Machine Learning: Linear Regression
2021 USA Women's Volleyball - Linear regression
For the 2021 summer season, data from the 2020 Tokyo Olympics and the 2021 Volleyball Nations League were used to fit linear regression models that evaluate how much different features contribute to a team's attack quality. The models are therefore aimed at interpreting the observed data rather than at prediction
The work includes a data exploration section that combines descriptive statistics, visualization techniques, and a correlation analysis
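As a rough illustration of the exploration step, the sketch below computes summary statistics and a Pearson correlation matrix with Pandas. The data is synthetic and the column names (reception_quality, set_tempo, attack_quality) are hypothetical placeholders, not the features used in the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are hypothetical, not the project's features.
rng = np.random.default_rng(0)
n = 200
reception = rng.normal(size=n)
df = pd.DataFrame({
    "reception_quality": reception,
    "set_tempo": rng.normal(size=n),
    "attack_quality": 0.7 * reception + rng.normal(scale=0.5, size=n),
})

# Descriptive statistics (count, mean, std, quartiles) for every column.
summary = df.describe()

# Pearson correlation matrix (the pandas default method).
corr = df.corr()

# Rank the candidate features by their correlation with the target.
target_corr = corr["attack_quality"].drop("attack_quality").sort_values(ascending=False)
print(target_corr)
```

A matrix like corr can be passed directly to a heatmap function in Seaborn or Plotly for the visual part of the analysis.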
This is followed by data engineering and data preparation, implemented as a pipeline that feeds the linear regression models
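A minimal sketch of what such a pipeline might look like in scikit-learn, assuming the preparation step is feature standardization (the actual preparation steps in the notebook may differ). The data is synthetic and the feature names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; feature names are hypothetical.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "reception_quality": rng.normal(size=200),
    "set_distance": rng.normal(size=200),
})
y = 0.6 * X["reception_quality"] - 0.2 * X["set_distance"] + rng.normal(scale=0.1, size=200)

# Chain preparation and model so the same transformations are applied
# at fit time and at prediction time.
prep_and_model = Pipeline([
    ("scale", StandardScaler()),    # assumed preparation step
    ("model", LinearRegression()),
])
prep_and_model.fit(X, y)

# With standardized inputs, coefficient magnitudes are comparable across features,
# which suits an interpretation-oriented analysis.
coefs = pd.Series(prep_and_model.named_steps["model"].coef_, index=X.columns)
print(coefs)
```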
Results from several linear models (ordinary linear regression, Ridge, Lasso, and ElasticNet regression) are compared, including versions that use polynomial features
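The comparison could be sketched roughly as follows. This is an illustration on synthetic data, not the notebook's code: the polynomial degree, the alpha values, and the use of mean cross-validated R^2 as the comparison metric are all assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with one linear and one quadratic effect (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=200)

# The four model families named above; hyperparameter values are assumed.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
    "elasticnet": ElasticNet(alpha=0.01, l1_ratio=0.5),
}

scores = {}
for name, model in models.items():
    # Degree-2 polynomial expansion, then scaling, then the regressor.
    pipe = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        model,
    )
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```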
An 80-20 training-test split is used, with 10-fold cross-validation and GridSearchCV to tune the regularization hyperparameter(s) where applicable
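The evaluation setup described here can be sketched as below for the Ridge case. The data is synthetic, and the alpha grid, the random seeds, and the R^2 scoring are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -0.5, 0.0, 0.25]) + rng.normal(scale=0.3, size=300)

# 80-20 training-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# Candidate regularization strengths (assumed grid).
param_grid = {"model__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 10-fold cross-validation on the training set only.
cv = KFold(n_splits=10, shuffle=True, random_state=42)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="r2")
search.fit(X_train, y_train)

# The held-out test set is scored once, with the refitted best estimator.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Keeping the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation fold leaks into the preparation step.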
This work was developed with Python in a Jupyter Notebook, using Pandas and scikit-learn, with Plotly and Seaborn as visualization libraries
Scroll down to see the process and results