Classification (ML)
2020 Imoco Conegliano Women's Volleyball - Classification (ML)
Several machine learning classification algorithms were used to predict the choices of setter Asia Wolosz (Imoco Conegliano) in the side-out phase (when her team is receiving serve), based on the available match information. The models are built for prediction, but they also reveal which factors matter most in driving her decisions.
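As a minimal sketch of this framing, assuming each side-out rally is one row and the setter's choice is the class label, a tree ensemble can be fitted and its built-in feature importances read off. The data, the column names (`reception_quality`, `rotation`, `score_diff`) and the labelling rule are hypothetical, chosen only so the example runs on its own; they are not the features of the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 500
# Synthetic, hypothetical side-out rallies (illustrative column names only).
X = pd.DataFrame({
    "reception_quality": rng.integers(0, 4, n),  # 0 = poor ... 3 = perfect pass
    "rotation": rng.integers(1, 7, n),           # setter's rotation, 1-6
    "score_diff": rng.integers(-5, 6, n),        # own score minus opponent's
})
# Toy labelling rule: a good pass (>= 2) lets the setter use the middle,
# a poor pass forces a high ball to the outside hitter.
y = np.where(X["reception_quality"] >= 2, "middle", "outside")

# max_features=None: every split considers all features (keeps the toy result deterministic).
clf = RandomForestClassifier(n_estimators=200, max_features=None, random_state=0)
clf.fit(X, y)

# Impurity-based importances: mean impurity decrease per feature across the trees.
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))

# Predict the choice for one new (hypothetical) rally with a perfect pass.
new_rally = pd.DataFrame({"reception_quality": [3], "rotation": [1], "score_diff": [0]})
pred = clf.predict(new_rally)
print(pred)
```

Impurity-based importances are cheap to obtain but can favor high-cardinality features, which is one reason to complement them with Shapley values.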
The dataset covers the 2019/2020 season of Imoco Conegliano and was kindly provided by César Hernández González (Head Coach of the Korea national team and Assistant Coach of Vakifbank Istanbul).
The work opens with a data exploration section that uses descriptive statistics, visualization techniques and a correlation analysis. Data engineering and data preparation with a pipeline follow, producing the inputs for the classification models.
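The exploration and preparation steps might look roughly like this in pandas and scikit-learn. This is a sketch on synthetic data: the columns, the encoding choices (one-hot for categorical columns, standard scaling for numeric ones) and the downstream `LogisticRegression` are assumptions, not the author's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 300
# Hypothetical rally-level features; names and values are illustrative only.
df = pd.DataFrame({
    "rotation": rng.integers(1, 7, n),           # setter's rotation, 1-6
    "reception_quality": rng.integers(0, 4, n),  # 0 = poor ... 3 = perfect
    "score_diff": rng.integers(-5, 6, n),        # own score minus opponent's
    "opponent": rng.choice(["A", "B", "C"], n),  # hypothetical opponent label
})
df["set_to"] = np.where(df["reception_quality"] >= 2, "middle", "outside")

# Correlation analysis: Pearson correlations between numeric features
# and a 0/1 encoding of the target.
numeric = ["rotation", "reception_quality", "score_diff"]
corr = df[numeric].assign(set_middle=(df["set_to"] == "middle").astype(int)).corr()
print(corr.round(2))

# Preparation pipeline: encode categorical columns, scale numeric ones, then classify.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["rotation", "opponent"]),
    ("num", StandardScaler(), ["reception_quality", "score_diff"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X, y = df.drop(columns="set_to"), df["set_to"]
model.fit(X, y)
train_acc = model.score(X, y)  # training accuracy, for illustration only
print(round(train_acc, 3))
```

Keeping the preprocessing inside the `Pipeline` means the encoders are refitted on each training fold during cross-validation, which avoids leaking information from the validation folds.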
The results of several classifiers (XGBoost, Random Forest, HistGradientBoosting, SVC, ExtraTrees, GradientBoosting, Logistic Regression, AdaBoost, DecisionTree, K-Neighbors) are compared. The hyperparameter tuning procedure, both manual and automated with Hyperopt, is described.
The models are interpreted with Shapley values, computed with the SHAP library.
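To show what a Shapley value measures, here is a brute-force computation for a tiny model, written in plain Python rather than with the SHAP library. A feature's value is its average marginal contribution over all coalitions of the other features, weighted by $|S|!\,(n-|S|-1)!/n!$. Two simplifications are assumptions of this sketch: "absent" features take their value from a single baseline point, and `toy_model` with its three features is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x.

    Features outside a coalition are replaced by the matching entry of
    `baseline` (a single reference point; SHAP averages over a background
    dataset instead).
    """
    n = len(x)

    def value(coalition):
        # Coalition members take their value from x, the rest from baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical score for "set to the middle" from
    # [reception_quality, middle_front_row, score_diff]; the interaction term
    # makes reception quality matter more when the middle is in the front row.
    return 0.1 + 0.2 * z[0] + 0.1 * z[0] * z[1] - 0.05 * z[2]


x = [3, 1, 2]
baseline = [1, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # [0.5, 0.2, -0.1]
# Efficiency property: the attributions add up to f(x) - f(baseline).
print(round(sum(phi), 3), round(toy_model(x) - toy_model(baseline), 3))
```

The interaction term is split evenly between the two features involved, and the loop is exponential in the number of features, which is why SHAP relies on faster algorithms such as `TreeExplainer` for tree ensembles.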
This work was developed in Python in a Jupyter Notebook, using pandas and scikit-learn, with Plotly and Seaborn for visualization.
The process and results are presented below.