Classification (ML)

2020 Imoco Conegliano Women's Volleyball - Classification (ML)

  • Several machine learning classification algorithms were used to predict the side-out choices of setter Asia Wolosz (Imoco Conegliano) from the information available. The models are built for prediction, but they also reveal which factors matter most in driving her decisions

  • The dataset covers Imoco Conegliano's 2019/2020 season and was kindly provided by César Hernández González (Head Coach of Korea NT and Assistant Coach of Vakifbank Istanbul)

  • The work includes a data exploration section with a correlation analysis, descriptive statistics and visualizations. Data engineering and data preparation follow, implemented as a pipeline that feeds the classification models

  • Results from several classifiers (XGBoost, Random Forest, HistGradientBoosting, SVC, ExtraTrees, GradientBoosting, Logistic Regression, AdaBoost, DecisionTree, K-Neighbors) are compared. The tuning procedure, both manual and with Hyperopt, is described

  • Model interpretation with Shapley values (using the SHAP library) is provided

  • This work was developed with Python in a Jupyter Notebook, using Pandas and scikit-learn, with Plotly and Seaborn as visualization libraries

  • Scroll down to see the process and results
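
As a rough illustration of the workflow summarized above (a preprocessing pipeline feeding several classifiers whose results are then compared), here is a minimal sketch with scikit-learn. It is not the notebook's actual code: the data is randomly generated, and the column names (`rotation`, `reception_quality`, `score_diff`), the target labels and the two candidate models are hypothetical placeholders chosen only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data: every column name and value is hypothetical.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "rotation": rng.integers(1, 7, size=n).astype(str),               # categorical
    "reception_quality": rng.choice(["perfect", "good", "poor"], size=n),  # categorical
    "score_diff": rng.integers(-5, 6, size=n),                        # numeric
})
# Random labels, so accuracies should sit near chance level.
y = rng.choice(["outside", "middle", "opposite"], size=n)

# Preparation step: one-hot encode categoricals, scale the numeric column.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["rotation", "reception_quality"]),
    ("num", StandardScaler(), ["score_diff"]),
])

# Two of the classifier families named above, with near-default settings.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Evaluate each candidate with the same preprocessing and 5-fold cross-validation.
scores = {}
for name, clf in candidates.items():
    model = Pipeline([("prep", preprocess), ("clf", clf)])
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

for name, acc in scores.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Wrapping the preprocessing and the classifier in a single `Pipeline` means the encoder and scaler are refit on each training fold only, which keeps information from the validation fold out of the preparation step. Further candidates can be added to the `candidates` dictionary in the same way.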