Welcome!
After attending a workshop provided by Mad About Sports in association with Rajasthan Royals, I became very much interested in the career path of a cricket data scientist/analyst.
As a result, I wanted to create my own space for sharing all my analysis, visualizations, and ML-based predictions for cricket (and occasionally football) matches/tournaments.
-
August 29, 2022
Meta-Learning for Full Match Prediction
In an earlier post, we attempted to predict the batting team’s score in the first innings and the chasing team’s win probability in the second. Thereafter, in another post, we tried to classify sequences of fixed length using LSTMs.
The objective in this post is to combine these ideas to some extent by building an LSTM that learns to handle sequences of variable length, so that it can predict both the score and the win probability right from the start of each match. The LSTM is also compared with linear approaches, and a stacking-based ensemble is implemented to further boost performance.
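One common way to feed sequences of variable length to a recurrent model is to pad them to a shared length and carry a mask that marks the real time steps. The sketch below illustrates that idea in plain Python; the helper names (`pad_sequences`, `masked_mean`) and the ball-by-ball values are assumptions made for illustration, and the post itself may handle variable lengths differently.

```python
def pad_sequences(sequences, pad_value=0.0):
    """Right-pad every sequence to the longest length and return a 0/1 mask."""
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


def masked_mean(row, row_mask):
    """Average only the real (unpadded) time steps of one padded row."""
    n = sum(row_mask)
    return sum(v * m for v, m in zip(row, row_mask)) / n if n else 0.0


# Made-up runs scored off the first six balls of an innings.
innings = [1, 0, 4, 6, 0, 1]

# Every prefix of the innings becomes one training sequence, so a model can
# be asked for a prediction after the first ball, the second ball, and so on.
prefixes = [innings[:k] for k in range(1, len(innings) + 1)]
padded, mask = pad_sequences(prefixes)
```

The mask is what lets a downstream model (or loss function) ignore the padded positions, so a one-ball prefix and a full innings can share a batch.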
While testing the previous predictor during this year’s IPL, I noticed that it would always generate the same output in all games for a given situation. For example, 120/3 after 15 overs or 50 to win from 30 balls with 5 wickets left would produce the exact same result every single time.
This did not feel right, since it ignores the sequence of events that led to that situation. For that reason, a recurrent neural network architecture is adopted to capture any “momentum shifts” throughout the match and build something similar to “Match Reel” by Cricket.com.
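The difference can be seen in a toy comparison. In the sketch below, `stateless_projection` is a hypothetical stand-in for a predictor that only sees the scoreboard, and `recurrent_summary` is a plain tanh recurrent cell with arbitrary, untrained weights (deliberately simpler than an LSTM). The two innings are made up; both reach 120 after 15 overs.

```python
import math

TOTAL_OVERS = 20  # T20 innings length


def stateless_projection(runs, wickets, overs):
    """Hypothetical stateless predictor: sees only the current scoreboard."""
    run_rate = runs / overs
    # Project the remaining overs at the current rate, discounted per wicket lost.
    return runs + run_rate * (TOTAL_OVERS - overs) * (1 - 0.05 * wickets)


def recurrent_summary(runs_per_over, w_in=0.1, w_rec=0.8):
    """Plain tanh recurrent cell with a scalar hidden state (not an LSTM).

    The weights are arbitrary illustrative constants, not trained values.
    """
    h = 0.0
    for r in runs_per_over:
        h = math.tanh(w_in * r + w_rec * h)
    return h


# Two made-up innings that both reach 120 after 15 overs.
fast_start = [12, 11, 10, 9, 9, 8, 8, 8, 8, 7, 7, 6, 6, 6, 5]
strong_finish = fast_start[::-1]

# Identical scoreboard (120/3 after 15 overs), so the stateless output is identical.
print(stateless_projection(sum(fast_start), 3, len(fast_start)))
print(stateless_projection(sum(strong_finish), 3, len(strong_finish)))

# The recurrent state depends on the order of events, so the two innings differ.
print(recurrent_summary(fast_start), recurrent_summary(strong_finish))
```

Because the hidden state is updated over by over, the innings that is accelerating ends with a different summary from the one that is slowing down, even though the scoreboard is the same.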
Match Reel (source: Cricket.com)
-
May 24, 2022
Road to the Playoffs - IPL 2022
With the league stage done and dusted, only four teams remain in the race to win this year’s IPL, and three of them are aiming for their first title. Moreover, this is only the second time (after 2016) that just one of the preceding season’s top 4 has qualified for the playoffs; every other year has had 2-3 from the previous set.
In my last post, we tried to identify the title contenders based on the previous season’s data. The goal in this one is to expand on that and use the information that we have from the start of the tournament up to this point (“the road to the playoffs”) to predict which team is most likely to win the big prize from here.
A few weeks ago, while I was going through the Wikipedia pages of past IPLs, I found the “Match summary” sections quite intriguing. It was a great way to quickly comprehend which teams did well at different stages thanks to the nice visual representations of the results. This made me wonder if the concepts from sequence classification using LSTMs could be applied to solve the problem in this post as well.
Match summary - IPL 2021 (source: Wikipedia)
-
March 26, 2022
Identifying Potential Champions - IPL 2022
In my last post, we looked at how linear machine learning models can be trained to effectively predict a team’s score and win probability simply based on a small set of features. I was able to extend the same logic to the IPL by just modifying the data source - Cricsheet also provides ball-by-ball data for every game in every IPL season from 2008 to date - and observed very similar performance levels during evaluation.
In this post, however, the main goal is slightly different. The idea is to try and identify the teams that have the best chance of winning this year’s tournament using data from the previous year, i.e., if \(x_i^t\) is the data associated with team \(i\) in season \(t\), then the goal is to compute the probability (\(p_i^{t+1}\)) of team \(i\) winning the title in season \(t+1\) such that:
\[\begin{aligned} p_i^{t+1} = f(x_i^t) \end{aligned}\]
For example, the probability that defending champions CSK will win the IPL again in 2022 is calculated as a function of their data in 2021:
\[\begin{aligned} p_{csk}^{2022} = f(x_{csk}^{2021}) \end{aligned}\]
The decision to use this type of mapping is based on the fact that there have been quite a few instances where a team finishing a season strongly has won the title in the following season. After MI dramatically qualified for the playoffs in 2014, they went on to win their 2nd title in 2015. Similarly, CSK, despite having their worst-ever season in 2020, still managed to win three in a row at the end and carried that momentum into 2021 to secure their 4th title.
Having discovered such patterns, I was curious to see whether my hypothesis of a strong connection between performances in one season and the next holds for a larger set of teams and across multiple seasons. It was also a great opportunity to experiment with the popular XGBoost library.
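To make the mapping \(p_i^{t+1} = f(x_i^t)\) concrete, the sketch below pairs each team's features from season \(t\) with a label saying whether that team won the title in season \(t+1\). The function name `build_pairs`, the single-number feature, `TEAM_B`, and the placeholder values are all assumptions for illustration; only CSK's 2021 title and their three closing wins in 2020 come from the text above. Rows like these could then be passed to any classifier, such as one from XGBoost.

```python
def build_pairs(features, champions):
    """Pair season-t features with a 'won the title in season t+1' label.

    features:  {(team, season): feature_vector}
    champions: {season: winning_team}
    """
    rows = []
    for (team, season), x in sorted(features.items()):
        next_season = season + 1
        if next_season in champions:  # skip seasons whose outcome is unknown
            label = int(champions[next_season] == team)
            rows.append((team, season, x, label))
    return rows


# Hypothetical one-dimensional feature: share of the last three league games won.
features = {
    ("CSK", 2020): [1.0],     # won three in a row at the end of 2020
    ("TEAM_B", 2020): [0.0],  # placeholder team with a placeholder value
    ("CSK", 2021): [0.5],     # placeholder; no 2022 label yet, so it is skipped
}
champions = {2021: "CSK"}

rows = build_pairs(features, champions)
for row in rows:
    print(row)
```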
-
February 27, 2022
Score and Win Predictor
After earning my cricket analytics certificate, I started applying some of the techniques I learned to analyze data from both the South Africa vs. India Test series and the Ashes in 2021/22. Through this experience, I was able to take a closer look at very intriguing aspects of the game such as a team’s reliance on a single batter, batters that have stepped up when their team is under pressure, effectiveness of teams with the ball based on how old it is, and so on.
Once I got a taste of data analytics in cricket thanks to the above experiments, I wanted to build something bigger. Drawing inspiration from popular prediction tools used in the industry such as ESPNcricinfo’s Forecaster and CricViz’s WinViz, I felt it would be cool to apply the AI & ML knowledge that I gained at university level to solve this problem on my own.
Sri Lanka batting first vs. India - 2nd T20I - 2022