A Critique of the Top Kaggle Notebook on League of Legends: How to Win

Introduction

As someone who was looking for gaming data, specifically League of Legends data, I found that the notebook above (a document for writing and running code alongside notes) is one of the first analyses that comes up on Kaggle (a popular website for finding datasets and analyses of them). Because it is one of the first notebooks people find, many new data scientists will look to it for guidance on working with League of Legends data. Therefore, I think it's important to reflect on prominent notebooks like this one, both to improve them and to notice what they do well.


Notebook Overview

The data itself contains various team statistics from 10,000 ranked League of Legends games, one of which indicates which team won. The analysis attempts to answer the question "Which features are most correlated with winning?" The author starts with an exploratory data analysis (EDA), using various data visualizations to explain the columns of data they collected. Next, they clean the data by identifying columns that leak information about each other and removing one column from each such pair. Finally, they apply several machine learning algorithms that use the data to predict which team won.


Positives

Let's start with the good parts of this notebook (from the perspective of how I would have done it). First of all, I liked the inclusion of an introduction. Whenever you publish an analysis, especially a public one, it should include an introduction to the dataset. This gives anyone reading it an understanding of your data, so they can better follow how you interpret it later. Next is the cleaning section, which shows a decent way to prepare data for a model. Collinearity is an important thing to check for, and one that many new data scientists might not think of: highly correlated features carry largely redundant information and can make some models, such as linear ones, harder to interpret.
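To make the idea of a collinearity check concrete, here is a minimal sketch of one simple approach, assuming the features are available as plain Python lists keyed by column name. The column names (`kills`, `gold`, `wards`), the toy values, and the 0.9 threshold are hypothetical illustrations, not taken from the notebook or its dataset, and this is not necessarily the method the author used. For each pair of features, it computes the Pearson correlation and drops the second column of the pair whenever the absolute correlation exceeds the threshold.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column has no variance, so treat it as uncorrelated.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(columns, threshold=0.9):
    """Return the column names to keep, dropping the second column of any
    pair whose absolute correlation exceeds the threshold."""
    dropped = set()
    for a, b in combinations(columns, 2):
        # Skip pairs where one column has already been removed.
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) > threshold:
            dropped.add(b)
    return [name for name in columns if name not in dropped]


# Hypothetical toy data: gold is an exact linear function of kills.
features = {
    "kills": [5, 8, 3, 10, 6],
    "gold": [16500, 17400, 15900, 18000, 16800],
    "wards": [20, 15, 30, 18, 25],
}
print(drop_collinear(features, threshold=0.9))  # ['kills', 'wards']
```

Here `gold` is exactly 300 times `kills` plus a constant, so the sketch drops it and keeps `kills` and `wards`. In practice you would more likely compute the full correlation matrix with a library such as pandas (`DataFrame.corr`) and decide which column of each pair to keep using domain knowledge rather than column order.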


Negatives

Now for the parts that need improvement. While having an introduction was good, it was lacking in some respects. First of all, there was no explanation of what each column meant, so while we knew where the data came from, we didn't actually know what each column represents. Another missing item was an explanation of what the notebook was actually trying to do. Although I stated the research question earlier, the notebook itself never says that it is trying to determine which features are most correlated with winning. Continuing with the lack of explanations, throughout the entire notebook there are only two or three one-sentence explanations of what is going on. For a notebook of this size, that is completely unacceptable. It's vitally important to explain why you do something, so that anyone who views your notebook doesn't have to sift through your work to understand your choices; the reasoning is simply there. Likewise, if you need to come back to the notebook later for whatever reason, it will be more accessible. Finally, and probably most importantly, the notebook never answers the question posed at the beginning. Nowhere does the author use their analysis to make a definitive statement about which features correlate most with a win.
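The missing answer could start from something as simple as ranking features by their correlation with the outcome. The following is a minimal sketch, assuming a binary `win` column (1 for a win, 0 for a loss) and using hypothetical feature names and toy values that do not come from the notebook's dataset. For a 0/1 target, the Pearson correlation is the point-biserial correlation. Correlation only captures linear association and says nothing about causation, so a ranking like this supports a written conclusion rather than replacing it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column has no variance, so treat it as uncorrelated.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(features, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical toy data: one row per team per game.
win = [1, 1, 0, 0, 1, 0]
features = {
    "wards": [20, 25, 22, 18, 19, 24],
    "gold_diff": [1500, 800, -1200, -600, 300, -900],
    "deaths": [3, 4, 8, 7, 5, 6],
}
for name, r in rank_by_correlation(features, win):
    print(f"{name}: {r:+.2f}")
```

With this toy data, `gold_diff` ranks first (positively correlated with winning), `deaths` second (negatively correlated), and `wards` last. Sorting by the absolute value keeps strong negative relationships near the top instead of burying them, and the sign still tells the reader which direction helps a team win.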


Conclusion

As someone who referenced this notebook when I started working with League of Legends data, I can confidently say that it is a reasonable place to start. However, if I were the author of this notebook, I would go back and fix everything I mentioned above. It's a starting point, but not a place to end.