Cricket Data Sources

A collection of cricket data sources.

Author

Hassan Rafique

Published

July 7, 2022

The age of sports analytics is here, entering cricket primarily through the shorter format of T20. Data is the fuel that drives the analytics engine. As I started looking for cricket data, it quickly became apparent that it is not as accessible as data for some other (American) sports such as baseball, basketball, and American football. There is also no single data source; ball-by-ball data, player profiles, and other player records live in different places.

Below I share some of the data sources I have come across so far. They give casual fans enough data in different formats (CSV, JSON, etc.) to analyze. However, if you want to go a little further, build models, and analyze data regularly, it might be better to build your own database from the sources below.

Cricsheet

Arguably the most accessible (downloadable) cricket data source, thanks to Stephen Rushe.

The data is available to download in multiple formats, including CSV (which opens in Excel) and JSON. You can find data for T20 leagues, T20 internationals (T20Is), One Day Internationals (ODIs), and Test matches for both men and women.

Data types covered

  • ball-by-ball
  • match details
  • names and unique identifiers for all the people in the data
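To give a feel for working with the JSON downloads, here is a minimal Python sketch that flattens one match into one row per delivery. The sample below is hand-written, and its field names (`innings`, `overs`, `deliveries`, `runs`) are my reading of the Cricsheet JSON layout rather than a guaranteed schema, so check them against a real file before relying on this. The helper name `flatten_match` is hypothetical.

```python
import json

# Hand-written sample shaped like a Cricsheet JSON match file.
# ASSUMPTION: the field names mirror the published format; verify them
# against a real download.
SAMPLE = """
{
  "info": {"teams": ["Team A", "Team B"], "match_type": "T20"},
  "innings": [
    {"team": "Team A",
     "overs": [
       {"over": 0,
        "deliveries": [
          {"batter": "P1", "bowler": "Q1",
           "runs": {"batter": 4, "extras": 0, "total": 4}},
          {"batter": "P1", "bowler": "Q1",
           "runs": {"batter": 0, "extras": 1, "total": 1}}
        ]}
     ]}
  ]
}
"""


def flatten_match(match):
    """Return one dict per delivery, ready to write out as CSV rows."""
    rows = []
    for innings_no, innings in enumerate(match["innings"], start=1):
        for over in innings["overs"]:
            for ball_no, delivery in enumerate(over["deliveries"], start=1):
                rows.append({
                    "innings": innings_no,
                    "batting_team": innings["team"],
                    "over": over["over"],
                    "ball": ball_no,
                    "batter": delivery["batter"],
                    "bowler": delivery["bowler"],
                    "runs_batter": delivery["runs"]["batter"],
                    "extras": delivery["runs"]["extras"],
                })
    return rows


rows = flatten_match(json.loads(SAMPLE))
for row in rows:
    print(row)
```

For a real file you would replace `json.loads(SAMPLE)` with `json.load(open(path))`, where `path` points at a downloaded match.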

Many statistics, including averages, strike rate, and boundary percentage, can be calculated from ball-by-ball data. Ball-by-ball data is also the most practical starting point for those interested in building predictive models, and Cricsheet provides it as plain CSV files that are easy to download.
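As a rough illustration, the three statistics mentioned above can be computed from ball-by-ball rows in a few lines of Python. The rows here are made up, and the definitions are simplifying assumptions: the average is runs per dismissal, the strike rate is runs per 100 balls faced (wides are not counted as balls faced), and boundary percentage is taken as the share of balls faced that were hit for four or six. Other definitions of boundary percentage exist, for example the share of runs scored in boundaries.

```python
from collections import defaultdict

# Hypothetical ball-by-ball rows:
# (batter, runs_off_bat, is_wide, batter_dismissed)
BALLS = [
    ("P1", 4, False, False),
    ("P1", 1, False, False),
    ("P1", 0, True, False),   # wide: does not count as a ball faced
    ("P1", 6, False, False),
    ("P1", 0, False, True),   # dismissed
    ("P2", 2, False, False),
    ("P2", 0, False, False),
]


def batting_summary(balls):
    """Aggregate per-batter average, strike rate, and boundary percentage."""
    totals = defaultdict(
        lambda: {"runs": 0, "balls": 0, "boundaries": 0, "outs": 0}
    )
    for batter, runs, is_wide, dismissed in balls:
        t = totals[batter]
        t["runs"] += runs
        if not is_wide:
            t["balls"] += 1
        # ASSUMPTION: any 4 or 6 off the bat is a boundary (ignores all-run fours).
        if runs in (4, 6):
            t["boundaries"] += 1
        t["outs"] += int(dismissed)

    summary = {}
    for batter, t in totals.items():
        summary[batter] = {
            # The average is undefined for a batter who was never dismissed.
            "average": t["runs"] / t["outs"] if t["outs"] else None,
            "strike_rate": 100 * t["runs"] / t["balls"] if t["balls"] else None,
            "boundary_pct": 100 * t["boundaries"] / t["balls"] if t["balls"] else None,
        }
    return summary


summary = batting_summary(BALLS)
for batter, stats in summary.items():
    print(batter, stats)
```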

For R users, the {cricketdata} package lets you access the ball-by-ball and match detail data directly in R as a tibble (dataframe). Jacquie Tran wrote the fetch_cricsheet() function and the example code that demonstrates it.

Tinniam Ganesh has a Shiny app, Googly Plus Plus, and an R package, {yorkr}.

(ESPN) Cricinfo

Most cricket fans are familiar with Cricinfo; for a long time, it was the only source for all things cricket. It is the most extensive repository of cricket data, with the caveat that the data is not in a format that can be downloaded easily. You would have to copy-paste tables or write scripts to get the data into a format suitable for analysis.

Statsguru is a search tool that lets you query their database and access the information you are looking for, usually in a table format.

For R users, the {cricketdata} package lets you access the player and team data directly in R as a tibble (dataframe). Rob J Hyndman wrote a tutorial.

CricMetric

Cricmetric is another website that provides scorecards and a search tool that filters by player, venue, team, etc. Note that the data is not in a downloadable format. However, you can copy-paste the search results, which are presented as tables, into an Excel file.

In particular, it offers:

  • Batsman vs Bowler matchup
  • Player Comparison
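If you have ball-by-ball data in hand, a matchup table of this kind is a simple aggregation. The sketch below is only an illustration with made-up rows and hypothetical column names; it does not reproduce how Cricmetric computes its numbers, and it counts every row as a legal ball for simplicity.

```python
from collections import defaultdict

# Hypothetical deliveries; the keys are illustrative, not a fixed schema.
DELIVERIES = [
    {"batter": "P1", "bowler": "Q1", "runs_batter": 4, "dismissed": False},
    {"batter": "P1", "bowler": "Q1", "runs_batter": 0, "dismissed": True},
    {"batter": "P1", "bowler": "Q2", "runs_batter": 1, "dismissed": False},
    {"batter": "P2", "bowler": "Q1", "runs_batter": 6, "dismissed": False},
]


def matchups(deliveries):
    """Summarise balls, runs, and dismissals for each (batter, bowler) pair."""
    table = defaultdict(lambda: {"balls": 0, "runs": 0, "dismissals": 0})
    for d in deliveries:
        cell = table[(d["batter"], d["bowler"])]
        cell["balls"] += 1
        cell["runs"] += d["runs_batter"]
        cell["dismissals"] += int(d["dismissed"])
    return dict(table)


table = matchups(DELIVERIES)
for (batter, bowler), cell in sorted(table.items()):
    print(batter, "vs", bowler, cell)
```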

HowSTAT

Howstat is quite similar to Cricinfo in terms of what it provides; it is a big repository of cricket data with search tools to dig through it.

Tracking Data

Ball-tracking data has been in use in cricket for a while now, but it is not publicly available. In some instances, people have written code to scrape whatever tracking data is exposed online. I have come across two resources for accessing tracking data:

  • Python based
    • Data is raw and you would need some help making sense of all the features
    • There is an opportunity here to contribute to open-source tools for cricket data.
  • R based
    • Data is processed but has fewer features than the Python option above

Closing

I primarily work in R and plan to add more functionality to the R package {cricketdata}. If you are interested in building predictive models and doing advanced analysis, it is probably worth building your own database from the data sources mentioned above. I am working on building my own cricket database, and it is turning out to be more work than I first imagined. I hope to share it once it is ready.
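For anyone wondering what a starting point might look like, here is a minimal sketch of a two-table SQLite database in Python. The schema, table names, and sample rows are all hypothetical choices for illustration and are not the database described above; a real one would also need tables for players, teams, venues, and so on.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; pass a file path
# to sqlite3.connect() instead to persist the data.
conn = sqlite3.connect(":memory:")

# ASSUMPTION: a deliberately small schema with one row per match and one
# row per delivery.
conn.executescript("""
CREATE TABLE matches (
    match_id   INTEGER PRIMARY KEY,
    match_type TEXT,
    match_date TEXT
);
CREATE TABLE deliveries (
    match_id    INTEGER REFERENCES matches(match_id),
    innings     INTEGER,
    over_no     INTEGER,
    ball_no     INTEGER,
    batter      TEXT,
    bowler      TEXT,
    runs_batter INTEGER,
    extras      INTEGER
);
""")

# Made-up rows standing in for data parsed from a ball-by-ball source.
conn.execute("INSERT INTO matches VALUES (1, 'T20', '2022-07-07')")
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 0, 1, "P1", "Q1", 4, 0),
        (1, 1, 0, 2, "P1", "Q1", 1, 0),
        (1, 1, 0, 3, "P2", "Q1", 0, 1),
    ],
)
conn.commit()

# Example query: runs off the bat per batter in T20 matches.
runs_by_batter = conn.execute("""
    SELECT d.batter, SUM(d.runs_batter) AS runs
    FROM deliveries AS d
    JOIN matches AS m ON m.match_id = d.match_id
    WHERE m.match_type = 'T20'
    GROUP BY d.batter
    ORDER BY runs DESC
""").fetchall()
print(runs_by_batter)
```

Keeping match details and deliveries in separate tables means match-level filters (format, date) can be applied with a join instead of being repeated on every ball.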

Feel free to reach out (tweet at me, @dazzlytics) if you have any questions about accessing data or getting started with analysis, or if you want to share cricket data sources that you use more frequently.

Citation

BibTeX citation:
@online{rafique2022,
  author = {Hassan Rafique},
  title = {Cricket {Data} {Sources}},
  date = {2022-07-07},
  url = {https://dazzalytics.netlify.app/posts/cricket-data-sources},
  langid = {en}
}
For attribution, please cite this work as:
Hassan Rafique. 2022. “Cricket Data Sources.” July 7, 2022. https://dazzalytics.netlify.app/posts/cricket-data-sources.