A line often attributed to Benjamin Disraeli runs: ‘There are three kinds of lies: lies, damned lies, and statistics’. What about rugby statistics?
As most of you will be aware, I publish a set of statistics for games involving the Wallabies and for most Tri Nations games. Recently I’ve received a number of queries asking why the numbers differ between statistics providers.
The most recent example came from the Tri Nations game between the Springboks and Wallabies in Durban. Some people criticised Scott Higginbotham’s work rate on the basis of published statistics suggesting he was pretty quiet in that game. Unfortunately I had a busy week and didn’t publish my statistics until later in the week, and when people compared my numbers with those they’d seen earlier there was a significant discrepancy.
I must admit when I saw the discussion on those statistics my first thought was that I might have made an embarrassing mistake. However, when I checked my analysis I was happy to stand by my numbers.
I’ll go into the exact numbers later but first, let me explain how I gather my statistics. I use software that allows you to tag a series of events attached to game footage. To tag the game I watch the footage and each time an event occurs I hit a designated key on the keyboard.
Once that task is completed I can extract data such as the number of events, the time each event occurred, et cetera. (The system also allows me to click on a particular event and see a short clip of that event.) When I dump this data into my template spreadsheet, I have some organised data to publish.
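As a rough illustration of what a tagged game reduces to, here is a minimal Python sketch. The record layout, event names and timestamps are all invented for the example; they are not the export format of any particular tagging product.

```python
from collections import Counter

# Hypothetical tagged events: (seconds from kick-off, player, event type).
# The layout and values are invented for illustration only.
events = [
    (95, "Higginbotham", "carry"),
    (212, "Higginbotham", "tackle_made"),
    (340, "Pocock", "turnover_won"),
    (341, "Genia", "pass"),
    (498, "Higginbotham", "tackle_missed"),
    (760, "Higginbotham", "tackle_made"),
]

def count_events(log, player):
    """Count how many times each event type was tagged for one player."""
    return Counter(kind for _, who, kind in log if who == player)

def event_times(log, player, kind):
    """Return the timestamps (in seconds) at which a player's event was tagged."""
    return [t for t, who, k in log if who == player and k == kind]

summary = count_events(events, "Higginbotham")
print(summary["carry"], summary["tackle_made"], summary["tackle_missed"])
print(event_times(events, "Higginbotham", "tackle_made"))
```

Every published number is just a count over a log like this, so anything the tagger decides not to record never reaches the spreadsheet.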
Tagging a game might sound simple, but even after a number of years of practice it still takes me about five hours per match.
Now, back to the questions on Higginbotham’s numbers from that game in Durban. There are several people who publish rugby statistics but I’ll use those from Verusco and Sports Data for comparison purposes.
Verusco is a New Zealand-based company that provides statistics for the NZRFU, the ARU and all the Super Rugby teams. A summary of its data can be found at http://www.ruggastats.com/default.aspx. I’ve seen the full statistics behind this data and there isn’t much they don’t measure. A contributor to this site, RuckinGoodStats, says that Verusco splits each game into small portions, each of which is tagged by a different person, and the data from those portions are then amalgamated. Each game takes around 40 hours to code.
Sports Data is an Australian company that lists Foxtel and most of the news organisations as its clients, although I’ve had it confirmed that Foxtel has a dedicated person keeping statistics live during games for the commentators’ use. A summary of the Sports Data statistics can be found at http://www.rugbystats.com.au/rugby.
The published statistics relating to Higginbotham are as follows:
|Breakdown Involvement - Attack||18||-||16|
|Breakdown Involvement - Defence||5||-||5|
|Breakdown Involvement - Total||23||2||21|
How can there be such differences? Many differences will be the result of interpretation as people will have different views on what constitutes an event. There are no absolutes here, just opinions.
First, the number of carries: I claim 3 whereas both of the other providers claim 2. In the video below I’ve started with the two carries that I think we probably agree on. Then I’ve included my third, which was when Higginbotham collected a pass from Genia after a turnover by Pocock, ran at the defence and then tried a one-handed offload which went forward. I consider a carry to be where a player takes the ball to the line and commits a defender, regardless of whether he passes before or after contact. By the same reasoning, if a player runs 10 metres across field without committing a defender and then passes, I don’t count that as a carry, but others might. On this basis I tagged that last clip as a carry for Higginbotham.
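To show how the definition alone can move the count, here is a small Python sketch. The three possessions, the `metres_run` figures and the "loose" rule with its 5 metre threshold are all hypothetical; only the "commits a defender" rule comes from the definition described above.

```python
from dataclasses import dataclass

@dataclass
class Possession:
    """One run with the ball. All fields are hypothetical tags."""
    description: str
    committed_defender: bool  # did the runner draw a defender into the contest?
    metres_run: float

def is_carry_strict(p):
    """The rule described above: the runner must commit a defender."""
    return p.committed_defender

def is_carry_loose(p, min_metres=5.0):
    """A hypothetical alternative rule: any run of at least min_metres also counts."""
    return p.committed_defender or p.metres_run >= min_metres

possessions = [
    Possession("runs at the line and is tackled", True, 4.0),
    Possession("runs at the line and offloads in contact", True, 3.0),
    Possession("runs 10 m across field and passes untouched", False, 10.0),
]

strict = sum(is_carry_strict(p) for p in possessions)
loose = sum(is_carry_loose(p) for p in possessions)
print(strict, loose)
```

The footage is identical in both cases; only the rule applied to it changes, and the tally changes with it.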
In relation to tackles, my numbers agree with Verusco’s – 9 attempts, 2 misses and 7 tackles made. However, Sports Data came up with only 3 attempts, 1 miss and 2 tackles made. In the video I’ve included all nine attempts. I’m not sure how Sports Data saw only three attempts. Some people may consider that tackle number 3 shouldn’t be included because there is an offload, and that tackles 4 and 7 should be considered as assisting tackles as Higginbotham wasn’t the first man in, but I don’t count assists; in my view the player is either involved in the tackle or not.
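A quick sanity check on the tackle numbers quoted above can be sketched in Python. The figures are the ones in the text; the dictionary layout and function name are just for illustration.

```python
# Higginbotham's tackle numbers in Durban, as quoted in the text above.
tackles = {
    "Mine":        {"attempts": 9, "missed": 2, "made": 7},
    "Verusco":     {"attempts": 9, "missed": 2, "made": 7},
    "Sports Data": {"attempts": 3, "missed": 1, "made": 2},
}

def completion_rate(row):
    """Share of attempted tackles that were made, as a percentage."""
    return 100.0 * row["made"] / row["attempts"]

for source, row in tackles.items():
    # Each provider's line should at least be internally consistent.
    assert row["made"] + row["missed"] == row["attempts"]
    print(f"{source}: {row['attempts']} attempts, {completion_rate(row):.0f}% completed")
```

All three lines add up internally, so the disagreement is not arithmetic. The completion rates come out at roughly 78% and 67%, which are not wildly apart; the real gap is in how many attempts each provider recorded in the first place.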
As to breakdown involvements, my numbers are close to those of Verusco, but I can’t reconcile them with the numbers from Sports Data.
[youtube width="600" height="450"]http://www.youtube.com/watch?v=LrztomnB8gk[/youtube]
Such discrepancies are a common issue when analysing games. As another example, I was recently doing some analysis of the Melbourne Rebels. I didn’t code every Super Rugby game in 2011 and don’t have my own tackle statistics for the full year, so I looked at the Verusco and Sports Data stats. Verusco records the Rebels as having missed 332 tackles in 2011, while Sports Data records their missed tackles at 538. The Verusco number would mean that the Rebels averaged 21 missed tackles per game; however, when I looked at the data for the six teams that made the Super Rugby finals, it showed that those teams missed an average of 17 tackles per game in the regular season. Given the number of tries the Rebels conceded, I struggle with the proposition that they missed only 4 more tackles per game than the top six sides in the competition.
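The per-game arithmetic behind that comparison can be sketched as follows. The season totals and the finalists' average are the figures quoted above; the 16-game regular season is an assumption, chosen because it is consistent with 332 missed tackles working out to about 21 per game.

```python
GAMES = 16  # assumption: regular-season games per team in 2011

# Rebels' missed tackles for the 2011 season, as quoted above.
rebels_missed = {"Verusco": 332, "Sports Data": 538}
finalists_avg = 17  # finalists' average missed tackles per game, as quoted above

per_game = {source: total / GAMES for source, total in rebels_missed.items()}
for source, value in per_game.items():
    print(f"{source}: {value:.1f} missed tackles per game")

# The text compares Verusco's Rebels figure with the finalists' average.
verusco_gap = per_game["Verusco"] - finalists_avg
print(f"Verusco gap to finalists: {verusco_gap:.1f} per game")

# How much larger is one provider's season total than the other's?
ratio = rebels_missed["Sports Data"] / rebels_missed["Verusco"]
print(f"Sports Data total is {ratio:.2f}x the Verusco total")
```

On these assumptions Verusco has the Rebels at about 21 missed tackles per game and Sports Data at about 34, with the Sports Data season total roughly 62% higher than Verusco's for the same team and the same games.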
I was also recently doing some work on an analysis of Quade Cooper’s defence and found that Verusco’s numbers record Quade as having missed 20 tackles in 2011 in Super Rugby, while Sports Data recorded his missed tackles at 38. I compared my numbers for the following games:
- in Round 2 of Super Rugby where I have Quade attempting 8 tackles, missing 3 and making 5, Verusco has Quade attempting 6 and making all of them, and Sports Data’s numbers agree with mine;
- in the Super Rugby final I have Quade attempting 8 tackles, missing 3 and making 5, Verusco has Quade attempting 5, missing 1 and making 4, and Sports Data has Quade attempting 6, missing 3 and making 3.
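One simple way to summarise how far the three sources disagree on those two games is the gap between the highest and lowest count for each measure. The numbers below are the ones quoted in the list above; the `spread` helper and the data layout are only illustrative.

```python
# Quade Cooper's tackle numbers for two 2011 games, as quoted above.
# Each tuple is (attempts, missed, made).
games = {
    "Round 2": {"Mine": (8, 3, 5), "Verusco": (6, 0, 6), "Sports Data": (8, 3, 5)},
    "Final":   {"Mine": (8, 3, 5), "Verusco": (5, 1, 4), "Sports Data": (6, 3, 3)},
}

def spread(values):
    """Gap between the highest and lowest count reported for one measure."""
    return max(values) - min(values)

results = {}
for game, by_source in games.items():
    # Regroup the per-source tuples into one tuple per measure.
    attempts, missed, made = zip(*by_source.values())
    results[game] = (spread(attempts), spread(missed), spread(made))
    print(f"{game}: attempts differ by {results[game][0]}, "
          f"missed by {results[game][1]}, made by {results[game][2]}")
```

Across only two games the sources differ by up to three attempts and up to three missed tackles for a single player, which is large relative to totals of five to eight attempts per game.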
Another thing that may affect the accuracy of these statistics is that the same person is probably not coding each game for Verusco and Sports Data. Finally, we all make mistakes from time to time so there will be an element of that involved.
What’s the moral of this story? Statistics are useful, but it’s a good idea to look at trends over a number of games rather than making judgements based on statistics from one game or one source.
As a saying often attributed to Albert Einstein goes: ‘Not everything that can be counted counts, and not everything that counts can be counted.’