Recently, I have been delving into a few new statistical modelling techniques with potential applications in betting.
One such technique which can – in theory – be used to pinpoint value in win and place markets essentially considers 1) how likely it is that each runner will run to a certain rating (be it an RPR, a speed rating, or a rating of your own creation) before 2) simulating a huge number of renewals based on that information and then 3) calculating “fair odds” based on the proportion of times that each horse won or placed in all simulated renewals.
As with all analyses, the validity of this technique and its likelihood of success for punters depend on the accuracy and quality of the data being pumped into the model. From a theoretical perspective, however, the logic behind it is very sound. This is because a) the data takes into account the ability (individual ratings) and consistency (variation in ratings) of each runner, and b) the huge number of simulations provides very robust predictions on the likelihood of each runner producing different levels of performance.
To showcase the technique I have taken a look at the Queen Anne Stakes at Royal Ascot. I chose this race because all the runners are (relatively) exposed at this Group One level, and thus have significant back-histories of performances run in Group/Listed company that provide a lot of RPR data. I don’t personally use RPRs often in my betting, but they are generally perceived to be fairly accurate in assessing the merit of performances and they are easily accessible for this sort of analysis.
Data and Method
- I recorded the RPR average (mean) and standard deviation (a measure of the variation around the mean) of all horses, including all RPRs returned in Group/Graded/Listed races run on turf in 2018 and 2019. I did not include RPRs older than that as I reasoned recent form is a more accurate guide to a horse’s current ability.
- I then computed 100,000 potential RPRs for the Queen Anne for each horse. Using the analysis software, R, I provided the computer with the mean and standard deviation in RPR for each runner, and then asked the software to compute 100,000 random RPRs by using a “truncated” normal distribution algorithm*.
*This sounds complex, but essentially I’m simply randomly drawing RPRs based on the belief that across a horse’s lifetime it will produce a normal distribution of RPRs within its ability limits (very few terrible or incredible performances, and lots of average/fair ones). The “truncated” part comes from the fact that a horse with a mean RPR of say 110 may run one stinker (let’s say an RPR of 80 if it is eased when beaten and finishes miles behind the principals) but that same horse cannot reasonably be expected to run a sparkling 140 as its ability simply won’t allow it to do that.
For consistency, I assumed that every horse could return an RPR of 0 (if it loses its action and comes home tailed off, or pulls up with an injury) but set the upper limit of possibility as being 7lb higher than the highest RPR returned from 2018 onwards.
You may be able to understand this better by viewing the figure below. The blue bars and curve show the proportion of simulations in which Mustashry achieved a certain RPR. This is of course based on his mean (average) RPR and the standard deviation (variation from that average) in his recent runs. For comparison purposes, I have also fitted the curve for One Master. You will note that this curve shifts to the left (towards lesser RPRs) and isn’t quite as tall, because the variation in her likely RPRs is greater (she is a little less consistent in her RPRs than Mustashry).
In an A vs B scenario, you can see that Mustashry would beat One Master (return a higher RPR) more often than the other way round, although of course the reverse does happen in a number of the simulations. The model effectively compares these likely RPRs for all runners for each simulated renewal (picture the figure below, but with 16 curves lying over one another).
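To make the truncated draw and the A vs B comparison concrete, here is a minimal sketch in Python (the original analysis was run in R, and its exact routine is not shown). It assumes simple rejection sampling as the truncation method, and the two horses' means, standard deviations and best ratings are hypothetical numbers, not the real figures for Mustashry or One Master.

```python
import random


def draw_rating(rng, mean, sd, lower, upper):
    """Draw one rating from a normal(mean, sd) truncated to [lower, upper].

    Rejection sampling (an assumption, not necessarily the author's method):
    keep redrawing until the value falls inside the allowed range.
    """
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x


def simulate(horse, n_sims, rng):
    """Simulate n_sims ratings for one horse."""
    upper = horse["best"] + 7  # cap: 7lb above the best recent rating
    return [
        draw_rating(rng, horse["mean"], horse["sd"], 0.0, upper)
        for _ in range(n_sims)
    ]


# Hypothetical inputs, invented for illustration only.
horse_a = {"mean": 117.0, "sd": 3.0, "best": 121.0}  # higher, more consistent
horse_b = {"mean": 114.0, "sd": 5.0, "best": 119.0}  # lower, less consistent

rng = random.Random(42)
n_sims = 20_000
a = simulate(horse_a, n_sims, rng)
b = simulate(horse_b, n_sims, rng)

# Share of simulated head-to-heads in which A returns the higher rating.
a_beats_b = sum(x > y for x, y in zip(a, b)) / n_sims
print(f"A beats B in {a_beats_b:.1%} of simulated head-to-heads")
```

The lower bound of 0 and the 7lb cap follow the limits described above; everything else is illustrative.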
- Having produced 100,000 RPRs for each horse, I asked the computer to treat each row (simulation) as an individual renewal, rank the horses from highest RPR to lowest, and then calculate the proportion of simulated renewals that each horse won (highest RPR) and placed in (highest three RPRs).
- Once those data have been computed, it is easy to use these proportions to produce “fair” win and place odds for each horse.
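The steps above can be sketched end to end in a few lines of Python. This is a rough illustration rather than the R code behind the results: the horses and their past ratings are made up, the sample standard deviation is an assumption (the exact formula used is not stated), truncation is done by rejection sampling, and "fair odds" are taken to be decimal odds, i.e. one divided by the simulated probability.

```python
import random
import statistics

# Hypothetical past ratings per horse (illustrative, not real RPR data).
past_ratings = {
    "Horse A": [118, 116, 119, 115, 117],
    "Horse B": [119, 108, 116, 111, 114],
    "Horse C": [112, 110, 113, 111],
    "Horse D": [105, 117, 109, 113],
    "Horse E": [108, 106, 109, 107],
    "Horse F": [104, 110, 101, 107],
}


def summarise(ratings):
    """Return (mean, sd, upper cap) for one horse's past ratings."""
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample SD: an assumption
    return mean, sd, max(ratings) + 7  # cap 7lb above the best rating


def draw_rating(rng, mean, sd, lower, upper):
    """Truncated-normal draw by rejection sampling (assumed method)."""
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x


def fair_odds(past, n_sims=20_000, places=3, seed=1):
    """Simulate renewals and convert win/place proportions to decimal odds."""
    rng = random.Random(seed)
    params = {name: summarise(r) for name, r in past.items()}
    wins = dict.fromkeys(past, 0)
    placed = dict.fromkeys(past, 0)
    for _ in range(n_sims):
        # One simulated renewal: one rating per horse, ranked high to low.
        ratings = {
            name: draw_rating(rng, mean, sd, 0.0, cap)
            for name, (mean, sd, cap) in params.items()
        }
        ranked = sorted(ratings, key=ratings.get, reverse=True)
        wins[ranked[0]] += 1
        for name in ranked[:places]:
            placed[name] += 1
    result = {}
    for name in past:
        p_win = wins[name] / n_sims
        p_place = placed[name] / n_sims
        result[name] = {
            "p_win": p_win,
            "p_place": p_place,
            "win_odds": 1 / p_win if p_win else float("inf"),
            "place_odds": 1 / p_place if p_place else float("inf"),
        }
    return result


odds = fair_odds(past_ratings)
for name, row in odds.items():
    print(f"{name}: win {row['win_odds']:.1f}, place {row['place_odds']:.1f}")
```

A smaller `n_sims` than the 100,000 used in the article keeps the pure-Python loop quick; raising it only tightens the estimates.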
According to my model, if (recent) past RPRs are accurate, then Mustashry should be outright favourite for the Queen Anne at around 5.9 (roughly 5-1); see the table below.
|Horse|Fair Win Odds|Fair Place Odds|
|---|---|---|
|Beat The Bank|10.4|5.3|
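For readers who want to translate the table into probabilities, the small Python helper below shows the arithmetic. It assumes the table values are decimal odds with the stake included, which is what the "5.9 (roughly 5-1)" quoted for Mustashry suggests; the function names are my own.

```python
def implied_probability(decimal_odds):
    """Probability implied by fair decimal odds (stake included)."""
    return 1.0 / decimal_odds


def fractional_from_decimal(decimal_odds):
    """Return x such that the price reads 'x-1' in fractional terms."""
    return decimal_odds - 1.0


# Values quoted in the article: 5.9 for Mustashry, 10.4 / 5.3 for Beat The Bank.
for label, price in [("Mustashry win", 5.9),
                     ("Beat The Bank win", 10.4),
                     ("Beat The Bank place", 5.3)]:
    print(f"{label}: {price} decimal = {fractional_from_decimal(price):.1f}-1, "
          f"implied chance {implied_probability(price):.1%}")
```

Any bookmaker price longer than the fair price implies the runner is being offered at value, on the model's numbers at least.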
There are a number of additional points of note. These include:
- Stormy Antarctic seems overpriced with the bookmakers, as my model suggests he is around a 13-1 chance to win and as short as 6-4 to hit the frame; this will shock many, I’m sure, with Stormy Antarctic viewed as a major outsider, but he is one of the most consistent horses in the race in terms of regularly achieving RPRs that put him in the mix, and this looks a sub-standard Group One
- Le Brivido seems greatly underpriced with the bookmakers given he is over 125-1 with my model to win and over 16-1 to hit the frame; however, he is one whose assessment should be treated with great caution (see caveats and limitations below)
- Hazapour also seems greatly underpriced with the bookmakers given he is over 150-1 with my model to win and over 16-1 to hit the frame; however, he is another whose assessment should be treated with great caution (see caveats and limitations below)
- And, more generally, a horse’s fair win odds correlate very closely with its fair place odds, but there is the odd discrepancy – this underlines that a horse (e.g. Beat The Bank) can have a higher chance of winning than another (e.g. Dream Castle), but a lower chance of hitting the places. This is particularly so when horses vary widely in the amount by which their RPRs differ from race to race (Beat The Bank is capable of relatively high RPRs, but he is less consistent in the level of form he produces than many of his rivals)
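The last point, that a less consistent horse can be likelier to win yet less likely to place than a steadier rival, can be reproduced with a toy field. The Python sketch below uses invented horses and numbers (nothing here is real RPR data) and, for brevity, plain normal draws without the truncation used in the actual model.

```python
import random

# Hypothetical field as (mean, sd); names and numbers are invented.
field = {
    "Volatile": (110, 9),   # lower average, but capable of big efforts
    "Steady": (112, 2),     # higher average, very consistent
    "Strong 1": (118, 2),
    "Strong 2": (118, 2),
    "Weak 1": (105, 3),
    "Weak 2": (105, 3),
    "Weak 3": (105, 3),
    "Weak 4": (105, 3),
}

rng = random.Random(7)
n_sims = 20_000
wins = dict.fromkeys(field, 0)
places = dict.fromkeys(field, 0)

for _ in range(n_sims):
    # One simulated renewal: draw a rating per horse and rank high to low.
    ratings = {name: rng.gauss(mean, sd) for name, (mean, sd) in field.items()}
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    wins[ranked[0]] += 1
    for name in ranked[:3]:
        places[name] += 1

p_win = {name: wins[name] / n_sims for name in field}
p_place = {name: places[name] / n_sims for name in field}

for name in ("Volatile", "Steady"):
    print(f"{name}: win {p_win[name]:.1%}, place {p_place[name]:.1%}")
```

In this setup "Volatile" needs one of its rare big ratings to beat the two strong horses, which it produces far more often than "Steady" can, whereas "Steady" picks up third place more reliably on ordinary days.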
Caveats and limitations
As mentioned before, a model is only as good as the data that is fed into it. While RPRs are a relatively solid way of assessing a horse’s ability based on past performances, there are cases where you can be more confident in them than in others.
For example, as noted above, both Le Brivido and Hazapour are huge prices with my model in comparison to those that are being offered on them by the bookmakers. Le Brivido hasn’t done much racing over the past 18 months, so I have had little RPR data to work with, and what I have had available may well underestimate his ability, especially as Aidan O’Brien has said he expects this former Royal Ascot winner to come on a lot for his recent spins.
Hazapour, meanwhile, is extremely unexposed (he has more scope for progression than anything in the race) and won with a good bit up his sleeve last time. He is also unexposed over 1m, so there is every chance he has the ability to run a higher RPR than we’ve seen before.
Besides those two, Barney Roy was the other tricky one to model, as he had very good older form that would likely have made him favourite, but we only have two recent runs to go by for RPR purposes. While he is not discounted by any means by my model on those two runs, there is also every chance that he is still capable of running to a higher level than he has shown in 2019.
Additionally, the model is not sufficiently sophisticated to weight the likelihood of running certain RPRs based on the RPRs returned by other runners. Essentially, it treats each horse as being independent of the others (so it assumes how well one horse runs has no impact on how well another one does). In reality, we know this isn’t likely to be true, because a race packed full of front-runners, for example, is likely to see a lot of those return lower RPRs than if the race were more evenly balanced in terms of the running style of the horses in it.
And lastly, to obtain sufficient RPR data I had to include RPRs for each horse run in Graded races on turf regardless of the distance or going. If the rains arrive and the going turns soft, it would be unreasonable to expect all horses to act as well as each other on it. With any luck, we won’t be dealing with extremes of ground this week.
I will be using this technique to produce similar analyses for a few other races at Royal Ascot, and intend to use it to produce “fair odds” for Hong Kong racing beginning next season. HK fare is appealing for this type of analysis as there is a huge amount of ultra-reliable data available to work with. I plan to produce my own speed ratings for all horses running in HK and use these instead of RPRs to model “fair odds” ahead of many of the meetings.
Watch this space.