This is a guest post by Ola Lidmark Eriksson.
Can machine learning make you rich from sports betting?
Two years ago, I asked myself if it would be possible to use machine learning to predict soccer games’ outcomes better.
I decided to give it a serious try, and today, two years and contextual data from 30,000 soccer games later, I’ve gained many interesting insights.
The Big Data Challenge: Let the Data Mining Begin
Step 1: To begin with, I harvested as many data points as possible. I mined old game data from every source and API I could find. Some of the more important ones were Football-data, Everysport, and Betfair.
Step 2: I merged these data points with their corresponding results, quantified them, and put everything into one database.
Step 3: Finally, I used the data to train a machine learning model, which became my software for predicting upcoming soccer games.
How To Measure Predictions of the Unpredictable
Of course, the nature of a soccer match is that it is unpredictable. I guess that’s why we love the game, right?
Still, I was somewhat obsessed with the naïve notion that I, armed with a data-driven machine-learning model, could predict games better than I usually would. At that point, I based most of my sports bets on emotions (“gut feelings”) rather than actual data.
The first challenge was to find out how to measure whether or not my model was succeeding. I quickly realized that measuring the actual percentage of correctly guessed games didn’t add much value — not without some form of context.
I decided to compare the model’s output with the best guesses of the actual market. The easiest way to assess such data was to harvest market-regulated odds. Therefore, I started comparing my model’s performance against Betfair’s odds, simply because their odds are adjusted based on real people betting real money against each other.
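The comparison against market odds can be sketched in a few lines of Python. This is a minimal illustration, not the code I actually ran: it assumes decimal odds (where 2.0 means you double your stake), and the function names `implied_probabilities` and `market_pick` as well as the sample odds are made up for the example. The idea is that the inverse of each decimal odd gives the market's implied probability, and normalizing makes the probabilities sum to one.

```python
from typing import Dict


def implied_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Convert decimal odds into market-implied probabilities that sum to 1.

    Assumption: odds are in decimal format. The raw inverses usually add up
    to slightly more than 1 (the market's margin), so we normalize them.
    """
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def market_pick(decimal_odds: Dict[str, float]) -> str:
    """Return the outcome the market considers most likely."""
    probs = implied_probabilities(decimal_odds)
    return max(probs, key=probs.get)


# Hypothetical odds for a single match (made-up numbers).
odds = {"home": 2.0, "draw": 4.0, "away": 4.0}
print(implied_probabilities(odds))
print(market_pick(odds))
```

With the market's pick in hand for every game, the model's pick can be scored against the same results, which gives the hit rate some context.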
The Results: Did My Model Make Me Rich?
Fast-forward to today: two years have passed. Has the model made me a rich man?
I soon realized that my predictions, for the most part, were aligned with the market’s own best guesses.
Since I used a regression-based model, I could estimate how probable a specific game outcome was. At the highest probability levels, my model predicts roughly 70% of the games correctly. Since the market performs just as well, making serious money from my bets is difficult.
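To make the "70% at the highest probability levels" idea concrete, here is a small sketch of how such a number could be computed. It is an assumption about the bookkeeping, not my actual code: each prediction is stored as a pair of the model's probability for its pick and whether that pick turned out right, and `hit_rate_above` is a hypothetical helper name. The sample data is invented purely for illustration.

```python
from typing import Iterable, Optional, Tuple


def hit_rate_above(
    predictions: Iterable[Tuple[float, bool]], threshold: float
) -> Optional[float]:
    """Share of correct picks among predictions at or above a confidence level.

    Each prediction is (probability assigned to the picked outcome, was it correct).
    Returns None when no prediction reaches the threshold.
    """
    confident = [correct for prob, correct in predictions if prob >= threshold]
    if not confident:
        return None
    return sum(confident) / len(confident)


# Invented sample: (model probability for its pick, pick was correct).
sample = [
    (0.72, True),
    (0.68, True),
    (0.55, False),
    (0.75, False),
    (0.40, True),
    (0.70, True),
]
print(hit_rate_above(sample, 0.65))  # only the four most confident picks count
```

Running the same function on the market's implied probabilities gives a like-for-like comparison at each confidence level.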
But, to be honest, I never thought I would create a “money machine,” either. Instead, I came to several rather exciting insights about the possibilities (and limitations!) of big data and machine learning:
Learning 1: Machine Learning and Diminishing Gains
In theory, machine learning should be able to improve over time. As the amount of data the model has to learn from grows, its predictions should get better.
Well, this wasn’t my experience at all.
Two years ago, I started with about 2,000 games in my database and relatively limited data sets attached to them. Today, I have almost 30,000 games in the database, with metadata covering everything from weather and distances between the teams’ home grounds to shots and corners.
Despite all this added data, and even though the model has been able to “learn” over time, it still didn’t improve its predictions. Big data and machine learning will only take you so far in predicting the unpredictable.
Learning 2: The Power of Unbiased Generalizations
The power of machine learning seems closely tied to its ability to make unbiased generalizations.
For example, over the past two years I was curious to see if my model could predict when winning or losing streaks would be broken. For instance, could it anticipate that Barcelona would finally lose after winning ten games straight? Could my model prove certain anomalies to be significant?
Well, it has turned out not to be very good at that.
Instead, I found that the model was surprisingly good at betting against overvalued teams over time.
Last season, I saw how my soccer prediction machine often predicted against Borussia Dortmund while the market predicted otherwise. Dortmund had a lousy season, which gave my model an advantage over the market’s predictions. I have seen the same with teams like Liverpool and Chelsea this season.
So the lesson learned is that some people tend to place sports bets based on emotions. Liverpool and Dortmund are teams liked by lots of people, and at times, people make predictions with their hearts instead of their brains. My machine learning model, well, it does not.
Learning 3: Machine Learning and Easy Gains
If nothing else, I learned that making predictions that outperform the market is complex. Still, when I started looking at what I had achieved (instead of just obsessing over what I hadn’t), I found one quite surprising fact:
With a simple Python program and fewer than 10,000 lines of code, I had still made something that performed just as well as the market. Just think how many person-hours must lie behind bookies’ odds models and predictions. The model can pick out attractive bets weekly, just as any newspaper or expert would. By making generalizations, you might not be able to find that one bet that will make you rich, but in the proper context, it may save you lots of time.
Applying Machine Learning to Wide Ideas
What I wanted to do was to look at the ideas companies gathered from their employees and try to predict whether they would implement the idea or not.
The team and I quantified the data, but instead of shots on goal and weather forecasts, we looked at how many had interacted with an idea — and in what way. And lo and behold, the outcome was on par with the soccer predictions:
We can now make decent predictions on whether or not we will implement a creative idea. We can visualize this to encourage more great ideas through gamification.
Can we find a good idea that doesn’t follow the general patterns of a good idea? No, not yet, at least.
Still, for the product, and for an organization that can harvest 10,000 ideas per year, finding ways to highlight and encourage particular ideas can save time and resources. Just going from 10,000 ideas to (perhaps) 100 good ones and visualizing the result saves lots of time.
The gap between making machines just as good as humans and making them better than we are.
Big data and machine learning might do anything from detecting early-stage cancer to helping self-driving cars anticipate potential dangers. Models like this will probably prove most useful where generalizations save time.
Take medical implementations, for example. Sifting through thousands of birthmark pictures, a model could help pick the most likely ones to be cancer, thus saving doctors valuable time and resources.
However, human behaviour may prove to be tricky. In what way is human behaviour predictable? We’re rationally irrational. We can generalize, placing people into different categories based on what they like to eat, watch or do, but there might be too many factors that set us apart as individuals.
Will big data and machine learning detect the anomalies — or will it just be superb at generalizations?
I hope we’ll experience a future where companies focus on actual data analysis instead of thinking that “big data” by default equals “better data.”
So, until someone proves me wrong (or Arnold Schwarzenegger returns from the future, whichever comes first!), we should put machine learning to use where generalizations can best save time for real humans.
Otherwise, the risk is that we’ll end up with so many metrics that the sheer amount would suffocate any possibility of making sense of it.
About the writer: Ola Lidmark Eriksson is CTO at Wide Ideas.