How I Used Machine Learning to Predict Soccer Games for 24 Months Straight

For better or worse, machines do not place bets with their hearts.

This is a guest post by Ola Lidmark Eriksson, CTO at Wide Ideas.

Can machine learning make you rich from sports betting?

Two years ago, I asked myself if it would be possible to use machine learning to predict soccer games’ outcomes better.

I decided to give it a serious try, and today, two years and contextual data from 30,000 soccer games later, I’ve gained lots of interesting insights.

    The Big Data Challenge: Let the Data Mining Begin

    Step 1: To begin with, I harvested as many data points as possible. I mined old game data from every source and API I could find. Some of the more important ones were Football-data, Everysport, and Betfair.

    Step 2: I then merged these data points with their corresponding results, quantified them, and put everything into one database.

    Step 3: Finally, I used the data to train a machine learning model that would serve as my software for predicting upcoming soccer games.
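
    As a rough illustration of steps 2 and 3, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the field names (match_id, home_shots_avg, away_shots_avg), the four made-up games, the single feature, and the choice of a logistic regression trained by gradient descent. The post only says that the model was regression-based, so this is not the actual implementation.

```python
import math

# Hypothetical pre-game records and results, keyed by a shared match id.
fixtures = [
    {"match_id": 1, "home_shots_avg": 14, "away_shots_avg": 6},
    {"match_id": 2, "home_shots_avg": 5, "away_shots_avg": 12},
    {"match_id": 3, "home_shots_avg": 11, "away_shots_avg": 9},
    {"match_id": 4, "home_shots_avg": 4, "away_shots_avg": 10},
]
results = {1: "H", 2: "A", 3: "H", 4: "A"}  # H = home win, A = away win


def build_rows(fixtures, results):
    """Step 2: merge each fixture with its result and quantify it."""
    rows = []
    for fixture in fixtures:
        outcome = results.get(fixture["match_id"])
        if outcome is None:
            continue  # skip games without a known result
        # One scaled feature: difference in average shots, in units of ten.
        feature = (fixture["home_shots_avg"] - fixture["away_shots_avg"]) / 10.0
        rows.append((feature, 1 if outcome == "H" else 0))
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=2000, lr=0.5):
    """Step 3: fit a one-feature logistic regression by gradient descent."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in rows:
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= lr * grad_w / len(rows)
        bias -= lr * grad_b / len(rows)
    return weight, bias


rows = build_rows(fixtures, results)
weight, bias = train(rows)
# Estimated home-win probability when the home side averages 8 more shots.
p_home = sigmoid(weight * 0.8 + bias)
```

    A real database of thousands of games would carry many more features per game, but the shape stays the same: merge, quantify, train, predict.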

    How To Measure Predictions of the Unpredictable

    Of course, the nature of a soccer match is that it is unpredictable. I guess that’s why we love the game, right?

    Still, I was somewhat obsessed with the naive notion that I, armed with a data-driven machine learning model, would be able to predict games better than I usually would. At that point, I based most of my sports bets on emotions (“gut feelings”) rather than actual data.

    The first challenge was to find out how to measure whether or not my model was succeeding. I quickly realized that measuring the actual percentage of correctly guessed games didn’t add much value — not without some form of context.

    I decided to compare the model’s output with the best guesses of the actual market. The easiest way to get such data was to harvest market-regulated odds. Therefore, I started comparing how my model would perform against Betfair, simply because their odds are adjusted based on real people betting real money against each other.
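
    To make the comparison concrete, here is a minimal sketch of one way to put a model and the market on the same scale. Decimal odds are turned into implied probabilities (1 divided by the odds, then normalized so the three outcomes sum to 1), and both sets of probabilities are scored against the actual result. The log loss used for scoring, the odds, and the model output are assumptions for the example. The post does not say which metric was used.

```python
import math


def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to 1."""
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)  # above 1 when the odds include a margin
    return [p / total for p in raw]


def log_loss(probabilities, outcome_index):
    """Penalty for one game: lower is better."""
    return -math.log(probabilities[outcome_index])


# Hypothetical game: odds for home win, draw, away win.
market = implied_probabilities([2.0, 3.6, 4.0])
model = [0.55, 0.25, 0.20]  # hypothetical model output for the same game
actual = 0  # index of the outcome that happened: the home side won

market_loss = log_loss(market, actual)
model_loss = log_loss(model, actual)
```

    Averaged over many games, the side with the lower loss has been the better forecaster. A single game, like the one above, says very little.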

    The Results: Did My Model Make Me Rich?

    Fast-forward to today: two years have passed. Has the model made me a rich man?

    Well, no.

    I soon realized that my predictions, for the most part, were aligned with the market’s best guesses.

    Since I used a regression-based model, I was able to estimate the probability of a specific outcome of a game. And at the highest levels of probability, my model predicts roughly 70% of the games correctly. Since the market performs just as well, it is difficult for me to make any serious money from my bets.
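
    A figure like "70% at the highest levels of probability" comes from grouping games by how confident the prediction was and counting the hits in each group. Below is a minimal sketch of that bookkeeping. The bucket width of ten percentage points and the eight sample predictions are made up for the example.

```python
from collections import defaultdict


def hit_rate_by_bucket(predictions):
    """Share of correct calls per ten-point probability bucket.

    predictions: (probability_of_predicted_outcome, was_correct) pairs.
    Returns {bucket_lower_bound_in_percent: hit_rate}.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
    for probability, was_correct in predictions:
        # Work in whole percentages to avoid floating-point bucket errors.
        bucket = int(round(probability * 100)) // 10 * 10
        counts[bucket][0] += 1 if was_correct else 0
        counts[bucket][1] += 1
    return {b: correct / total for b, (correct, total) in counts.items()}


# Hypothetical predictions, not taken from the author's data.
predictions = [
    (0.72, True), (0.75, True), (0.71, False), (0.78, True),
    (0.55, True), (0.52, False), (0.58, False), (0.51, True),
]
rates = hit_rate_by_bucket(predictions)
```

    Running the same function on the market’s implied probabilities gives the number to beat in each bucket.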

    But, to be honest, I never actually thought that I would create a “money machine”, either. Instead, I came to several rather exciting insights about the possibilities (and limitations!) of big data and machine learning:

    Learning 1: Machine Learning and Diminishing Gains

    In theory, machine learning should be able to improve over time. As the amount of data the model has to learn from grows, its predictions should get better.

    Well, this wasn’t my experience at all.

    Two years ago, I started with about 2,000 games in my database and relatively limited data sets attached to them. Today, I have almost 30,000 games in the database, complete with lots of metadata covering everything from weather and distances between the teams’ home grounds to shots and corners.

    Despite all this added data, and even though the model has been able to “learn” over time, its predictions still didn’t improve. It seems big data and machine learning only take you so far in predicting the unpredictable.

    Learning 2: The Power of Unbiased Generalizations

    The power of machine learning seems closely tied to its ability to make unbiased generalizations.

    For example, I was curious to see if, over the past two years, my model could predict when winning or losing streaks were about to be broken. For instance, could it expect that Barcelona would finally lose after winning ten games straight? Could my model prove certain anomalies to be significant?

    Well, it has proven to be not that good at that.

    Instead, I found that the model was surprisingly good at betting against overvalued teams over time.

    Last season, I saw how my soccer prediction machine often predicted against Borussia Dortmund while the market made another prediction. Dortmund had a lousy season, which gave my model an advantage over the market’s predictions. This season, I have seen the same with teams like Liverpool and Chelsea.

    So the lesson learned is that some people tend to make sports betting decisions based on emotions. Liverpool and Dortmund are teams liked by lots of people, and at times, you make predictions with your heart instead of your brain. My machine learning model, well, it does not.
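
    One way to spot such overvalued teams is to track, team by team, how far the market’s win probability sits above the model’s. The sketch below averages that gap per team. The team names, the probabilities, and the 0.05 threshold are all made up for the example.

```python
from collections import defaultdict


def average_gap_by_team(games):
    """Average (market - model) win probability per team.

    A positive value means the market rated the team higher than the model.
    """
    gaps = defaultdict(list)
    for team, market_p, model_p in games:
        gaps[team].append(market_p - model_p)
    return {team: sum(values) / len(values) for team, values in gaps.items()}


# Hypothetical numbers, not taken from the author's data.
games = [
    ("Team A", 0.60, 0.48),
    ("Team A", 0.55, 0.45),
    ("Team B", 0.40, 0.41),
    ("Team B", 0.35, 0.33),
]
gaps = average_gap_by_team(games)
# Teams the market consistently rates well above the model.
overvalued = [team for team, gap in gaps.items() if gap > 0.05]
```

    Whether such a gap means the market is wrong or the model is wrong only shows once the season’s results are in.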

    Learning 3: Machine Learning and Easy Gains

    If nothing else, I learned that making predictions that outperform the market is hard. Still, when I started looking at what I had achieved (instead of just obsessing over what I hadn’t), I found one quite surprising fact:

    With a simple Python program and fewer than 10,000 lines of code, I had still made something that performed just as well as the market. How many person-hours must be behind the bookies’ odds models and predictions? The model can pick out attractive bets weekly, just as any newspaper or expert would. By making generalizations, you might not be able to find that one bet that will make you rich, but in the proper context, the model may save you lots of time.

    Applying Machine Learning to Wide Ideas

    With these insights in mind, I started to look at another project I’ve been involved in for the last five years: Wide Ideas, a platform for companies to crowdsource ideas and creativity.

    What I wanted to do was to look at the ideas companies gather from their employees and try to predict whether they would implement the idea or not.

    The team and I quantified the data, but instead of shots on goal and weather forecasts, we looked at how many people had interacted with an idea, and in what way. And lo and behold, the outcome was on par with the soccer predictions:

    We’re now able to make decent predictions on whether or not a creative idea will be implemented. We can visualize this to encourage more great ideas through gamification.

    Can we find a good idea that doesn’t follow the general patterns of a good idea? No, not yet, at least.

    Still, for the product, and given an organization that can harvest 10,000 ideas per year, finding ways to highlight and encourage particular ideas can save time and resources. Just going from 10,000 ideas to 100 probably good ideas, and visualizing the result, saves lots of time.
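
    Narrowing a large pile of ideas down to a shortlist is a simple ranking step once every idea has a score. Here is a minimal sketch. The interaction fields (comments, votes, views) and the hand-picked weights are hypothetical stand-ins for the output of a trained model, not how Wide Ideas actually scores ideas.

```python
import heapq


def implementation_score(idea):
    """Hypothetical score; a trained model's probability would go here."""
    return 0.5 * idea["comments"] + 0.3 * idea["votes"] + 0.2 * idea["views"] / 10


def shortlist(ideas, size=100):
    """Return the `size` highest-scoring ideas, best first."""
    return heapq.nlargest(size, ideas, key=implementation_score)


# Hypothetical ideas with made-up interaction counts.
ideas = [
    {"id": 1, "comments": 4, "votes": 10, "views": 100},
    {"id": 2, "comments": 0, "votes": 2, "views": 50},
    {"id": 3, "comments": 9, "votes": 20, "views": 300},
    {"id": 4, "comments": 1, "votes": 1, "views": 10},
]
top = shortlist(ideas, size=2)
top_ids = [idea["id"] for idea in top]
```

    With 10,000 ideas and size=100, the same call produces the kind of shortlist described above.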

    The Gap Between Making Machines Just as Good as Humans and Making Them Better Than We Are

    Big data and machine learning might help with anything from detecting early-stage cancer to letting self-driving cars anticipate potential dangers. Models like this will probably prove most useful where generalizations save time.

    Take medical implementations, for example. Sifting through thousands of pictures of birthmarks, a model could help pick out the ones most likely to be cancerous, thus saving doctors valuable time and resources.

    However, human behaviour may prove to be tricky. In what way is human behaviour predictable? We’re rationally irrational. We will be able to generalize, placing people into different categories based on what they like to eat, watch or do, but there might be too many factors that set us apart as individuals.

    Will big data and machine learning detect the anomalies, or will they just be superb at making generalizations?

    I hope that we’ll experience a future where companies are focusing on actual data analysis instead of thinking that “big data” by default equals “better data.”

    So, until someone proves me wrong (or Arnold Schwarzenegger returns from the future, whichever comes first!), we should put machine learning to use where generalizations can best save time for real humans.

    Otherwise, the risk is that we’ll end up with so many metrics that the sheer amount would suffocate any possibility of making sense out of it.

    Cover photo by Jerry Silfwer (Prints/Instagram)


    Jerry Silfwer, aka Doctor Spin, is an awarded senior adviser specialising in public relations and digital strategy. Currently CEO at KIX Index and Spin Factory. Before that, he worked at Kaufmann, Whispr Group, Springtime PR, and Spotlight PR. Based in Stockholm, Sweden.