Machine Learning: Predicting Soccer Games With Big Data

The Big Data Challenge: Let the Data Mining Begin

Step 1: To begin with, I harvested as many data points as possible. I mined old game data from every source and API I could find. Some of the more important ones were Football-data, Everysport, and Betfair.

Step 2: I merged these data points with their corresponding results, quantified them, and put everything into one database.

Step 3: Finally, I used the data to train a machine learning model, which became my software for predicting upcoming soccer games.
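The merge-and-quantify part of the steps above could look roughly like the sketch below. It is not the actual code behind the model: the field names (`home_shots_avg`, `home_goals`, and so on) and the choice of "home win" as the label are assumptions made for illustration, and the feature values are assumed to be pre-match averages from earlier games.

```python
from typing import Dict, List, Tuple

MatchKey = Tuple[str, str, str]  # (date, home team, away team)


def merge_sources(stats: List[dict], results: List[dict]) -> List[dict]:
    """Join pre-match stats with final results on (date, home, away)."""
    by_key: Dict[MatchKey, dict] = {
        (r["date"], r["home"], r["away"]): r for r in results
    }
    merged = []
    for s in stats:
        result = by_key.get((s["date"], s["home"], s["away"]))
        if result is None:
            continue  # no known result yet, so the row cannot be used for training
        merged.append({**s, **result})
    return merged


def quantify(row: dict) -> Tuple[List[float], int]:
    """Turn one merged row into (features, label); label 1 means home win."""
    features = [
        float(row["home_shots_avg"] - row["away_shots_avg"]),
        float(row["home_corners_avg"] - row["away_corners_avg"]),
    ]
    label = 1 if row["home_goals"] > row["away_goals"] else 0
    return features, label
```

The resulting (features, label) pairs are what a model would be trained on in step 3.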

How To Measure Predictions of the Unpredictable

Of course, the nature of a soccer match is that it is unpredictable. I guess that’s why we love the game, right?

Still, I was somewhat obsessed with the naïve notion that I, armed with a data-driven machine-learning model, could predict games better than I usually would. At that point, I based most of my sports bets on emotions (“gut feelings”) rather than actual data.

The first challenge was to find out how to measure whether or not my model was succeeding. I quickly realized that measuring the actual percentage of correctly guessed games didn’t add much value — not without some form of context.

I decided to compare the model’s output with the best guesses of the actual market. The easiest way to get such data was to harvest market-regulated odds. I therefore started comparing my model’s performance against Betfair’s odds, simply because those odds are adjusted based on real people betting real money against each other.
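To make that comparison concrete, here is a minimal sketch of how market odds can be turned into predictions and scored the same way as a model. It assumes decimal odds and removes any built-in margin by simple normalization; the function names and the three-outcome dictionaries are my own illustration, not the setup actually used.

```python
from typing import Dict, List


def implied_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Convert decimal odds to probabilities that sum to 1."""
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())  # above 1.0 when the odds include a margin
    return {outcome: p / total for outcome, p in raw.items()}


def hit_rate(predictions: List[Dict[str, float]], actual: List[str]) -> float:
    """Share of games where the most probable outcome was the real one."""
    picks = [max(p, key=p.get) for p in predictions]
    correct = sum(pick == result for pick, result in zip(picks, actual))
    return correct / len(actual)
```

Running `hit_rate` once on the model's probabilities and once on the market's implied probabilities, over the same games, gives two numbers that can be compared directly.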

The Results: Did My Model Make Me Rich?

Fast-forward to today: two years have passed. Has the model made me a rich man?

Well, no.

I soon realized that my predictions, for the most part, were in line with the market’s own predictions.

Since I used a regression-based model, I could estimate the probability of a specific game outcome, not just the outcome itself. For the games where that probability is highest, my model predicts roughly 70% correctly. Since the market performs just as well, making serious money from my bets is difficult.
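One way to check accuracy at the highest probability levels is to keep only the predictions above a confidence threshold and count how many were right. The sketch below assumes each prediction has already been reduced to a (confidence, was_correct) pair; the function name and the threshold are hypothetical.

```python
from typing import List, Optional, Tuple


def accuracy_above(
    predictions: List[Tuple[float, bool]], min_confidence: float
) -> Optional[float]:
    """Accuracy among predictions with confidence >= min_confidence.

    Returns None when no prediction reaches the threshold.
    """
    selected = [
        correct for confidence, correct in predictions
        if confidence >= min_confidence
    ]
    if not selected:
        return None
    return sum(selected) / len(selected)
```

Applying the same function to the market's implied probabilities shows whether the model and the market are equally good at their most confident picks.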

But, to be honest, I never thought I would create a “money machine,” either. Instead, I came to several rather exciting insights about the possibilities (and limitations!) of big data and machine learning:

Learning 1: Machine Learning and Diminishing Gains

In theory, a machine learning model should improve over time: as the amount of data it has to learn from grows, its predictions should get better.

Well, this wasn’t my experience at all.

Two years ago, I started with about 2,000 games in my database and relatively limited data sets attached to them. Today, I have almost 30,000 games in the database, with metadata covering everything from weather and distances between the teams’ home grounds to shots and corners.

Despite all this added data, and although the model has been able to “learn” over time, its predictions still didn’t improve. Big data and machine learning will only take you so far in predicting the unpredictable.
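A common way to see whether more data actually helps is a learning curve: train on growing slices of the history and score each version on the same held-out games. The sketch below assumes the games are sorted oldest first and takes the training routine as a parameter; `majority_baseline` is only a toy stand-in so the example runs, not the model described here.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def learning_curve(
    games: Sequence[str],
    holdout: Sequence[str],
    sizes: Sequence[int],
    train_and_score: Callable[[Sequence[str], Sequence[str]], float],
) -> List[Tuple[int, float]]:
    """Score a model trained on the first n games, for each n in sizes."""
    curve = []
    for n in sizes:
        if n > len(games):
            break  # not enough history for this training size
        curve.append((n, train_and_score(games[:n], holdout)))
    return curve


def majority_baseline(train: Sequence[str], holdout: Sequence[str]) -> float:
    """Toy model: always predict the most common outcome seen in training."""
    most_common = Counter(train).most_common(1)[0][0]
    return sum(outcome == most_common for outcome in holdout) / len(holdout)
```

A curve that flattens out as n grows is what diminishing gains look like in practice.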

Learning 2: The Power of Unbiased Generalizations

The power of machine learning seems closely tied to its ability to make unbiased generalizations.

For example, I was curious to see whether my model could predict when winning or losing streaks would be broken. Could it, for instance, anticipate that Barcelona would finally lose after winning ten games straight? Could my model prove certain anomalies to be significant?

Well, it has proven not to be very good at that.

Instead, I found that the model was surprisingly good at betting against overvalued teams over time.

Last season, my soccer prediction machine often predicted against Borussia Dortmund while the market predicted otherwise. Dortmund had a lousy season, which gave my model an advantage over the market’s predictions. I have seen the same with teams like Liverpool and Chelsea this season.

So the lesson learned is that some people tend to place sports bets based on emotions. Liverpool and Dortmund are teams liked by lots of people, and at times, you make predictions with your heart instead of your brain. My machine learning model, well, it does not.
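Spotting teams the market may be overvaluing can be sketched as a per-team average of the gap between the market's win probability and the model's. The function name and the row layout are assumptions for illustration, and any numbers fed into it in an example are made up rather than taken from the actual database.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def overvaluation_by_team(
    rows: List[Tuple[str, float, float]]
) -> List[Tuple[str, float]]:
    """Rank teams by average (market win prob - model win prob).

    rows holds (team, market_win_prob, model_win_prob), one per game.
    A large positive average means the market rates the team higher
    than the model does.
    """
    gaps: Dict[str, List[float]] = defaultdict(list)
    for team, market_p, model_p in rows:
        gaps[team].append(market_p - model_p)
    averages = {team: sum(g) / len(g) for team, g in gaps.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Teams at the top of the list are the ones where the model and the market disagree most, which is where a bet against the crowd would come from.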

Learning 3: Machine Learning and Easy Gains

If nothing else, I learned that making predictions that outperform the market is hard. Still, when I started looking at what I had achieved (instead of just obsessing over what I hadn’t), I found one quite surprising fact:

With a simple Python program and less than 10,000 lines of code, I had still made something that performed just as well as the market. Just think of how many person-hours must lie behind the bookies’ odds models and predictions. The model can pick out attractive bets weekly, just as any newspaper or expert would. By making generalizations, you might not find that one bet that will make you rich, but in the proper context, it may save you lots of time.

Implementing Machine Learning to Wide Ideas

With these insights in mind, I started to look at another project I’ve been involved in for the last five years: Wide Ideas, a platform for companies to crowdsource ideas and creativity.

What I wanted to do was to look at the ideas companies gathered from their employees and try to predict whether each idea would be implemented or not.

The team and I quantified the data, but instead of shots on goal and weather forecasts, we looked at how many people had interacted with an idea, and in what way. And lo and behold, the outcome was on par with the soccer predictions:

We can now make decent predictions on whether or not we will implement a creative idea. We can visualize this to encourage more great ideas through gamification.

Can we find a good idea that doesn’t follow the general patterns of a good idea? No, not yet, at least.

Still, for an organization that can harvest 10,000 ideas per year, finding ways to highlight and encourage particular ideas can save time and resources. Just narrowing 10,000 ideas down to (perhaps) 100 good ones and visualizing the result saves lots of time.
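Narrowing a large pool down to a shortlist is simple once every idea has a predicted probability of being implemented. This is a minimal sketch, assuming each idea is an (idea_id, predicted_probability) pair; the names and the default of 100 are illustrative, not how Wide Ideas does it.

```python
import heapq
from typing import List, Tuple


def shortlist(ideas: List[Tuple[str, float]], k: int = 100) -> List[Tuple[str, float]]:
    """Return the k ideas with the highest predicted probability, best first."""
    return heapq.nlargest(k, ideas, key=lambda idea: idea[1])
```

Using `heapq.nlargest` avoids sorting the whole pool when only the top few matter.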

The gap between making machines just as good as humans and making them better than we are.

Big data and machine learning might help with anything from detecting early-stage cancer to letting self-driving cars anticipate potential dangers. Models like this will probably prove most useful where generalizations save time.

Take medical applications, for example. Sifting through thousands of birthmark pictures, a model could help pick out the ones most likely to be cancerous, thus saving doctors valuable time and resources.

However, human behaviour may prove to be tricky. In what way is human behaviour predictable? We’re rationally irrational. We can generalize, placing people into different categories based on what they like to eat, watch or do, but there might be too many factors that set us apart as individuals.

Will big data and machine learning detect the anomalies — or will it just be superb at generalizations?

I hope we’ll experience a future where companies focus on actual data analysis instead of thinking that “big data” by default equals “better data.”

So, until someone proves me wrong (or Arnold Schwarzenegger returns from the future, whichever comes first!), we should put machine learning to use where generalizations can best save real humans time.

Otherwise, the risk is that we’ll end up with so many metrics that their sheer number will suffocate any possibility of making sense of them.

About the writer: Ola Lidmark Eriksson is CTO at Wide Ideas.


