How I Used Machine Learning to Predict Soccer Games for 24 Months Straight

For better or worse, machines do not place bets with their hearts.

Cover photo: @jerrysilfwer

This is a guest post by Ola Lidmark Eriksson.

Can machine learn­ing make you rich from sports betting?

Two years ago, I asked myself if it would be pos­sible to use machine learn­ing to pre­dict soc­cer games’ out­comes better.

I decided to give it a ser­i­ous try, and today, two years and con­tex­tu­al data from 30,000 soc­cer games later, I’ve gained many inter­est­ing insights.

The Big Data Challenge: Let the Data Mining Begin

Step 1: To begin with, I har­ves­ted as many data points as pos­sible. I mined old game data from every source and API I could find. Some of the more import­ant ones were Football-data, Everysport, and Betfair.

Step 2: I merged these data points with their cor­res­pond­ing res­ults, quan­ti­fied them, and put everything into one database. 

Step 3: Finally, I used the data to train a machine learning model, to be used as my software for predicting upcoming soccer games.
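To make steps 2 and 3 concrete, here is a minimal sketch in plain Python. It is not the actual code behind the project: the field names (`home_avg_shots`, `rain`, and so on), the tiny data set, and the choice of a logistic regression trained with gradient descent are all assumptions made for illustration.

```python
import math

# Hypothetical pre-game records; field names are illustrative, not from a real API.
fixtures = [
    {"game_id": 1, "home_avg_shots": 14, "away_avg_shots": 6, "rain": False},
    {"game_id": 2, "home_avg_shots": 5, "away_avg_shots": 12, "rain": True},
    {"game_id": 3, "home_avg_shots": 11, "away_avg_shots": 9, "rain": False},
    {"game_id": 4, "home_avg_shots": 4, "away_avg_shots": 10, "rain": False},
    {"game_id": 5, "home_avg_shots": 9, "away_avg_shots": 9, "rain": True},
]
# H = home win, D = draw, A = away win. Game 5 has no result yet.
results = {1: "H", 2: "A", 3: "H", 4: "D"}


def build_rows(fixtures, results):
    """Step 2: merge each fixture with its result and turn it into numbers."""
    rows = []
    for fx in fixtures:
        outcome = results.get(fx["game_id"])
        if outcome is None:  # skip games without a known result
            continue
        features = [
            (fx["home_avg_shots"] - fx["away_avg_shots"]) / 10.0,  # scaled gap
            1.0 if fx["rain"] else 0.0,
        ]
        label = 1.0 if outcome == "H" else 0.0  # target: did the home team win?
        rows.append((features, label))
    return rows


def predict(weights, bias, features):
    """Probability of a home win under the logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=500, lr=0.5):
    """Step 3: fit a logistic regression with plain gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


rows = build_rows(fixtures, results)
weights, bias = train(rows)
# Upcoming game: home side averages 8 more shots than the visitors, dry weather.
p_home = predict(weights, bias, [0.8, 0.0])
```

A real setup would use far more features and games, but the shape stays the same: merge, quantify, train, predict.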

How To Measure Predictions of the Unpredictable

Of course, the nature of a soc­cer match is that it is unpre­dict­able. I guess that’s why we love the game, right?

Still, I was some­what obsessed with the naïve notion that I, armed with a data-driv­en machine-learn­ing mod­el, could pre­dict games bet­ter than I usu­ally would. At that point, I based most of my sports bets on emo­tions (“gut feel­ings”) rather than actu­al data.

The first chal­lenge was to find out how to meas­ure wheth­er or not my mod­el was suc­ceed­ing. I quickly real­ized that meas­ur­ing the actu­al per­cent­age of cor­rectly guessed games didn’t add much value — not without some form of context.

I decided to compare the model’s output with the best guesses of the actual market. The easiest way to assess such data was to harvest market-regulated odds. Therefore, I started comparing my model’s performance against Betfair’s odds, simply because their odds are adjusted based on real people betting real money against each other.
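As a rough illustration of such a comparison, the sketch below converts decimal odds into implied probabilities, treats the lowest-odds outcome as the market’s pick, and compares hit rates. The odds, model probabilities, and results are made up, and the function names are my own assumptions rather than anything from a betting API.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to 1."""
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())  # usually not exactly 1, so normalize
    return {outcome: p / total for outcome, p in raw.items()}


def favourite(probabilities):
    """Return the outcome with the highest probability."""
    return max(probabilities, key=probabilities.get)


def hit_rate(picks, actual):
    """Share of games where the pick matched the actual result."""
    return sum(p == a for p, a in zip(picks, actual)) / len(actual)


# Hypothetical data for three games: H = home win, D = draw, A = away win.
market_odds = [
    {"H": 1.80, "D": 3.60, "A": 5.00},
    {"H": 2.90, "D": 3.20, "A": 2.60},
    {"H": 4.50, "D": 3.80, "A": 1.75},
]
model_probs = [
    {"H": 0.58, "D": 0.24, "A": 0.18},
    {"H": 0.41, "D": 0.29, "A": 0.30},
    {"H": 0.20, "D": 0.25, "A": 0.55},
]
actual = ["H", "H", "A"]

market_picks = [favourite(implied_probabilities(o)) for o in market_odds]
model_picks = [favourite(p) for p in model_probs]
market_rate = hit_rate(market_picks, actual)
model_rate = hit_rate(model_picks, actual)
```

With only three invented games the numbers mean nothing; the point is that both hit rates are measured on the same games, which gives the raw percentage its context.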

The Results: Did My Model Make Me Rich?

Fast-forward to today: two years have passed. Has the model made me a rich man?

Well, no.

I soon realized that my predictions, for the most part, were aligned with the market’s best predictions.

Since I used a regression-based model, I could estimate how probable a specific game outcome was. At the highest probability grades, my model predicts roughly 70% of the games correctly. Since the market performs just as well, making serious money from my bets is difficult.
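One way to look at accuracy per probability grade is to bucket predictions by their predicted probability and count the share of correct picks in each bucket. The sketch below assumes 10-point grades and uses invented predictions; the original grading scheme is not described, so treat the bucket width as a guess.

```python
from collections import defaultdict


def accuracy_by_grade(predictions):
    """Group predictions into 10-point probability grades and score each grade.

    predictions: iterable of (predicted_probability, was_correct) pairs.
    Returns {grade_lower_bound_in_percent: share_correct}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for probability, was_correct in predictions:
        grade = int(probability * 10) * 10  # e.g. 0.72 -> 70
        totals[grade] += 1
        hits[grade] += int(was_correct)
    return {grade: hits[grade] / totals[grade] for grade in sorted(totals)}


# Hypothetical predictions: (probability of the picked outcome, pick was right).
predictions = [
    (0.72, True), (0.75, True), (0.71, False), (0.78, True),
    (0.55, True), (0.52, False),
    (0.45, False), (0.48, True),
]
grades = accuracy_by_grade(predictions)
```

Running the same function on the market’s picks would show, grade by grade, whether the model is ahead of or merely level with the market.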

But, to be hon­est, I nev­er thought I would cre­ate a “money machine,” either. Instead, I came to sev­er­al rather excit­ing insights about the pos­sib­il­it­ies (and lim­it­a­tions!) of big data and machine learning:

Learning 1: Machine Learning and Diminishing Gains

In the­ory, machine learn­ing should be able to improve over time. The amount of data the mod­el has to learn from grows, enhan­cing the out­come of the predictions.

Well, this wasn’t my exper­i­ence at all.

Two years ago, I started with about 2,000 games in my database and relatively limited data sets attached to them. Today, I have almost 30,000 games in the database, with metadata covering everything from weather and distances between the teams’ home grounds to shots and corners.

Despite all this added data, and even though the model has been able to “learn” over time, it still didn’t improve its predictions. Big data and machine learning will only take you so far in predicting the unpredictable.
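A common way to check for this kind of plateau is a learning curve: train on ever larger slices of the data and score each model on the same held-out games. The sketch below uses synthetic games where randomness outweighs the signal and a deliberately simple threshold model; both are assumptions for illustration and have nothing to do with the real database.

```python
import random


def make_games(n, rng):
    """Synthetic games: one 'strength gap' feature plus heavy randomness."""
    games = []
    for _ in range(n):
        gap = rng.uniform(-1, 1)
        home_win = (gap + rng.gauss(0, 1.0)) > 0  # noise dominates the signal
        games.append((gap, home_win))
    return games


def fit_threshold(train_games):
    """Pick the gap threshold that classifies the training games best."""
    best_t, best_hits = 0.0, -1
    for t in sorted(g for g, _ in train_games):
        hits = sum((g > t) == won for g, won in train_games)
        if hits > best_hits:
            best_t, best_hits = t, hits
    return best_t


def accuracy(threshold, games):
    """Share of games where 'gap above threshold' matched the result."""
    return sum((g > threshold) == won for g, won in games) / len(games)


rng = random.Random(42)  # fixed seed so the run is repeatable
held_out = make_games(1000, rng)
pool = make_games(2000, rng)

curve = []
for size in (50, 200, 800, 2000):
    threshold = fit_threshold(pool[:size])
    curve.append((size, accuracy(threshold, held_out)))
```

If the accuracy in `curve` stops climbing as `size` grows, more data of the same kind is unlikely to help, because the remaining error comes from the randomness in the games themselves.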

Learning 2: The Power of Unbiased Generalizations

The power of machine learn­ing seems closely tied to its abil­ity to make unbiased gen­er­al­iz­a­tions.

For example, over the past two years, I was curious to see if my model could predict when winning or losing streaks would be broken. For instance, could it expect that Barcelona would finally lose after winning ten games straight? Could my model prove certain anomalies to be significant?

Well, it has proven not to be that good at that.

Instead, I found that the mod­el was sur­pris­ingly good at bet­ting against over­val­ued teams over time.

Last season, I saw how my soccer prediction machine often predicted against Borussia Dortmund while the market predicted otherwise. Dortmund had a lousy season, which gave my model an advantage over the market’s predictions. I have seen the same with teams like Liverpool and Chelsea this season.
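Spotting an overvalued team boils down to comparing two probabilities for the same outcome: the one implied by the market and the one from the model. The sketch below flags teams where the market is clearly more optimistic. The team names, probabilities, and the `min_gap` cutoff of five percentage points are all hypothetical.

```python
def overvalued_by_market(model_prob, market_prob, min_gap=0.05):
    """Return teams whose market win probability exceeds the model's by min_gap.

    Both arguments map team name -> win probability for the team's next game.
    The result is sorted with the largest gap first.
    """
    flagged = []
    for team, market_p in market_prob.items():
        gap = market_p - model_prob[team]
        if gap >= min_gap:
            flagged.append((team, round(gap, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical win probabilities for each team's next game.
market_prob = {"Team A": 0.62, "Team B": 0.45, "Team C": 0.30}
model_prob = {"Team A": 0.50, "Team B": 0.44, "Team C": 0.33}

flagged = overvalued_by_market(model_prob, market_prob)
```

A team that keeps showing up in `flagged` week after week is a candidate for being a crowd favourite rather than a good bet.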

So the lesson learned is that some people tend to place sports bets based on emotions. Liverpool and Dortmund are teams liked by lots of people, and at times, people make predictions with their hearts instead of their brains. My machine learning model, well, it does not.

Learning 3: Machine Learning and Easy Gains

If noth­ing else, I learned that mak­ing pre­dic­tions that out­per­form the mar­ket is com­plex. Still, when I star­ted look­ing at what I had achieved (instead of just obsess­ing over what I hadn’t), I found one quite sur­pris­ing fact:

With a simple Python program of less than 10,000 lines of code, I had still made something that performed just as well as the market. Just think of how many person-hours must be behind bookies’ odds models and predictions. The model can pick out attractive bets weekly, just as any newspaper or expert would. By making generalizations, you might not be able to find that one bet that will make you rich, but in the proper context, it may save you lots of time.

Implementing Machine Learning to Wide Ideas

With these insights in mind, I star­ted to look at anoth­er pro­ject I’ve been involved in for the last five years: Wide Ideas, a plat­form for com­pan­ies to crowd­source ideas and cre­ativ­ity.

What I wanted to do was to look at the ideas com­pan­ies gathered from their employ­ees and try to pre­dict wheth­er they would imple­ment the idea or not.

The team and I quantified the data, but instead of shots on goal and weather forecasts, we looked at how many people had interacted with an idea, and in what way. And lo and behold, the outcome was on par with the soccer predictions:

We can now make decent pre­dic­tions on wheth­er or not we will imple­ment a cre­at­ive idea. We can visu­al­ize this to encour­age more great ideas through gamification.

Can we find a good idea that doesn’t follow the general patterns of a good idea? No, not yet, at least.

Still, for the product, and giv­en that you look at an organ­iz­a­tion that can har­vest 10,000 ideas per year, find­ing ways to high­light and encour­age par­tic­u­lar ideas can save time and resources. So just going from 10,000 to 100 (per­haps) good ideas and visu­al­iz­ing the res­ult saves lots of time.
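Narrowing a large pile of ideas down to a shortlist can be sketched as scoring each idea and keeping the top few. The example below is not the Wide Ideas implementation: the interaction types (`comments`, `votes`, `shares`), the weights, and the bias are invented, and in practice the weights would be learned from ideas that were or were not implemented.

```python
import heapq
import math

# Hypothetical weights; a real model would learn these from past ideas.
WEIGHTS = {"comments": 0.30, "votes": 0.10, "shares": 0.50}
BIAS = -3.0


def implementation_probability(idea):
    """Logistic score: estimated probability that an idea gets implemented."""
    z = BIAS + sum(weight * idea[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def shortlist(ideas, n):
    """Return the n ideas with the highest estimated probability, best first."""
    return heapq.nlargest(n, ideas, key=implementation_probability)


# Hypothetical ideas with their interaction counts.
ideas = [
    {"title": "Idea 1", "comments": 2, "votes": 5, "shares": 0},
    {"title": "Idea 2", "comments": 12, "votes": 30, "shares": 4},
    {"title": "Idea 3", "comments": 0, "votes": 1, "shares": 0},
    {"title": "Idea 4", "comments": 6, "votes": 14, "shares": 1},
]
top = shortlist(ideas, 2)
top_titles = [idea["title"] for idea in top]
```

The same call with `n=100` on 10,000 ideas gives the kind of shortlist described above, without anyone having to read every submission first.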

The gap between mak­ing machines just as good as humans and mak­ing them bet­ter than we are.

Big data and machine learning might do anything from detecting early-stage cancer to helping self-driving cars anticipate potential dangers. Models like this will probably prove most useful where generalizations save time.

Take med­ic­al imple­ment­a­tions, for example. Sifting through thou­sands of birth­mark pic­tures, a mod­el could help pick the most likely ones to be can­cer, thus sav­ing doc­tors valu­able time and resources.

However, human beha­viour may prove to be tricky. In what way is human beha­viour pre­dict­able? We’re ration­ally irra­tion­al. We can gen­er­al­ize, pla­cing people into dif­fer­ent cat­egor­ies based on what they like to eat, watch or do, but there might be too many factors that set us apart as individuals.

Will big data and machine learn­ing detect the anom­alies — or will it just be superb at generalizations?

I hope we’ll exper­i­ence a future where com­pan­ies focus on actu­al data ana­lys­is instead of think­ing that “big data” by default equals “bet­ter data.”

So, until someone proves me wrong (or Arnold Schwarzenegger returns from the future, whichever comes first!), we should put machine learning to use where generalizations can best save real humans time.

Otherwise, the risk is that we’ll end up with so many met­rics that the sheer amount would suf­foc­ate any pos­sib­il­ity of mak­ing sense of it.

About the writer: Ola Lidmark Eriksson is CTO at Wide Ideas.
