This is a guest post by Ola Lidmark Eriksson.
Can machine learning make you rich from sports betting?
Two years ago, I asked myself if it would be possible to use machine learning to predict soccer games' outcomes better.
I decided to give it a serious try, and today, two years and contextual data from 30,000 soccer games later, I've gained many interesting insights.
Here we go:
The Big Data Challenge: Let the Data Mining Begin
Step 1: To begin with, I harvested as many data points as possible. I mined old game data from every source and API I could find. Some of the more important ones were Football-data, Everysport, and Betfair.
Step 2: I merged these data points with their corresponding results, quantified them, and put everything into one database.
Step 3: Finally, I used the data to train a machine learning model, to be used as my software for predicting upcoming soccer games.
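To make the merging and quantifying steps concrete, here is a minimal Python sketch. It is an illustration only, not the actual pipeline: the record layout, the field names (match_id, home_recent_points, away_recent_points, distance_km) and the sample values are all made up for the example, and every real data source would need its own parsing.

```python
# Hypothetical pre-match data and results; field names are illustrative only.
fixtures = [
    {"match_id": 1, "home_recent_points": 10, "away_recent_points": 4, "distance_km": 120},
    {"match_id": 2, "home_recent_points": 6, "away_recent_points": 7, "distance_km": 35},
    {"match_id": 3, "home_recent_points": 9, "away_recent_points": 9, "distance_km": 400},
]
results = [
    {"match_id": 1, "home_goals": 2, "away_goals": 0},
    {"match_id": 2, "home_goals": 1, "away_goals": 1},
]


def outcome_label(home_goals, away_goals):
    """Quantify a result as home win (H), draw (D) or away win (A)."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def build_training_rows(fixtures, results):
    """Merge fixtures with their results and return (features, label) pairs."""
    results_by_id = {r["match_id"]: r for r in results}
    rows = []
    for fixture in fixtures:
        result = results_by_id.get(fixture["match_id"])
        if result is None:
            continue  # no result yet, so this game cannot be used for training
        features = [
            fixture["home_recent_points"],
            fixture["away_recent_points"],
            fixture["distance_km"],
        ]
        rows.append((features, outcome_label(result["home_goals"], result["away_goals"])))
    return rows


rows = build_training_rows(fixtures, results)
```

Games without a known result (match 3 here) are left out of the training rows; they are the ones a trained model would later be asked to predict.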
How To Measure Predictions of the Unpredictable
Of course, the nature of a soccer match is that it is unpredictable. I guess that's why we love the game, right?
Still, I was somewhat obsessed with the naïve notion that I, armed with a data-driven machine learning model, could predict games better than I usually would. At that point, I based most of my sports bets on emotions ("gut feelings") rather than actual data.
The first challenge was to find out how to measure whether or not my model was succeeding. I quickly realized that measuring the actual percentage of correctly guessed games didn't add much value, not without some form of context.
I decided to compare the model's output with the best guesses of the actual market. The easiest way to assess such data was to harvest market-regulated odds. Therefore, I started comparing how my model would perform against Betfair, simply because their odds are adjusted based on real people betting real money against each other.
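One way to put a model and the market on the same scale is to turn odds into probabilities and score both against what actually happened. The sketch below assumes decimal odds for the three outcomes (home, draw, away) and uses log loss as the yardstick; both the proportional normalization and the choice of log loss are assumptions made for illustration, not necessarily what was used here.

```python
import math


def implied_probabilities(decimal_odds):
    """Convert decimal odds into probabilities that sum to 1.

    The raw inverses usually sum to slightly more than 1 (the margin),
    so they are rescaled proportionally. This is one simple convention.
    """
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw]


def log_loss(probability_rows, outcomes):
    """Mean negative log of the probability given to the outcome that occurred.

    outcomes holds the index (0 = home, 1 = draw, 2 = away) of each result.
    Lower is better.
    """
    eps = 1e-15  # guards against log(0)
    total = sum(math.log(max(row[o], eps)) for row, o in zip(probability_rows, outcomes))
    return -total / len(outcomes)


# Made-up example: one game, the home team won.
market = [implied_probabilities([1.8, 3.6, 3.6])]
model = [[0.60, 0.22, 0.18]]
actual = [0]

market_score = log_loss(market, actual)
model_score = log_loss(model, actual)
```

Scored over many games, whichever side has the lower log loss gave better probabilities. A single game, as in this toy example, says nothing on its own.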
The Results: Did My Model Make Me Rich?
Fast-forward to today: two years have passed. Has the model made me a rich man?
Well, no.
I soon realized that my predictions, for the most part, simply matched the market's.
Since I used a regression-based model, I could estimate how probable a specific game outcome was. And at the highest probability levels, my model predicts roughly 70% of the games correctly. Since the market performs just as well, making serious money from my bets is difficult.
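Checking accuracy only among the most confident predictions can be sketched like this. The threshold and the sample predictions are invented for the example, and the outcome indices (0 = home, 1 = draw, 2 = away) are an assumed encoding.

```python
def hit_rate_above(predictions, threshold):
    """Share of correct picks among predictions whose top probability >= threshold.

    predictions is a list of (probabilities, actual_index) pairs.
    Returns None when no prediction is confident enough.
    """
    hits = 0
    considered = 0
    for probabilities, actual in predictions:
        top = max(probabilities)
        if top < threshold:
            continue  # not confident enough to count
        considered += 1
        if probabilities.index(top) == actual:
            hits += 1
    return hits / considered if considered else None


# Made-up predictions: (home, draw, away) probabilities and the real outcome.
sample = [
    ([0.75, 0.15, 0.10], 0),  # confident and correct
    ([0.70, 0.20, 0.10], 1),  # confident but wrong
    ([0.10, 0.15, 0.75], 2),  # confident and correct
    ([0.40, 0.30, 0.30], 0),  # below the threshold, ignored
    ([0.72, 0.18, 0.10], 0),  # confident and correct
]

confident_hit_rate = hit_rate_above(sample, 0.70)
```

Running the same function on probabilities derived from market odds gives a like-for-like comparison. If both land at a similar hit rate, the odds already price in what the model knows, which is why matching the market does not translate into profit.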
But, to be honest, I never thought I would create a "money machine," either. Instead, I came to several rather exciting insights about the possibilities (and limitations!) of big data and machine learning:
Learning 1: Machine Learning and Diminishing Gains
In theory, machine learning should be able to improve over time. As the amount of data the model has to learn from grows, the predictions should get better.
Well, this wasn't my experience at all.
Two years ago, I started with about 2,000 games in my database and relatively limited data sets attached to them. Today, I have almost 30,000 games in the database, with metadata covering everything from weather and distances between the teams' home grounds to shots and corners.
Despite all this added data, and even though the model has been able to "learn" over time, it still didn't improve its predictions. Big data and machine learning will only take you so far in predicting the unpredictable.
Learning 2: The Power of Unbiased Generalizations
The power of machine learning seems closely tied to its ability to make unbiased generalizations.
For example, over the past two years, I was curious to see if my model could predict when winning or losing streaks would be broken. Could it, for instance, foresee that Barcelona would finally lose after winning ten games straight? Could my model prove certain anomalies to be significant?
Well, it turned out not to be very good at that.
Instead, I found that the model was surprisingly good at betting against overvalued teams over time.
Last season, I saw how my soccer prediction machine often predicted against Borussia Dortmund while the market predicted otherwise. Dortmund had a lousy season, which gave my model an advantage over the market's predictions. I have seen the same with teams like Liverpool and Chelsea this season.
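Spotting games where a model and the market disagree can be sketched as a simple filter. Everything below is hypothetical: the team names, the probabilities, the odds and the 5-point minimum edge are invented, and the calculation ignores bookmaker margin and exchange commission.

```python
def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked, if model_prob were the true probability."""
    return model_prob * decimal_odds - 1.0


def find_disagreements(candidates, min_edge=0.05):
    """Keep bets where the model's probability beats the odds-implied one by min_edge."""
    picks = []
    for candidate in candidates:
        market_prob = 1.0 / candidate["odds"]  # simplification: no margin removed
        edge = candidate["model_prob"] - market_prob
        if edge >= min_edge:
            ev = expected_value(candidate["model_prob"], candidate["odds"])
            picks.append({**candidate, "edge": edge, "ev": ev})
    return picks


# Made-up example: the market favours a popular team more than the model does.
candidates = [
    {"bet": "Underdog United to beat Popular FC", "model_prob": 0.40, "odds": 3.0},
    {"bet": "Popular FC to win", "model_prob": 0.50, "odds": 2.0},
]

picks = find_disagreements(candidates)
```

The expected value is only as good as the model's probability. When a popular team is systematically overvalued by bettors, filters like this are where the difference would show up.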
So the lesson learned is that some people tend to bet on sports based on emotions. Liverpool and Dortmund are teams liked by lots of people, and at times, you make predictions with your heart instead of your brain. My machine learning model, well, it does not.
Learning 3: Machine Learning and Easy Gains
If nothing else, I learned that making predictions that outperform the market is complex. Still, when I started looking at what I had achieved (instead of just obsessing over what I hadn't), I found one quite surprising fact:
With a simple Python program of less than 10,000 lines of code, I had still made something that performed just as well as the market. Just think of how many person-hours are behind bookies' odds models and predictions. The model can pick out attractive bets weekly, just as any newspaper or expert would. By making generalizations, you might not be able to find that one bet that will make you rich, but in the right context, it may save you lots of time.
Implementing Machine Learning to Wide Ideas
With these insights in mind, I started to look at another project I've been involved in for the last five years: Wide Ideas, a platform for companies to crowdsource ideas and creativity.
What I wanted to do was to look at the ideas companies gathered from their employees and try to predict whether they would implement each idea or not.
The team and I quantified the data, but instead of shots on goal and weather forecasts, we looked at how many people had interacted with an idea, and in what way. And lo and behold, the outcome was on par with the soccer predictions:
We can now make decent predictions on whether or not a company will implement a creative idea. We can visualize this to encourage more great ideas through gamification.
Can we find a good idea that doesn't follow the general patterns of a good idea? No, not yet, at least.
Still, for the product, and given an organization that can harvest 10,000 ideas per year, finding ways to highlight and encourage particular ideas can save time and resources. Just going from 10,000 ideas to (perhaps) 100 good ones and visualizing the result saves lots of time.
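Narrowing a large pile of ideas down to a shortlist is, at its simplest, a top-k selection on a predicted score. The sketch below assumes each idea already carries a predicted_prob produced by some model; that field name, the titles and the numbers are hypothetical and not taken from the Wide Ideas product.

```python
import heapq


def shortlist(ideas, k):
    """Return the k ideas with the highest predicted implementation probability."""
    return heapq.nlargest(k, ideas, key=lambda idea: idea["predicted_prob"])


# Made-up ideas with made-up scores.
ideas = [
    {"title": "Idea A", "predicted_prob": 0.12},
    {"title": "Idea B", "predicted_prob": 0.81},
    {"title": "Idea C", "predicted_prob": 0.47},
    {"title": "Idea D", "predicted_prob": 0.66},
]

top_ideas = shortlist(ideas, 2)
```

With 10,000 ideas and k set to 100, the same call would produce the kind of shortlist described above. Whether those 100 are actually good still depends on the model behind the scores.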
The Gap Between Making Machines Just as Good as Humans and Making Them Better Than We Are
Big data and machine learning might do anything from detecting early-stage cancer to helping self-driving cars anticipate potential dangers. Models like this will probably prove most useful where generalizations save time.
Take medical implementations, for example. Sifting through thousands of birthmark pictures, a model could help pick out the ones most likely to be cancer, thus saving doctors valuable time and resources.
However, human behaviour may prove to be tricky. In what way is human behaviour predictable? We're rationally irrational. We can generalize, placing people into different categories based on what they like to eat, watch or do, but there might be too many factors that set us apart as individuals.
Will big data and machine learning detect the anomalies, or will they just be superb at generalizations?
I hope we'll experience a future where companies focus on actual data analysis instead of thinking that "big data" by default equals "better data."
So, until someone proves me wrong (or Arnold Schwarzenegger returns from the future, whichever comes first!), we should put machine learning to use where generalizations can best save real humans time.
Otherwise, the risk is that we'll end up with so many metrics that the sheer amount would suffocate any possibility of making sense of them.
About the writer: Ola Lidmark Eriksson is CTO at Wide Ideas.