Regression Analysis

Do you know the type of people who run sportsbooks these days? Is the first image that comes to mind someone who looks like Robert De Niro smashing a guy’s hand with a hammer? These days guys like Joseph Asher, who graduated magna cum laude with a law degree, and Vic Salerno, who holds a DDS from Marquette University, are the ones in charge. The average gambler doesn’t think about it, but the sports betting market has changed by leaps and bounds in the last ten years. Math wizards use advanced statistical analysis to set lines nationwide. Former Wall Street quantitative analysts fill many research positions at gaming companies. They’re using the same techniques from the financial, economic and medical sectors to model sporting events. One of the most effective tools at their disposal is linear regression analysis, and that’s why it forms the basis of our handicapping formula.

Linear regression is a statistical tool for measuring the relationship of a dependent variable to one or more independent variables. It has long been used in fields like economics and medicine for prediction and forecasting. In sports handicapping we use linear regression analysis to identify the key factors (independent variables) that influence game outcomes and trends (dependent variable). We can then take our knowledge of what affects game outcomes and build a model for game prediction. Sporting events are very complex and involve many variables, so we use a multiple linear regression model.

The way we do this is to take past data and measure the change in the dependent variable when one of the independent variables is altered and all other independent variables are held constant. Through this process we can begin to identify which independent variables actually matter. Next we assign different weights to the independent variables we have identified. As we know, many factors play a part in determining who wins the game, but not all factors are equally important. In the sports handicapping world we must figure out which are the important ones, and exactly how important they are. The general formula for a linear regression equation is:

$$Y_i = B_0 + B_1 x_{1i} + B_2 x_{2i} + B_3 x_{3i} + \dots + B_k x_{ki}, \qquad i = 1, \dots, n$$

This is a multiple linear regression model, where there are several independent variables or functions of independent variables. Here Y_i is the dependent variable for the i-th of n past games in our data, and each B is the weight given to one independent variable. Once we have all the independent variables identified and correctly weighted, our next step is to determine probabilities. This requires us to use a different technique in our tool bag called logistic regression.
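Before moving on to logistic regression, here is a minimal sketch of how the B weights in the linear equation can be estimated from past data by ordinary least squares, the standard way to fit a multiple linear regression. It is not our actual handicapping model: the function names `fit_ols` and `predict` are hypothetical, the two predictors and five data points are made up purely for illustration, and the solver is a plain-Python version of the normal equations chosen so the example needs no libraries.

```python
from typing import List

def fit_ols(rows: List[List[float]], y: List[float]) -> List[float]:
    """Return [B0, B1, ..., Bk] for y ~ B0 + B1*x1 + ... + Bk*xk (ordinary least squares)."""
    # Prepend a column of ones so that B0 acts as the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix [A | c].
    M = [A[i] + [c[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X^T X is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= factor * M[col][k]
    # Back substitution to recover the coefficients.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][k] * b[k] for k in range(i + 1, p))) / M[i][i]
    return b

def predict(b: List[float], x: List[float]) -> float:
    """Evaluate B0 + B1*x1 + ... + Bk*xk for one set of predictor values."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (not real game data).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
b = fit_ols(rows, y)
print([round(v, 6) for v in b])  # → [1.0, 2.0, -3.0]

# Holding x2 constant, a one-unit increase in x1 changes the prediction by B1.
print(round(predict(b, [1, 0]) - predict(b, [0, 0]), 6))  # → 2.0
```

The last line shows what a fitted weight means in practice: change one independent variable while holding the others constant, and the prediction moves by that variable's B. With many predictors and real data, a numerical library's least-squares routine is a more robust choice than solving the normal equations by hand.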

In statistics, logistic regression is used to assign a probability to an event by plugging data into a logistic equation. Once we have a probability of an event occurring, we can go on to further steps. The linear part of the logistic regression equation looks very similar to the one above:

$$Z = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3 + \dots + B_k x_k$$

In the logistic equation, B0 is referred to as the intercept, and the remaining B values are the regression coefficients of the x values. The score Z is not yet a probability; the logistic function p = 1 / (1 + e^(-Z)) converts it into a probability between 0 and 1. The intercept is the value of Z when all the independent variables are 0, so what does this mean for our sports handicapping models? Suppose each independent variable measures the difference between the two teams in some factor, and the intercept is 0. Then when all the independent values are “zero,” Z is 0 and our logistic model returns a value of .5, or 50%. This would mean that the team we analyzed had exactly a 50% chance of winning the game. Consequently, if we analyzed the team they were playing, they would also have a 50% chance of winning: a true pick’em. This gives us a starting point in the equation before we enter our values. (If the intercept is not 0, for example because it absorbs home-field advantage, the all-zero case returns 1 / (1 + e^(-B0)) instead of 50%.)

Regression is the same technique used by health professionals around the world. Through this type of analysis they linked smoking to cancer and high cholesterol to heart disease. When used properly, regression analysis can lead to very accurate and powerful conclusions. The trick is to accurately identify the independent variables and weight them correctly. This isn’t always as easy as it seems. As I mentioned in another article, issues such as causation vs. correlation can make finding results very difficult.
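Here is a small sketch of the step from the linear score Z to a win probability using the logistic function. The coefficients are hypothetical placeholders rather than fitted values, the inputs are assumed to be team-versus-opponent differences, and `win_probability` is an illustrative name, not part of any library.

```python
import math
from typing import Sequence

def logistic(z: float) -> float:
    """Logistic function: maps any real score Z to a probability in (0, 1)."""
    # Two algebraically equal forms, chosen by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def win_probability(b0: float, coefs: Sequence[float], diffs: Sequence[float]) -> float:
    """Compute Z = B0 + B1*x1 + ... + Bk*xk, then convert Z to a probability."""
    z = b0 + sum(b * x for b, x in zip(coefs, diffs))
    return logistic(z)

# Hypothetical coefficients for two hypothetical difference variables.
coefs = [0.15, -0.4]

# Evenly matched teams and an intercept of 0: a true pick'em.
print(win_probability(0.0, coefs, [0.0, 0.0]))  # → 0.5

# Swapping the teams flips the sign of every difference, so the two
# probabilities sum to 1 (up to floating-point rounding).
p_team = win_probability(0.0, coefs, [3.0, -1.0])
p_opp = win_probability(0.0, coefs, [-3.0, 1.0])
print(round(p_team + p_opp, 12))  # → 1.0

# A nonzero intercept moves the all-zero starting point away from 50%.
print(win_probability(0.1, coefs, [0.0, 0.0]) > 0.5)  # → True
```

The symmetry check works because logistic(-Z) = 1 - logistic(Z): with a zero intercept and difference-based inputs, the model automatically gives the two sides complementary probabilities.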

So we can see that it is crucial to have a properly calibrated model. If we predict an event will occur 60% of the time, then in the future that is how often it needs to occur. The closer the actual frequency comes to 60%, the better calibrated our model is. It is these probability numbers, along with the payout odds and other inputs, that we put into the Kelly formula to determine our money management. That’s why it’s so important to have correct numbers. Not only must we accurately predict the outcome of these events, but we must also wager an amount on each one that reflects our confidence level. Wagering too much on an event because of an inaccurate winning percentage estimate can be disastrous.
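Two small sketches illustrate this paragraph. `calibration_table` groups past predictions into probability bins and compares the average predicted probability with how often the event actually happened. `kelly_fraction` applies the standard Kelly criterion, f = (bp - q) / b, where b is the net odds received on a win, p is our win probability and q = 1 - p. Both names are hypothetical, the sample predictions are made up, the -110 price is just a typical example, and full Kelly is shown only for simplicity; this is not a statement of exactly how our own formula applies Kelly.

```python
from typing import List, Sequence, Tuple

def calibration_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int, float, float]]:
    """Return (bin_low, bin_high, games, mean_predicted, observed_rate) for each non-empty bin."""
    games = [0] * n_bins
    wins = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for p, won in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        games[i] += 1
        wins[i] += won
        prob_sums[i] += p
    return [(i / n_bins, (i + 1) / n_bins, games[i], prob_sums[i] / games[i], wins[i] / games[i])
            for i in range(n_bins) if games[i]]

def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Full-Kelly fraction of bankroll for win probability p at the given decimal odds."""
    b = decimal_odds - 1.0  # net profit per unit staked on a win
    if b <= 0:
        raise ValueError("decimal odds must be greater than 1")
    f = (b * p - (1.0 - p)) / b  # Kelly criterion: (bp - q) / b
    return max(f, 0.0)  # no edge: don't bet

# Made-up predictions and results (1 = the predicted team won).
probs = [0.62, 0.58, 0.61, 0.65, 0.33, 0.38]
outcomes = [1, 1, 0, 1, 0, 1]
for low, high, n, mean_p, rate in calibration_table(probs, outcomes):
    print(f"{low:.1f}-{high:.1f}: {n} games, predicted {mean_p:.2f}, observed {rate:.2f}")

# A standard -110 line pays 100/110 units of profit per unit staked.
odds_110 = 1.0 + 100.0 / 110.0
print(round(kelly_fraction(0.55, odds_110), 4))  # → 0.055
print(round(kelly_fraction(0.60, odds_110), 4))  # → 0.16
```

With these hypothetical numbers, overestimating a 55% chance as 60% nearly triples the suggested stake (about 5.5% of bankroll versus 16%), which is the danger described above. A real calibration check also needs far more games per bin than this toy sample provides.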

Regression-based modeling has also begun to get recognition from the mainstream press. Brian Burke started the website Advanced NFL Stats for the sole purpose of analyzing the NFL from a statistical angle. Along the way he developed his own model for predicting each team’s chance of winning every game. His site became popular enough that The New York Times began publishing his weekly predictions online.

His model is successful, but it only predicts games straight up, which makes it hard to use to make money in the long term. What it does do, however, is show the possibilities that statistical analysis opens up.

Being a long-term, winning sports handicapper requires the same attention to detail and discipline as any other profession. The fact is that most handicappers are unable or unwilling to put in the time it takes. That’s why so many so-called experts fall short. They have wild fluctuations from year to year and can’t turn a profit against the spread. As the famous boxing trainer Freddie Roach likes to say, “It ain’t easy”.