
What factors are most predictive of a game's outcome for the Rockets? A multiple/logistic regression

Discussion in 'Houston Rockets: Game Action & Roster Moves' started by hollywoodMarine, Feb 16, 2014.

  1. heypartner

    heypartner Contributing Member

    Joined:
    Oct 27, 1999
    Messages:
    62,573
    Likes Received:
    56,308
    Fun post. I'm on a phone, so I haven't read it all yet, but I wanted to stop at your first example, the Wizards game.

    So when the real games don't match the stats, you explain that away as "luck"? Did your professor teach you that? :)
     
  2. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Hi, thanks! I just facepalmed because I derped real bad on this one. Basically, when I ran logistic regression initially (with fewer variables, around 4), the significance values of the coefficients were more like what you would expect (not f*ed up like 1 and .9999 as you see in this table). As I added more and more variables, the omnibus test of model coefficients remained below alpha, which suggested that the model was still significant. The problem I had totally forgotten about is that when too many variables are added in a logistic regression analysis (esp. with a smallish 50+ sample size of games like in this case), the model can remain significant while the coefficients become insignificant. This is why, like I said, SPSS was still able to predict the outcome between wins and losses with 100% accuracy.
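    One common way a logistic model ends up classifying every game correctly while its coefficients show significance values near 1 is complete separation: some combination of predictors splits wins from losses perfectly, so the fitted coefficient keeps growing instead of settling. Whether that is what happened in SPSS here is an assumption. The sketch below uses made-up one-variable data and a hand-rolled gradient ascent (not SPSS's algorithm) to show the slope growing the longer the fit runs.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_slope(xs, ys, steps, lr=0.5):
    """Gradient ascent on the log-likelihood of a no-intercept logistic model."""
    b = 0.0
    for _ in range(steps):
        grad = sum((y - sigmoid(b * x)) * x for x, y in zip(xs, ys))
        b += lr * grad / len(xs)
    return b


# Hypothetical, perfectly separated data: every negative x is a loss (0),
# every positive x is a win (1).
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

slope_short = fit_slope(xs, ys, steps=200)
slope_long = fit_slope(xs, ys, steps=5000)

# Classification accuracy at a 0.5 cutoff using the longer fit.
accuracy = sum(
    (sigmoid(slope_long * x) >= 0.5) == bool(y) for x, y in zip(xs, ys)
) / len(xs)

print(slope_short, slope_long, accuracy)
```

    The slope never converges on data like this, which is why software tends to report huge standard errors and meaningless significance values alongside perfect classification.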

    Anyways I reran logistic regression with fewer variables (took out OREB's and FT's) and got a much better result:[​IMG]

    The only reason I talked about VIF and multicollinearity was because it was relevant to how to properly interpret OppOREB (I said that its effect was probably inflated, which I felt ppl here in this forum may have been interested in). Given that this post had already gotten a bit too technical, I felt that including information about things like linearity, homoscedasticity, and normality assumptions would have been unnecessary.

    But if you truly are curious, they are satisfied (for the most part). The Breusch-Pagan test for heteroscedasticity gave 4.868, p = .9623, the KS test of normality on the residuals had p = .200, and in terms of linearity, I eyeballed all scatter-plots of the individual IV's against the DV (they all appeared linear with the exception of PF's for both sides and OppFTP's, which were too scattered to really detect what the relationship was, but I chose to include them anyway, and I highly doubt that would have messed up the model or anything).
     
  3. wizkid83

    wizkid83 Contributing Member

    Joined:
    May 20, 2002
    Messages:
    6,335
    Likes Received:
    847
    Have you had a chance to run a model with my suggested changes to the FG% variables? I really think you'd draw the wrong conclusion without normalizing for competitors' performance. Otherwise you might just be saying the most important factor in the Rockets winning or losing is the quality of the opponents faced rather than what's actually controllable by the Rockets.
     
  4. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    [​IMG]

    Updated conclusions :grin:

    New order of magnitude of effect (in terms of predicting wins vs losses)

    ThreepointFGP (3.932)
    OppTwosFGP (-3.649)
    OppThreesFGP (-3.52)
    TwosFGP (3.398)
    Turnovers (-1.471)
    Steals (1.054)

    I had to remove a couple variables because logistic regression model for this sample size of games simply could not be properly created with more than 6 IV's.

    According to these stats, the main thing the Rockets should improve on is the three point shot, followed by defense, followed by maintaining a good two point FG%, followed by cutting down on turnovers, followed by increasing steals (although given the high p value of .177, I don't think steals are a good predictor of outcome). TO's also have a high p value of .107 here... which may suggest they are not consistently predictive of outcome either. This is different from the multiple regression model, in which steals and TO's are significant predictors of point differential.
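    A small sketch of how the list above could be read, assuming the figures are logistic regression B coefficients (log-odds); the post does not say what unit each variable was entered in, so the odds ratios are per one unit of whatever scale was used. It re-derives the ranking by absolute size and converts each coefficient to an odds ratio with exp(B).

```python
import math

# Coefficients copied from the post; treating them as log-odds is an assumption.
coefs = {
    "ThreepointFGP": 3.932,
    "OppTwosFGP": -3.649,
    "OppThreesFGP": -3.52,
    "TwosFGP": 3.398,
    "Turnovers": -1.471,
    "Steals": 1.054,
}

# Rank predictors by the absolute size of the coefficient.
ranking = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)

# exp(B) is the multiplicative change in the odds of a win per one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name in ranking:
    print(f"{name:14s} B={coefs[name]:+.3f}  odds ratio={odds_ratios[name]:.3f}")
```

    An odds ratio above 1 means the variable raises the odds of a win, below 1 means it lowers them; the p values quoted in the post still decide whether a given ratio can be trusted.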


    In terms of the multiple regression model, as Carl Herrera pointed out, OREB% is a much better indicator of performance. The new multiple regression model using that variable instead shows that OREB% for both sides is insignificant... but the importance of the other factors (defense, three-point shooting, turnovers) remains more or less the same.
     
  5. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Hi, I'm still reading everyone's posts. I'll get to your suggestions shortly :)
     
  6. Noob Cake

    Noob Cake Member

    Joined:
    Mar 10, 2008
    Messages:
    3,541
    Likes Received:
    699
    Looks much better. The omnibus f-test is pretty much "garbage" in this case since the alternative is that at least one of the beta's is not equal to 0.

    Since you seem to have time and are interested in doing regression:

    1. Run an exhaustive MLR fit with selection by BIC while ADDING in an indicator for home court.
    2. Estimate the density of your predictor variables for each of the 30 teams (i.e. 30 teams * 12 significant predictors). This is a lot of data collection. Not sure where you are sourcing your data from. If you have the data in any raw form, i.e. scraped or downloaded, I can help you reformat and clean it.
    3. Essentially do a Monte Carlo simulation next: simulate N samples and plug them into your MLR model to estimate the point spread and win % (i.e. without having to resort to logistic regression; if my understanding is correct, you are getting inverse-logit transformed predicted probabilities of either 0 or 1).
    4. You have the start of a NBA prediction system.

    If you are doing all of this, you should probably move away from SPSS and use something more numerical like R/Python/Matlab (Octave).
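    A minimal sketch of steps 2 and 3, under loud assumptions: every coefficient, mean, and SD below is made up for illustration, each predictor is drawn from an independent normal distribution (a real density estimate per team would replace this), and the residual noise of the regression is also treated as normal. It simulates point margins from a linear model and reports the share of simulated games with a positive margin.

```python
import random


def simulate_win_prob(coefs, intercept, means, sds, resid_sd, n_sims=10000, seed=0):
    """Share of simulated games in which the predicted point margin is positive."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        margin = intercept
        for name, beta in coefs.items():
            # Draw one game's value of this predictor from its assumed distribution.
            margin += beta * rng.gauss(means[name], sds[name])
        # Add the regression's unexplained game-to-game noise.
        margin += rng.gauss(0.0, resid_sd)
        if margin > 0:
            wins += 1
    return wins / n_sims


# Hypothetical model and predictor distributions (not taken from the thread).
coefs = {"three_pt_pct": 0.6, "opp_two_pt_pct": -0.8, "turnovers": -1.0}
means = {"three_pt_pct": 35.0, "opp_two_pt_pct": 48.0, "turnovers": 15.0}
sds = {"three_pt_pct": 10.0, "opp_two_pt_pct": 6.0, "turnovers": 4.0}

win_prob = simulate_win_prob(coefs, intercept=35.0, means=means, sds=sds, resid_sd=5.0)
print(f"simulated win probability: {win_prob:.3f}")
```

    Swapping in a specific opponent's distributions for the opponent-side predictors would turn this into a per-matchup estimate.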
     
  7. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Sorry for the late reply. Just wanted to say that is an excellent point. Although the hypothetical example is not actually what is happening (opponent FG% and rockets FG% have roughly the same standard deviation and thus consistency), I understand your general logic and concern.

    Since opponent FG% and Rockets FG% have roughly the same consistency, I will use 2 point FG% and 3 point FG% as the example instead (those two differ in standard deviation: 1 SD for the 2 point FG% is 6 percent, while for the 3 point FG% 1 SD is 10 percent). You are right that the standardized effect of 3 point shooting may look inflated because it varies so much. So why bother identifying standardized coefficients?

    I guess a way to look at it is that SD's tell you what is likely and what is unlikely... for example anything within 1 SD of the center of a distribution (the middle of the bell curve) is likely (almost 70%), while going past that in either direction you begin to enter the unlikely territory. "Likelihood" may also suggest how "hard" it may be to improve a given amount (this is MY interpretation). If the Rockets only had to improve let's say 0.15 SD in 2 point shooting, it should not be too difficult. They are currently around 54%, and would just need to improve it by 1% to 55%. Should not be too difficult, especially since they have probably achieved that before in quite a few games. However, if they were to improve by 2 SD, that would be like improving 12% to 66%, which, because of how rare such shooting performances were in the past (how far away from the center of the bell curve), we assume it is much more difficult. So the assumption here is that the greater the SD in terms of how much you want to improve, the more difficult or unlikely.
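    The SD arithmetic in this paragraph can be checked with a few lines, assuming the bell curve (normal distribution) the post refers to and the figures it quotes (54% two point shooting, 1 SD = 6 percentage points). The function names are just for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_within(k):
    """Share of a normal distribution within k SDs of the mean."""
    return normal_cdf(k) - normal_cdf(-k)


def improved_pct(current, sd, n_sd):
    """Shooting percentage after improving by n_sd standard deviations."""
    return current + n_sd * sd


within_one_sd = share_within(1.0)           # the "almost 70%" in the post
small_step = improved_pct(54.0, 6.0, 0.15)  # about a 1 point improvement
big_step = improved_pct(54.0, 6.0, 2.0)     # the 66% example
share_above_two_sd = 1.0 - normal_cdf(2.0)  # how rare a +2 SD game is

print(round(within_one_sd, 4), round(small_step, 1), big_step, round(share_above_two_sd, 4))
```

    Under the normal assumption, only a bit over 2% of games land 2 SD or more above average, which is the sense in which a 66% average would be "rare".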

    We already know from the standardized coefficients that improving by 1 SD for 3 point shooting gives the Rockets a greater positive impact than improving by 1 SD in 2 point shooting. For the Rockets to get the same amount of positive impact from their 2 point shooting as 3 point shooting then, they would need to improve their 2 point shooting percentage by MORE than 1 SD.. to a point where their new average performance would represent more rare occurrences in the past, compared to improving "only" 1 SD above the current average for 3 point shooting (which statistically would represent less rare occurrences in the past). Even if the actual percentage increase required to improve by more than 1 SD for 2 point shooting is not as much as the amount required to improve by 1 SD in 3 point shooting, we still assume it is more difficult to improve the 2 point shooting by > 1 SD, because being at a point where you are > 1 SD above average is more rare than being at exactly 1 SD above average.

    I don't know if I am making sense, and the assumption that improving 1 SD in one area is as hard as improving 1 SD in another area is probably oversimplified and not completely correct either. I guess the question really comes down to, do you think it would require the same amount of effort for the Rockets to improve their 2 point shooting 1 SD (6%) from 54% to 60% as for them to improve their 3 point shooting by 1 SD (10%) from 35% to 45%? If you do, then the model still holds, and it would still be much more worth it to focus on 3 point shooting as the area for improvement. If you don't, then this model is useless :p
     
  8. Clarinetmonster

    Joined:
    Jan 21, 2014
    Messages:
    1,339
    Likes Received:
    26
    Didn't read yet, but just for fun I used the find function to see how many times the key Rox players are mentioned in your post:
    Harden - 2
    Dwight - 2
    Bev - 2
    Jones - 0
    Parsons - 0
    Lin - 9
     
  9. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    That is a great idea, but where can I get such stats? (that is, defensive FG% delta and offensive FG% delta as broken down by game?)

    I believe the four factors are included in the model (shooting, turnovers, rebounding, and free throws). As you said, FT may not have as much of an effect on point differential, but should have an effect on wins/losses. Logistic regression was done to investigate that but for some reason our free throws had very little effect, even in my later re-test (the first one was fail lol).

    Also, splitting up the shooting section of the 4 factors allowed for examination of which was more important for the Rockets winning, 3 point shooting or 2 point shooting. Data suggests maybe 3 point shooting is more important for this team. Thx for the heads up on the 4 factors tho, gonna take a look at them (and the weight assigned to each of the 4) later this week when I get the chance.


    There is definitely some bit of collinearity (would have been even worse if I included assists and blocks), but my understanding was that VIF below 5 is acceptable. Is that cut off too liberal?
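    For reference, the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors. The sketch below just works that formula in both directions to show what a cutoff of 5 means; the cutoff itself is a rule of thumb, and the helper names are made up.

```python
import math


def vif_from_r2(r2):
    """Variance inflation factor for a predictor whose R^2 against the others is r2."""
    return 1.0 / (1.0 - r2)


def r2_at_vif(vif):
    """R^2 (predictor regressed on the other predictors) that produces a given VIF."""
    return 1.0 - 1.0 / vif


# A VIF of 5 means the other predictors explain 80% of this predictor's variance.
r2_at_cutoff = r2_at_vif(5.0)

# With only two predictors, R^2 is the squared correlation between them,
# so VIF = 5 corresponds to a correlation of about 0.89.
corr_at_cutoff = math.sqrt(r2_at_cutoff)

# A stricter cutoff some people use instead (here 2.5, chosen as an example).
r2_at_stricter = r2_at_vif(2.5)

print(r2_at_cutoff, round(corr_at_cutoff, 3), r2_at_stricter, vif_from_r2(0.5))
```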

    Sure. Just look at the second table in the post, look at the numbers that are circled, the greater the absolute value, the more "important" the corresponding part of the box score (in the first column) is in predicting point differential in a hypothetical game. Negative = bad, positive = good

    Opponent two point shooting is most important and very bad. Rockets' three point shooting is 2nd most important and very good. Outside of shooting, turnovers are the most important and very bad. Steals are good but do not make up for bad effects of turnovers.

    Ignore logistic regression model, it is f*ed up. The new one is at the top of the second page (with summary in that post) :)

    I was just trying to be optimistic :grin:

    That's because I had a long paragraph where I ranted about his TO's :p

    Thanks for pointing that out, don't know why I forgot to include that. R square for the multiple regression was .888, p<.001, and the significance for the new logistic regression model I believe is on the relevant post
     
  10. durvasa

    durvasa Contributing Member

    Joined:
    Feb 11, 2006
    Messages:
    37,999
    Likes Received:
    15,462
    That's just the team game log together with the season team statistics, right?
     
  11. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Wow, I had been trying to find out how to get team stats like that broken down by game for a while now, and just could not figure it out. This is awesome, thx

    I realized there is a slight problem with the original suggestion of using delta FG% for opponents (their FG% minus their season average) and delta FG% for Rockets (Rockets' FG% minus Rockets' season average) to determine Rockets' level of offense or defense in a way that accounts for quality of opponents. That only accounts for quality of opponents on their offensive end, but not on their defensive end. Subtracting Rockets' FG% for each game from the Rockets' season average does not account for the opponents' defense because Rockets' season average remains constant at every game, and really all you're doing is just shifting the FG% distribution to the left (so that the center is at zero), but the variance and SD is unchanged.
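    The point about shifting without changing the spread is easy to verify. A quick sketch with made-up game FG% values: subtracting the (constant) season average moves the mean to zero and leaves the SD untouched.

```python
import math
import statistics

# Hypothetical Rockets FG% in five games (not real data).
game_fg = [44.0, 51.5, 47.2, 39.8, 49.0]

season_avg = statistics.mean(game_fg)
deltas = [g - season_avg for g in game_fg]

sd_raw = statistics.pstdev(game_fg)
sd_delta = statistics.pstdev(deltas)

print(f"mean of deltas: {statistics.mean(deltas):.6f}")
print(f"SD before: {sd_raw:.4f}  SD after: {sd_delta:.4f}")
```

    Because a regression coefficient only responds to how a variable varies from game to game, a predictor centered this way carries the same information as the raw FG%.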

    To determine how good the Rockets' offense is in a way that accounts for variance in the quality of opponents' defense, the Rockets' FG% for each game must be subtracted from that opponent's opponent FG% (the FG% that team allows on the season). If I leave that part out, and only account for the fluctuation in opponents' offense in this model, then the magnitude of effect of the defense variables (opponent FG%) would be "unfairly" reduced compared to the offense variables (Rockets' FG%). If I were to also account for the fluctuation in quality of opponents in terms of defense, that would balance things out between the effects of the Rockets' and opponent FG% variables, but now their effects are unfairly reduced as a whole compared to turnovers and steals and other variables (because the quality of opponents can also fluctuate in terms of how well they take care of the ball and how good they are at making steals, rebounding, etc.).
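    A sketch of the opponent-adjusted version described here, with made-up numbers. The sign convention (game FG% minus what that opponent usually allows, so positive means the Rockets shot better than teams normally do against that defense) is a choice made for this example.

```python
import statistics

# Hypothetical games: (Rockets FG% that night, opponent's season FG% allowed).
games = [
    (46.0, 44.0),  # same raw shooting night against a strong defense
    (46.0, 48.0),  # same raw shooting night against a weak defense
    (50.5, 45.5),
    (41.0, 46.0),
]

# Positive = shot better than that opponent usually allows.
adjusted = [rockets_fg - opp_allowed for rockets_fg, opp_allowed in games]

raw_sd = statistics.pstdev([rockets_fg for rockets_fg, _ in games])
adjusted_sd = statistics.pstdev(adjusted)

print(adjusted)
print(f"SD raw: {raw_sd:.3f}  SD adjusted: {adjusted_sd:.3f}")
```

    Unlike subtracting a constant season average, this changes the spread of the variable, because the amount subtracted differs from game to game.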

    Basically, it comes down to your initial concern, in which you asked "wouldn't the opponent FG% just be more a factor of the quality of opponents offense rather than anything in Rocket's control?" Because the answer is a partial yes, but then so is Rockets' FG%, Rockets rebounds, steals, turnovers, all these other things on the box score which are partially affected by quality of opponents as well. In a way, it actually balances out because all these things in the model, besides FT's, are affected by quality of opponents

    Of course, opponent quality probably does not affect these things equally- obviously different teams have different strengths and weaknesses (e.g., one team may have great offense but poor defense). So the best model would account for ALL of these aspects of opponent quality fluctuation... but a model that does not account for any of them is, IMO, still decent (at least better than a model that only partially accounts for opponent quality)

    In any case, I can either make a more complicated model that accounts for all of that, or not do anything.

    And... frankly, I think I've done enough stats for one day :p

    PS: I will remember your suggestion and finish a better regression model next time I have a huge chunk of free time :cool:
     
  12. glimmertwins

    glimmertwins Member

    Joined:
    Jun 26, 2006
    Messages:
    5,914
    Likes Received:
    4,242
    That's a ton of statistical evidence to point out what I thought was fairly obvious about this team already.

    ...not criticizing, I love the stats too but just had a little chuckle to myself.
     
  13. Old Man Rock

    Old Man Rock Contributing Member

    Joined:
    Oct 23, 1999
    Messages:
    7,157
    Likes Received:
    518
    I love the effort but in the words of McHale, "all that analytics does is tell me stuff I already knew."

    Great effort though you lost me early on. ;)
     
  14. burlesk

    burlesk Serious business
    Supporting Member

    Joined:
    Jul 1, 2001
    Messages:
    1,958
    Likes Received:
    2,166
    Warning: This post adds nothing of substance to this discussion.

    Suggestion: a Statistical Analysis section for the bbs...

    ... solely so that I, burlesk, know where not to go if I want to avoid feeling stupider than I normally do.

    JK -- though a section like that does seem like it might be kinda cool.

    It's kinda funny, too, because I'm pretty good in most areas of math, but have always had a huge hole in my brain where statistics and probability should go. My brain goes into emergency shutdown mode almost immediately when faced with posts like this. :confused:

    I don't object to the existence of advanced stats; I enjoy reading popular treatments on what such things can tell us (really enjoyed reading The Drunkard's Walk by Leonard Mlodinow, for a tangential example); I just don't want to see the inner workings. Kind of a don't ask, don't tell thing...

    Seriously, though, it's truly impressive stuff, and I hope maybe you guys will inspire me to tackle it again someday...
     
  15. krmclaughlin

    krmclaughlin Member

    Joined:
    Apr 9, 2010
    Messages:
    605
    Likes Received:
    219
    Me after reading this post:

    [​IMG]
     
  16. New

    New Member

    Joined:
    Jan 5, 2013
    Messages:
    902
    Likes Received:
    18
    Did you use all the games to do the regression? Can you show your R2 values and also add a 2D plot of predicted vs actual? It is important to assess the predictive power of your model first before we draw any conclusions.
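    For anyone following along, R2 can be computed directly from the predicted and actual values that such a plot would show. A minimal sketch with made-up point differentials (the numbers are not from the thread):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical point differentials for five games and a model's predictions.
actual = [10.0, -4.0, 7.0, -12.0, 3.0]
predicted = [8.0, -2.0, 9.0, -10.0, 1.0]

r2 = r_squared(actual, predicted)
print(f"R2 = {r2:.3f}")
```

    Note that an R2 computed on the same games used to fit the model says how well it describes those games, not how well it would predict new ones.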
     
  17. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Here ya go

    [​IMG]

    And yea someone else pointed out I had forgotten to include R2 and p values (facepalm!). For multiple regression, R square was .888, p<.001. For logistic regression, see the earlier post at the beginning of page 2 (I had to re-test because the first one had too many IV's).

    Yes all games were included. Do note someone earlier suggested using OREB% rather than OREB, and that did change some things (namely oppOREB no longer significant)

    No problem. :) I'm not so intuitive when it comes to actual common sense basketball knowledge, so I need numbers to help me out a little bit :grin:

    Although, even though we all know defense is important, as is getting the 3's to fall, and cutting down on TO's, etc., if the regression models can be polished up some more they have the potential to tell us which is more important than the others, which can be insightful

    (currently one model suggests defending and lowering opponent 2 pnt FG% is most important, while the other suggests improving the Rockets' 3 point shot is most important)

    That's my bad. Maybe went a little overboard on the technical stuff for this post lol..
     
  18. burlesk

    burlesk Serious business
    Supporting Member

    Joined:
    Jul 1, 2001
    Messages:
    1,958
    Likes Received:
    2,166
    Naw, hM, it's not you -- I just have some weird but serious block in my brain about statistics. ANY statistical analysis at any depth causes me to mentally melt down. It's kind of why I can't play chess, either... I'm a fairly smart feller in many ways, though...
    [​IMG]
     
  19. wizkid83

    wizkid83 Contributing Member

    Joined:
    May 20, 2002
    Messages:
    6,335
    Likes Received:
    847
    Yeah, had a total brain fart moment. You should be using the opponent's FG% and our FG% in each game vs. the opponent's offensive and defensive FG% as a predictor.

    Also, instead of using 3 pt% and 2 pt%, why not just use TS%, which adjusts for all of that in one variable?

    TS% = (PTS * 100) / (2 * (FGA + 0.44 * FTA))
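    The formula above as a small function, with a made-up team box score line as the example (110 points on 85 field goal attempts and 25 free throw attempts):

```python
def true_shooting_pct(pts, fga, fta):
    """TS% = PTS * 100 / (2 * (FGA + 0.44 * FTA))."""
    return pts * 100.0 / (2.0 * (fga + 0.44 * fta))


# Hypothetical team totals for one game.
ts = true_shooting_pct(pts=110, fga=85, fta=25)
print(f"TS% = {ts:.1f}")
```

    One trade-off to keep in mind: folding everything into TS% would no longer let the model separate the effect of 3 point shooting from 2 point shooting, which was one of the questions in this thread.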
     
  20. FV Santiago

    FV Santiago Member

    Joined:
    Apr 22, 2011
    Messages:
    434
    Likes Received:
    62
    I am a big fan of multiple regression and backward regression models and used them extensively (StatTools) when I was involved in gambling on sports. When it comes to predicting over/unders on NBA games I came to the same conclusion as you -- the single most important independent variable was always related to defense. This creates a lot of betting opportunities because the public at large is trained to look for offense in an NBA game. So if you pit the Steve Nash Phoenix Suns against the Ben Wallace Detroit Pistons, it will consistently be the Pistons' defense that has the bigger input into the pace of play and overall points scored.

    The hole however with regression analysis in sports is that you can't normalize your sample for injuries, lineup changes and trades. Back-to-backs also impact performance, as do psychological things like coming off a big win or being on a streak. That said, it's still a great tool and creates a lot of interesting insights.
     
