What factors are most predictive of a game's outcome for the Rockets? A multiple/logistic regression

Discussion in 'Houston Rockets: Game Action & Roster Moves' started by hollywoodMarine, Feb 16, 2014.

  1. Ashes

    Ashes Member

    Joined:
    Jul 8, 2007
    Messages:
    2,787
    Likes Received:
    75
    So the key to success is playing better defense, shooting better, and minimizing turnovers. Who would've thought?

    Cool post, though.
     
  2. jtr

    jtr Contributing Member

    Joined:
    Dec 4, 2011
    Messages:
    7,470
    Likes Received:
    275
  3. durvasa

    durvasa Contributing Member

    Joined:
    Feb 11, 2006
    Messages:
    37,999
    Likes Received:
    15,462
    This may be a stupid question, but are you using in-game statistics as the independent variables, or the team and opponent season statistics? That wasn't clear to me. If the former, then it's of course trivial to develop a 100% predictive model for whether the team will win or not (just points scored and points allowed).
     
  4. wizkid83

    wizkid83 Contributing Member

    Joined:
    May 20, 2002
    Messages:
    6,335
    Likes Received:
    847
    Trades and injuries are definitely harder to normalize, but back-to-backs should be pretty easy to use as an independent variable for a game.
     
  5. wizkid83

    wizkid83 Contributing Member

    Joined:
    May 20, 2002
    Messages:
    6,335
    Likes Received:
    847


    Yeah, which is why I think we should use the opponent's TS% and our TS% in each game vs. the opponent's average offensive and defensive TS%.
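
    For anyone who wants to try this, here is a minimal sketch of that comparison. It assumes the standard TS% formula, PTS / (2 * (FGA + 0.44 * FTA)); the function names and the sample numbers are made up for illustration and are not from the OP's data.

```python
def true_shooting(points, fga, fta):
    """Standard TS% formula: PTS / (2 * (FGA + 0.44 * FTA))."""
    return points / (2 * (fga + 0.44 * fta))


def ts_vs_average(game_ts, season_avg_ts):
    """How far a single-game TS% sits above (+) or below (-) a season average."""
    return game_ts - season_avg_ts


# Hypothetical box-score line: 100 points on 80 FGA and 25 FTA.
game_ts = true_shooting(100, 80, 25)
# Hypothetical season-average TS% allowed by that night's opponent.
diff = ts_vs_average(game_ts, 0.53)
print(f"game TS% = {game_ts:.3f}, vs. opponent average = {diff:+.3f}")
```

    The same subtraction could be run in the other direction (the opponent's game TS% minus our season-average TS% allowed) to get the defensive side.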
     
  6. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Got to run to class soon, so this will be short (I may miss some things).

    Yeah, I had mentioned that the TS% model is good, but I wanted to look at how much impact 3-point shooting (and FTs, etc.) had on the games' outcomes, and not just defense vs. offense (hence why I chose to split it up).

    Yeah, and also home vs away. Because these are categorical, e.g., one category (home, back to back, etc.) vs another category (away, not back to back, etc.), rather than continuous variables (such as how many turnovers), these factors can only be added into a logistic regression model.

    I had considered doing that, but then as I found out later, logistic regression stops working very well if you have too many variables, and not enough actual samples (games) :(

    Looks pretty cool.. and complicated haha

    Np, these are in-game statistics, and I can understand why it may appear kind of useless at first. But keep in mind these aren't points scored and points allowed (a model with those variables would, like you say, predict 100% but not really give you any information). The variables are more like: how well are you shooting twos, how well are you shooting threes, how well is your opponent shooting the twos/threes (so an estimate of your defense), and then other parts of the box score as well.

    The key thing is that the model is not intended to be predictive of future games (although it can maybe give you an idea of some "fluke" past games, which I had listed in the post), but rather to answer questions about WHICH of the variables are more important in the prediction (is three-point shooting more important than taking care of the ball? is defending the paint and midrange more important than defending the perimeter?).

    It's kind of like saying, we all know if the Rockets score more points than the opponent, then they win. But is it the 3-point shooting that is causing them to end up with more points? Or is it the decrease in turnovers? Or is it the better defense? (Or if all three, which is most significant in causing them to score more points in a game?) This model was an attempt to answer some of those questions.
     
  7. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Ah ok, if that's what he meant then I misunderstood.

    I still want to keep the shooting split up into 2's, 3's, and FT's for the reasons I listed earlier. Later this week I'll write a Python script to automate subtracting our shooting from the opponent's opponents' season-average shooting (and vice versa), because doing it by hand would take way too much time and be a b*tch.
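
    A rough sketch of what that script could look like. The data layout, the team names, and every number here are placeholders invented for illustration, not the real game logs; the idea is just "our shooting in a game minus what that opponent's opponents shoot against them on average".

```python
# Hypothetical per-game rows: our shooting percentages in that game, plus the
# opponent's name. All numbers below are invented for illustration.
games = [
    {"opponent": "Team A", "fg2_pct": 52.0, "fg3_pct": 38.0, "ft_pct": 75.0},
    {"opponent": "Team B", "fg2_pct": 47.5, "fg3_pct": 30.0, "ft_pct": 80.0},
]

# Hypothetical season averages of what each opponent's opponents shoot against them.
season_allowed = {
    "Team A": {"fg2_pct": 49.0, "fg3_pct": 36.0, "ft_pct": 76.0},
    "Team B": {"fg2_pct": 48.5, "fg3_pct": 35.5, "ft_pct": 75.5},
}

STATS = ("fg2_pct", "fg3_pct", "ft_pct")


def adjust_game(game, allowed_by_team):
    """Return each shooting stat minus what that opponent normally allows."""
    baseline = allowed_by_team[game["opponent"]]
    adjusted = {"opponent": game["opponent"]}
    for stat in STATS:
        adjusted[stat] = round(game[stat] - baseline[stat], 2)
    return adjusted


adjusted_games = [adjust_game(g, season_allowed) for g in games]
for row in adjusted_games:
    print(row)
```

    The "vice versa" direction would be the same function fed with the opponent's game shooting and our own season averages allowed.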
     
  8. wizkid83

    wizkid83 Contributing Member

    Joined:
    May 20, 2002
    Messages:
    6,335
    Likes Received:
    847
    Hmm, maybe then use TS% interacting with a variable for the percentage of FGs that are 3-pointers?
     
  9. Benchwarmer

    Benchwarmer Member

    Joined:
    Jan 31, 2013
    Messages:
    1,652
    Likes Received:
    33
    The team that gets to the line and attempts the most FTs usually wins the game. There are several reasons for this:

    1. It puts the other team into a foul situation which breaks down their defensive intensity.

    2. It increases the team's FG % by allowing shooters to get into a rhythm at the FT line.

    3. It slows down the opponent's tempo, so that they are forced to play a half court set with defenders already set.
     
  10. hotballa

    hotballa Contributing Member

    Joined:
    Dec 27, 2002
    Messages:
    12,516
    Likes Received:
    305
    paging Daryl Morey
     
  11. topfive

    topfive CF OG

    Joined:
    Jun 12, 2002
    Messages:
    19,038
    Likes Received:
    37,441
    Maybe hollywoodMarine IS Daryl Morey. :grin:
     
  12. crossover

    crossover Contributing Member

    Joined:
    Sep 13, 2001
    Messages:
    2,049
    Likes Received:
    799
    I love stats myself and am only average with them, but I just want to say the usefulness of the stats has to be taken in the correct context. I don't know if the math has issues (I was too lazy to go through it myself) but a lot of people may read the results as unique to the Rockets.

    I'm guessing the ranking itself isn't very informative, as the results just show that FG%, oppFG%, and turnovers are inherently the stats most directly correlated with generating or reducing points per possession in a zero-sum basketball game, which should be expected. (Zero sum applies because a comparison of points determines the winner of a game, generating more ppp and reducing your opponent's effective ppp have equal value, and there is a parity of limited resources, i.e., possessions taken in turns.) This weighting of a player's total contribution to both teams' ppp is something many advanced stats attempt to take into account, for instance PER.

    So someone might ask, isn't the OP's math showing that the Rockets' team defense has more value than their offense? Another guess of mine is that that too might not be the case. A single team's offensive deviation is going to remain more stable because you have the same guys shooting in the same systems. My guess is the impact of the opponent's defense is less than the impact of facing teams with different avg ppp every game, especially over the course of a regular season. Saying defense wins championships may be true, but it's probably not something that can be extended lightly from regular-season conclusions to the postseason.

    Let me try to say it another way with some mental math. There are roughly just under 100 possessions per side in an NBA game and 100 points per side, so 200 total possessions and 200 total points (which is roughly 1 ppp btw).

    (references: http://www.teamrankings.com/nba/team-stats/)
    Team Avg TS%: ~55% (composition mostly 2pters, but 3pters for high volume 3pt teams)
    Team Avg TO per game: 15 (each TO is ~2% comparison of possessions, the larger the margin is)
    Team Avg Stls per game: 7 (so about half of TOs)
    Team Avg Reb differential: between +/- 5 reb (also around the same impact as TO)
    And then the idea that a team's offense is more regular and opposition offense is more non-regular.

    Given the above, I could quickly ballpark and rank, the impact of stats vs. ppp for any given team:

    Opp2FG%, 2FG%, Opp3FG%, 3FG% (all around +/- 0.55 and order depending on volume of 2pters vs. 3pters, but defensive metrics impact higher)
    TOV and oppTOV around +/- 0.30
    Steals around +/- 0.15 (half of TOVs)
    Reb around +/- 0.11 depending on differential
    Other stuff that impacts ppp (PF * % of PF that result in FTs * FT% => FTs)

    Which is pretty much the ranking and values of ratios the OP found.

    My guess is all teams have roughly the same ranking of statistical impact on wins/losses, and in roughly the same ratios. So the value of what the OP found is more in exactly how much more or less these things contribute to wins or losses for the Rockets' system. The ranking is expected. A good follow-up would be to compare these to other teams or the league average. It is also the start of something like a team-specific PER stat, where you can match up and see if a certain player would fit the Rockets' system well.
     
    #52 crossover, Feb 17, 2014
    Last edited: Feb 17, 2014
  13. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Hi! I don't believe order would depend only on volume, but rather on consistency as well. Let's say the Rockets consistently shoot a HIGH volume of 2's from game to game, but the opponent doesn't score any points at all. So let's say the Rockets score more or less 90 points, all from 2's, every game. The point differential would hover closely around 90 all the time. But let's say now they also took only three 3-point FGA's every game in addition to the 90 points from 2's, and the threes were wildly inconsistent. So one game their point differential would be 99 points (3/3 on 3-pointers), and another game it would be 90 points (0/3 on 3-pointers). This goes on long enough, and the model WILL place the Rockets' 3-point FG% way at the top of the list of importance, despite them only taking three 3-point FGA's every game. This is because *that* variable is the thing making the most difference.

    You are right that volume also has to be taken into account, but that's what this model does, which is look at volume in combination with consistency, and then determine what variable seems to be making the most difference in terms of point differential. And as someone earlier already pointed out, the model shouldn't be interpreted as what is more important, but rather what is making the most difference (which would imply what area the Rockets should work on to improve the most, while maintaining other variables such as 2 point FG%, which is IMPORTANT but given its consistency, they need to simply maintain the status quo)

    TL;DR: the order of the shooting variables depends not just on what has higher volume, but on what is making the most difference (which is a combination of volume and consistency).
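
    To make the hypothetical above concrete, here is a toy sketch with invented numbers (not real Rockets data). It uses a plain correlation instead of the full regression; in this made-up sample the 2's and 3's are uncorrelated with each other, and in that special case each variable's standardized regression coefficient equals its simple correlation with the point differential.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented eight-game sample: the opponent scores 0, so the point
# differential is just the points from 2s plus the points from 3s.
points_from_twos = [92, 90, 88, 90, 90, 90, 90, 90]   # high volume, steady
threes_made = [3, 0, 3, 0, 3, 0, 3, 0]                # only 3 attempts, erratic
points_from_threes = [3 * made for made in threes_made]
point_diff = [t2 + t3 for t2, t3 in zip(points_from_twos, points_from_threes)]

corr_twos = pearson(points_from_twos, point_diff)
corr_threes = pearson(points_from_threes, point_diff)
print(f"2s vs. differential: {corr_twos:.2f}")
print(f"3s vs. differential: {corr_threes:.2f}")
```

    Even though the 2's supply roughly 90 of the points every night, the low-volume but erratic 3's end up far more strongly tied to the differential in this sample, which is the volume-plus-consistency point.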

    That was also suggested earlier, and is a good thought, but that is actually not the case (at least not to the degree that you are suggesting).

    Rockets' team FG% SD is 5.6, while Opponent's team SD is 5.65. To me that looks like a very small difference. Maybe that .05 in SD can result in that huge change, but it seems unlikely IMO

    That's why I had included a logistic regression analysis as well (that looks at contributions of variables to wins vs losses, not point differential) in an attempt to identify key variables that may not generate a lot of points per se but are significantly related to winning or losing -- although sadly it cannot include as many variables as the multiple regression. Still, my finding (not posted, but if you want the data I can make it available) was that PF's and FT shooting were still not very related to winning or losing. The modified logistic regression also suggested that the Rockets' 3-point shooting performance was MORE predictive of winning or losing than anything else. That finding may raise some interesting questions given that it is different from the stats you presented (I think you placed the Rockets' 3FG% in last place?), though the logistic regression model probably does have more flaws than the multiple regression model.
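
    For readers who haven't seen one, here is a bare-bones illustration of what a logistic regression on wins vs. losses does. This is not the model or the data from the analysis: it is a one-variable sketch fit by simple gradient ascent, and the six "games" are invented.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """One-predictor logistic regression fit by plain gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
        grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
        w += lr * grad_w
        b += lr * grad_b
    return w, b


# Invented data: standardized 3P% in six games and whether each was a win (1) or loss (0).
three_pt_z = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
won = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic(three_pt_z, won)
p_hot = sigmoid(w * 1.0 + b)    # win probability one SD above average
p_cold = sigmoid(w * -1.0 + b)  # win probability one SD below average
print(f"weight = {w:.2f}, P(win | hot) = {p_hot:.2f}, P(win | cold) = {p_cold:.2f}")
```

    A positive weight means better 3-point shooting goes with a higher estimated win probability. With several predictors on the same standardized scale, comparing the sizes of their weights is one rough way to ask which is more related to winning or losing.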
     
  14. TheEarthBlues

    TheEarthBlues Member

    Joined:
    Feb 17, 2014
    Messages:
    57
    Likes Received:
    8
    Great post, very informative.
     
  15. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    Btw, I totally agree on this point. There is the possibility that maybe for every team, the order is the same. But I do feel that the Rockets' inconsistencies in 3-point shooting and defense (and maybe TO's? Or maybe they are consistently bad with TO's lol) are kind of unique, so I also wouldn't be surprised if this order of importance is not as universally shared, or maybe only shared by a couple of teams, including the Rockets.

    That'll be a good thing to try out next time. What would you suggest is a good "baseline" team that I can compare the Rockets to?
     
  16. JCDenton

    JCDenton Member

    Joined:
    Oct 10, 2007
    Messages:
    1,090
    Likes Received:
    261
    I thought from the last incident that it was commonly understood that amateur statisticians would need to submit their work to me prior to posting for peer review?

    This "analysis" is the poster child for why that step is necessary. Where is the detail? Where are the graphs? A regression analysis without graphs? Is this even real life?

    Tread lightly or be vanquished like my past rivals.
     
  17. hollywoodMarine

    Joined:
    Jan 15, 2014
    Messages:
    246
    Likes Received:
    32
    If you would clarify, to what incident specifically are you referring?

    And what detail specifically is missing?

    Graphs? You mean the scatterplot of standardized predicted value and point differential found on this page?
     
  18. crossover

    crossover Contributing Member

    Joined:
    Sep 13, 2001
    Messages:
    2,049
    Likes Received:
    799
    Those were the things that jumped out to me at first. Comparing to a league average or top-performing teams (MIA, SAS, OKC, etc.) would be a natural direction. In my interpretation, your results more directly describe what kind of system a team runs (which can then be compared to win/loss), although I'll be honest again in that I only skimmed the work. It would be a great way to forecast team strategy vs. other teams head-to-head (i.e., Detroit Pistons rely on 2pt% to generate wins, so usage of Dwight + Omer is key this game) and also see if a FA or trade prospect plays to the Rockets' systemic strengths/weaknesses (i.e., Ryan Anderson scores an 85% compatibility (per-dollar analysis could be easily tacked on here) as his strong points correlate well with Rockets system impact areas vs. an average of 75% for the next best option). Definitely a lot of cool extensions you could go to from here!
     
    #58 crossover, Feb 17, 2014
    Last edited: Feb 17, 2014
  19. Qball

    Qball Contributing Member

    Joined:
    Nov 9, 2001
    Messages:
    4,151
    Likes Received:
    210
    But do your fancy pants equations take into account....

    [embedded YouTube video: //www.youtube.com/embed/5-1jgNhopNo]

    ....the heart of a champion?
     
  20. WtsZeOdds

    WtsZeOdds Member

    Joined:
    Feb 18, 2014
    Messages:
    34
    Likes Received:
    2
    I am new to this forum, though I have lived in the Houston area my whole life! I'm usually just creeping on this forum, but after reading your analysis I decided I would comment! I have multiple bachelor's degrees and really enjoy statistics; they are even more enjoyable when you get paid to run statistical analyses for companies on the side (like me ;) ). I just wanted to point out that this is a very good start, and I am impressed by the data you were able to collect and run. In my personal opinion, it is always good to run statistical methods on topics you truly enjoy and understand. It makes the statistical process become much easier and more fluid! The best advice I was given while learning stats and doing my own early research is that, no matter what hypothesis you are researching, you should always attempt to prove it wrong along the way. When you do that, guys like me who go in to disprove your theories have a much harder time. Meanwhile, you will often find that the methods and analyses you use to try to disprove the hypothesis end up supporting it!... However, you could also completely disprove your hypothesis and end up throwing out all the research because the original hypothesis had a flaw in its design... and you will have to start all over from scratch.... which sucks! lol However, if you enjoy stats, just stick with it and you will find out some very interesting facts across MANY different topics.
     
    2 people like this.
