class: center, middle, inverse, title-slide

# Statistical Thinking using Randomisation and Simulation
## Generalised Linear Models
### Di Cook University
### W6.C2

---
# Generalised linear models

- Overview
- Types
- Assumptions
- Fitting
- Examples

---
# Overview

- GLMs are a broad class of models for fitting different types of response variable distributions.
- The multiple linear regression model is a special case.

---
# Three components

- Random component: the probability distribution of the response variable
- Systematic component: the explanatory variables
- Link function: describes the relationship between the random and systematic components

---
# Multiple linear regression

`$$y_i = \beta_0+\beta_1x_1 + \beta_2x_2 + \varepsilon ~~~ or ~~~ E(Y_i)=\beta_0+\beta_1x_1+\beta_2x_2$$`

- Random component: `\(y_i\)` has a normal distribution, so `\(\varepsilon_i \sim N(0,\sigma^2)\)`
- Systematic component: `\(\beta_0+\beta_1x_1 + \beta_2x_2\)`
- Link function: identity, i.e. just the systematic component

---
# Poisson regression

`$$y_i = exp(\beta_0+\beta_1x_1+\beta_2x_2) + \varepsilon$$`

- `\(y_i\)` takes integer values, 0, 1, 2, ...
- Link function: `\(ln(\mu)\)`, name=`log`. (Think of `\(\mu\)` as `\(\hat{y}\)`.)

---
# Bernoulli, binomial regression

`$$y_i = \frac{exp(\beta_0+\beta_1x_1+\beta_2x_2)}{1+exp(\beta_0+\beta_1x_1+\beta_2x_2)} + \varepsilon$$`

- `\(y_i\)` takes integer values, `\(\{ 0, 1\}\)` (Bernoulli), `\(\{ 0, 1, ..., n\}\)` (binomial)
- Let `\(\mu=\frac{exp(\beta_0+\beta_1x_1+\beta_2x_2)}{1+exp(\beta_0+\beta_1x_1+\beta_2x_2)}\)`; the link function is `\(ln\frac{\mu}{1-\mu}\)`, name=`logit`

---
# Assumptions

- The data `\(y_1, y_2, ..., y_n\)` are independently distributed, i.e., cases are independent.
- The dependent variable `\(y_i\)` does NOT need to be normally distributed, but it is typically assumed to follow a distribution from the exponential family (e.g. binomial, Poisson, multinomial, normal, ...)
- There is a linear relationship between the link-transformed mean response and the explanatory variables
- Explanatory variables can be transformations of the original variables
- Homogeneity of variance does NOT need to hold on the original scale, but it should hold on the transformed response scale
- Parameters are estimated by maximum likelihood estimation (MLE)
- Goodness-of-fit measures rely on sufficiently large samples

---
# Example: Olympics medal tally

- Model medal counts on `GDP_log` (log GDP)
- Medal counts are integers, which suggests using a Poisson model.

<img src="week6.class2_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---
# Model fit and what it looks like

```r
oly_glm <- glm(M2012~GDP_log, data=oly_gdp2012, family=poisson(link=log))
summary(oly_glm)$coefficients
#>             Estimate Std. Error z value  Pr(>|z|)
#> (Intercept)    -13.2      0.538     -24  3.6e-132
#> GDP_log          1.3      0.045      30  6.8e-198
```

<img src="week6.class2_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
class: inverse middle
# Your turn

Write down the formula of the fitted model.

--

`$$\widehat{log(M2012)} = -13.2 + 1.3 GDP\_log$$`

---
# Model fit

```
#> 
#> Call:
#> glm(formula = M2012 ~ GDP_log, family = poisson(link = log), 
#>     data = oly_gdp2012)
#> 
#> Deviance Residuals: 
#>    Min      1Q  Median      3Q     Max  
#>  -4.80   -2.22   -0.36    1.07    8.55  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -13.1691     0.5383   -24.5   <2e-16 ***
#> GDP_log       1.3406     0.0447    30.0   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 1567.70  on 84  degrees of freedom
#> Residual deviance:  545.92  on 83  degrees of freedom
#> AIC: 845.7
#> 
#> Number of Fisher Scoring iterations: 5
```

The large drop from the null deviance (1568) to the residual deviance (546) suggests that `GDP_log` explains a substantial amount of the variation in medal counts.
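
---
# Aside: checking the log link

A minimal sketch, not from the lecture: simulated data (all names here are invented for illustration) showing that for a Poisson GLM, `predict(type="response")` is just `exp()` of the linear predictor returned by `predict(type="link")`.

```r
# Simulate counts from a known Poisson model: log(mu) = 0.5 + 1.2 x
set.seed(2012)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))

sim_glm <- glm(y ~ x, family = poisson(link = log))

eta <- predict(sim_glm, type = "link")      # linear predictor, log scale
mu  <- predict(sim_glm, type = "response")  # mean on the count scale
all.equal(unname(exp(eta)), unname(mu))     # the link undone: TRUE
```

The same check works for the tennis example later, with `plogis()` in place of `exp()`.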
--- # Residual plots ![](week6.class2_files/figure-html/unnamed-chunk-5-1.png)<!-- --> Heteroskedasticity in residuals. One fairly large residual. --- # Influence statistics ``` #> .rownames .cooksd .resid #> 1 RussianFed 1.9e+00 8.553 #> 2 China 1.5e+00 3.743 #> 3 UnitedStates 8.3e-01 1.468 #> 4 GreatBritain 8.0e-01 5.232 #> 5 Jamaica 4.4e-01 5.267 #> 6 India 2.6e-01 -4.800 #> 7 Japan 2.5e-01 -2.010 #> 8 Cuba 2.4e-01 4.215 #> 9 Ukraine 2.3e-01 4.270 #> 10 Kenya 1.9e-01 3.802 #> 11 Belarus 1.6e-01 3.535 #> 12 Hungary 1.5e-01 3.621 #> 13 Brazil 1.5e-01 -2.862 #> 14 Georgia 1.3e-01 3.219 #> 15 Indonesia 1.2e-01 -4.563 #> 16 Mexico 9.8e-02 -3.444 #> 17 SaudiArabia 9.2e-02 -4.388 #> 18 Australia 7.6e-02 2.211 #> 19 Azerbaijan 7.5e-02 2.584 #> 20 Mongolia 7.3e-02 2.612 #> 21 ChineseTaipei 7.0e-02 -3.680 #> 22 Turkey 6.5e-02 -3.179 #> 23 Switzerland 6.5e-02 -3.293 #> 24 Ethiopia 6.2e-02 2.385 #> 25 Belgium 6.0e-02 -3.294 #> 26 Venezuela 5.8e-02 -3.498 #> 27 NewZealand 5.0e-02 2.211 #> 28 HongKongChina 4.9e-02 -3.191 #> 29 Portugal 4.9e-02 -3.164 #> 30 Greece 4.5e-02 -2.932 #> 31 Kazakhstan 4.4e-02 2.100 #> 32 Norway 4.3e-02 -2.700 #> 33 DPRKorea 4.2e-02 2.020 #> 34 Algeria 4.0e-02 -2.815 #> 35 Singapore 3.9e-02 -2.705 #> 36 Argentina 3.8e-02 -2.534 #> 37 Kuwait 3.8e-02 -2.731 #> 38 Thailand 3.7e-02 -2.566 #> 39 Malaysia 3.7e-02 -2.602 #> 40 Canada 3.6e-02 -1.607 #> 41 Egypt 3.4e-02 -2.512 #> 42 Korea 3.3e-02 1.635 #> 43 Finland 2.9e-02 -2.222 #> 44 Spain 2.6e-02 -1.463 #> 45 Qatar 2.6e-02 -2.126 #> 46 Morocco 2.4e-02 -2.147 #> 47 Germany 2.1e-02 0.754 #> 48 SouthAfrica 1.9e-02 -1.705 #> 49 Sweden 1.8e-02 -1.586 #> 50 Armenia 1.4e-02 1.291 #> 51 TrinidadTobago 1.4e-02 1.234 #> 52 PuertoRico 1.2e-02 -1.390 #> 53 Guatemala 1.1e-02 -1.396 #> 54 Croatia 1.1e-02 1.073 #> 55 Lithuania 1.0e-02 1.072 #> 56 Ireland 7.5e-03 -1.044 #> 57 CzechRepublic 5.5e-03 0.804 #> 58 Grenada 5.2e-03 1.025 #> 59 Netherlands 5.2e-03 0.726 #> 60 Poland 5.0e-03 -0.817 #> 61 Rep.ofMoldova 4.9e-03 
0.827 #> 62 Romania 4.8e-03 0.750 #> 63 Bahrain 4.8e-03 -0.925 #> 64 Cyprus 4.6e-03 -0.904 #> 65 DominicanRep. 4.4e-03 -0.822 #> 66 Bulgaria 4.4e-03 -0.820 #> 67 Uzbekistan 2.6e-03 0.555 #> 68 Serbia 2.5e-03 0.550 #> 69 Afghanistan 2.3e-03 -0.637 #> 70 Colombia 2.0e-03 -0.518 #> 71 Gabon 1.9e-03 -0.588 #> 72 Botswana 1.9e-03 -0.575 #> 73 Italy 1.8e-03 -0.304 #> 74 Uganda 1.7e-03 -0.558 #> 75 Slovenia 1.1e-03 0.359 #> 76 Slovakia 9.6e-04 -0.360 #> 77 Denmark 8.2e-04 -0.330 #> 78 Montenegro 3.6e-04 0.257 #> 79 Latvia 2.4e-04 -0.189 #> 80 Tunisia 8.8e-05 -0.109 #> 81 Bahamas 7.5e-05 -0.116 #> 82 France 4.7e-05 0.042 #> 83 Iran 3.5e-06 -0.021 #> 84 Estonia 3.5e-06 -0.022 #> 85 Tajikistan 8.3e-07 -0.012
```

The largest Cook's D values are big enough to raise some concerns about the influence that the Russian Federation and China have on the model fit. The model should be re-fitted without these two cases.

---
# Prediction from the model

```r
aus <- oly_gdp2012 %>% filter(Code == "AUS")
predict(oly_glm, aus)
#>   1 
#> 3.2
```

WAIT! What??? Australia earned more than 3 medals in 2012. Either the model is terrible, or we've made a mistake!

--

```r
aus <- oly_gdp2012 %>% filter(Code == "AUS")
predict(oly_glm, aus, type="response")
#>  1 
#> 23
```

--

`predict()` returns predictions on the link (log) scale by default; `type="response"` transforms them back into the original units.

---
# Example: winning tennis matches

We have data scraped from the web sites of the 2012 Grand Slam tennis tournaments. There are a lot of statistics on matches. Below we have the number of receiving points won, and whether the match was won or not.

<img src="week6.class2_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---
class: inverse middle
# Your turn

The response variable is binary. What type of GLM should be fit?

--

*Bernoulli/binomial*

---
# Model

```r
tennis_glm <- glm(won~Receiving.Points.Won, data=tennis, 
                  family=binomial(link='logit'))
```

```
#>                      Estimate Std. Error z value Pr(>|z|)
#> (Intercept)             -2.91      0.586    -5.0  7.1e-07
#> Receiving.Points.Won     0.11      0.015     7.3  3.0e-13
```

<img src="week6.class2_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---
class: inverse middle
# Your turn

Write down the fitted model

--

*Let* `$$u=exp(-2.91+0.11RPW)$$` *then* `$$\hat{won}=\frac{u}{1+u}$$`

---
# Model fit

```
#> 
#> Call:
#> glm(formula = won ~ Receiving.Points.Won, family = binomial(link = "logit"), 
#>     data = tennis)
#> 
#> Deviance Residuals: 
#>    Min      1Q  Median      3Q     Max  
#> -2.506   0.227   0.411   0.624   1.877  
#> 
#> Coefficients:
#>                      Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)           -2.9053     0.5860   -4.96  7.1e-07 ***
#> Receiving.Points.Won   0.1111     0.0152    7.29  3.0e-13 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 472.99  on 511  degrees of freedom
#> Residual deviance: 402.16  on 510  degrees of freedom
#> AIC: 406.2
#> 
#> Number of Fisher Scoring iterations: 5
```

The drop from the null deviance (473) to the residual deviance (402) is modest relative to the deviance that remains, which suggests that receiving points won explains only part of the match result.

---
# Residuals

![](week6.class2_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

The model is just not capturing the data very well. There are two groups of residuals: it is overfitting one chunk of the data and underfitting others.
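
---
# Aside: probabilities by hand

A short sketch, not part of the original slides: evaluating the fitted logistic curve by hand from the rounded coefficients reported above (-2.91, 0.11), and checking the arithmetic against R's built-in inverse logit `plogis()`. The values of `rpw` are chosen arbitrarily for illustration.

```r
b0 <- -2.91                      # intercept, from the summary above
b1 <-  0.11                      # slope on Receiving.Points.Won
rpw <- c(10, 30, 60)             # illustrative receiving points won

u <- exp(b0 + b1 * rpw)          # exp of the linear predictor
p_win <- u / (1 + u)             # inverse logit, done by hand
round(p_win, 2)                  # 0.14 0.60 0.98
all.equal(p_win, plogis(b0 + b1 * rpw))  # TRUE
```

Winning more receiving points pushes the predicted probability of winning the match towards 1.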
--- # Influence statistics ``` #> .cooksd .resid #> 1 6.0e-02 1.877 #> 2 3.6e-02 -2.505 #> 3 2.9e-02 -2.420 #> 4 2.4e-02 1.528 #> 5 2.0e-02 -2.287 #> 6 1.7e-02 -2.242 #> 7 1.7e-02 -2.242 #> 8 1.5e-02 -2.196 #> 9 1.3e-02 -2.149 #> 10 1.2e-02 1.329 #> 11 1.2e-02 1.329 #> 12 1.1e-02 -2.103 #> 13 1.1e-02 -2.103 #> 14 1.1e-02 -2.103 #> 15 9.9e-03 -2.055 #> 16 9.9e-03 -2.055 #> 17 9.9e-03 -2.055 #> 18 9.9e-03 -2.055 #> 19 9.4e-03 1.280 #> 20 9.4e-03 1.280 #> 21 9.4e-03 1.280 #> 22 9.4e-03 1.280 #> 23 8.6e-03 -2.008 #> 24 7.6e-03 1.232 #> 25 7.6e-03 1.232 #> 26 7.5e-03 -1.959 #> 27 7.5e-03 -1.959 #> 28 7.5e-03 -1.959 #> 29 7.5e-03 -1.959 #> 30 7.5e-03 -1.959 #> 31 7.5e-03 -1.959 #> 32 6.6e-03 -1.911 #> 33 6.6e-03 -1.911 #> 34 5.9e-03 -1.124 #> 35 5.9e-03 -1.170 #> 36 5.9e-03 -1.862 #> 37 5.9e-03 -1.862 #> 38 5.9e-03 -1.078 #> 39 5.9e-03 -1.078 #> 40 5.7e-03 -1.266 #> 41 5.7e-03 -1.266 #> 42 5.6e-03 -1.315 #> 43 5.6e-03 -1.315 #> 44 5.6e-03 -1.315 #> 45 5.6e-03 -1.315 #> 46 5.6e-03 -1.315 #> 47 5.6e-03 -1.315 #> 48 5.6e-03 -0.989 #> 49 5.4e-03 -1.364 #> 50 5.4e-03 -1.364 #> 51 5.4e-03 -1.364 #> 52 5.4e-03 -1.364 #> 53 5.4e-03 -0.946 #> 54 5.3e-03 -1.813 #> 55 5.3e-03 -1.813 #> 56 5.3e-03 -1.813 #> 57 5.2e-03 -1.413 #> 58 5.2e-03 -1.413 #> 59 5.2e-03 -1.413 #> 60 5.2e-03 -1.413 #> 61 5.2e-03 -1.413 #> 62 5.2e-03 -1.413 #> 63 5.2e-03 -1.413 #> 64 5.0e-03 -1.463 #> 65 5.0e-03 -1.463 #> 66 5.0e-03 -1.463 #> 67 5.0e-03 -1.463 #> 68 5.0e-03 -1.463 #> 69 5.0e-03 -1.463 #> 70 5.0e-03 -1.763 #> 71 5.0e-03 -1.763 #> 72 5.0e-03 -1.763 #> 73 5.0e-03 -1.763 #> 74 5.0e-03 -1.763 #> 75 4.9e-03 -1.513 #> 76 4.9e-03 -1.513 #> 77 4.9e-03 -1.513 #> 78 4.9e-03 -1.513 #> 79 4.9e-03 -1.513 #> 80 4.9e-03 -1.513 #> 81 4.9e-03 -1.513 #> 82 4.8e-03 1.138 #> 83 4.8e-03 1.138 #> 84 4.8e-03 1.138 #> 85 4.8e-03 1.138 #> 86 4.8e-03 1.138 #> 87 4.8e-03 -1.713 #> 88 4.8e-03 -1.713 #> 89 4.8e-03 -1.713 #> 90 4.8e-03 -1.713 #> 91 4.7e-03 -1.563 #> 92 4.7e-03 -1.563 #> 93 4.7e-03 -1.563 #> 94 4.6e-03 -1.663 
#> 95 4.6e-03 -1.663 #> 96 4.6e-03 -1.663 #> 97 4.6e-03 -1.663 #> 98 4.6e-03 -1.613 #> 99 4.6e-03 -1.613 #> 100 4.6e-03 -1.613 #> 101 4.6e-03 -1.613 #> 102 4.6e-03 -1.613 #> 103 4.6e-03 -1.613 #> 104 3.8e-03 1.091 #> 105 3.8e-03 1.091 #> 106 3.8e-03 1.091 #> 107 3.0e-03 1.046 #> 108 3.0e-03 1.046 #> 109 3.0e-03 1.046 #> 110 2.6e-03 -0.614 #> 111 2.3e-03 1.002 #> 112 2.3e-03 1.002 #> 113 2.3e-03 1.002 #> 114 2.3e-03 1.002 #> 115 2.3e-03 1.002 #> 116 2.3e-03 1.002 #> 117 2.3e-03 1.002 #> 118 1.8e-03 0.959 #> 119 1.8e-03 0.959 #> 120 1.8e-03 0.959 #> 121 1.8e-03 0.959 #> 122 1.8e-03 0.959 #> 123 1.8e-03 0.959 #> 124 1.8e-03 0.959 #> 125 1.8e-03 0.959 #> 126 1.8e-03 0.959 #> 127 1.8e-03 0.959 #> 128 1.8e-03 0.959 #> 129 1.4e-03 0.917 #> 130 1.4e-03 0.917 #> 131 1.4e-03 0.917 #> 132 1.4e-03 0.917 #> 133 1.4e-03 0.917 #> 134 1.4e-03 0.917 #> 135 1.4e-03 0.917 #> 136 1.4e-03 0.917 #> 137 1.4e-03 0.917 #> 138 1.4e-03 0.917 #> 139 1.4e-03 0.917 #> 140 1.1e-03 0.876 #> 141 1.1e-03 0.876 #> 142 1.1e-03 0.876 #> 143 1.1e-03 0.876 #> 144 1.1e-03 0.876 #> 145 1.1e-03 0.876 #> 146 1.1e-03 0.876 #> 147 8.3e-04 0.836 #> 148 8.3e-04 0.836 #> 149 8.3e-04 0.836 #> 150 8.3e-04 0.836 #> 151 8.3e-04 0.836 #> 152 8.3e-04 0.836 #> 153 8.3e-04 0.836 #> 154 6.5e-04 0.797 #> 155 6.5e-04 0.797 #> 156 6.5e-04 0.797 #> 157 6.5e-04 0.797 #> 158 6.5e-04 0.797 #> 159 6.5e-04 0.797 #> 160 6.5e-04 0.797 #> 161 6.5e-04 0.797 #> 162 6.5e-04 0.797 #> 163 6.5e-04 0.797 #> 164 6.5e-04 0.797 #> 165 6.5e-04 0.797 #> 166 6.5e-04 0.797 #> 167 6.5e-04 0.797 #> 168 6.5e-04 0.797 #> 169 5.2e-04 0.760 #> 170 5.2e-04 0.760 #> 171 5.2e-04 0.760 #> 172 5.2e-04 0.760 #> 173 5.2e-04 0.760 #> 174 5.2e-04 0.760 #> 175 5.2e-04 0.760 #> 176 5.2e-04 0.760 #> 177 5.2e-04 0.760 #> 178 5.2e-04 0.760 #> 179 5.2e-04 0.760 #> 180 5.2e-04 0.760 #> 181 4.3e-04 0.724 #> 182 4.3e-04 0.724 #> 183 4.3e-04 0.724 #> 184 4.3e-04 0.724 #> 185 4.3e-04 0.724 #> 186 4.3e-04 0.724 #> 187 4.3e-04 0.724 #> 188 4.3e-04 0.724 #> 189 4.3e-04 0.724 
#> 190 4.3e-04 0.724 #> 191 4.3e-04 0.724 #> 192 4.3e-04 0.724 #> 193 4.3e-04 0.724 #> 194 4.3e-04 0.724 #> 195 3.6e-04 0.689 #> 196 3.6e-04 0.689 #> 197 3.6e-04 0.689 #> 198 3.6e-04 0.689 #> 199 3.6e-04 0.689 #> 200 3.6e-04 0.689 #> 201 3.6e-04 0.689 #> 202 3.6e-04 0.689 #> 203 3.6e-04 0.689 #> 204 3.6e-04 0.689 #> 205 3.6e-04 0.689 #> 206 3.1e-04 0.656 #> 207 3.1e-04 0.656 #> 208 3.1e-04 0.656 #> 209 3.1e-04 0.656 #> 210 3.1e-04 0.656 #> 211 3.1e-04 0.656 #> 212 3.1e-04 0.656 #> 213 3.1e-04 0.656 #> 214 2.7e-04 0.624 #> 215 2.7e-04 0.624 #> 216 2.7e-04 0.624 #> 217 2.7e-04 0.624 #> 218 2.7e-04 0.624 #> 219 2.7e-04 0.624 #> 220 2.7e-04 0.624 #> 221 2.4e-04 0.593 #> 222 2.4e-04 0.593 #> 223 2.4e-04 0.593 #> 224 2.4e-04 0.593 #> 225 2.4e-04 0.593 #> 226 2.4e-04 0.593 #> 227 2.4e-04 0.593 #> 228 2.4e-04 0.593 #> 229 2.4e-04 0.593 #> 230 2.4e-04 0.593 #> 231 2.4e-04 0.593 #> 232 2.4e-04 0.593 #> 233 2.4e-04 0.593 #> 234 2.4e-04 0.593 #> 235 2.4e-04 0.593 #> 236 2.2e-04 0.563 #> 237 2.2e-04 0.563 #> 238 2.2e-04 0.563 #> 239 2.2e-04 0.563 #> 240 2.2e-04 0.563 #> 241 2.2e-04 0.563 #> 242 2.2e-04 0.563 #> 243 2.2e-04 0.563 #> 244 2.2e-04 0.563 #> 245 2.2e-04 0.563 #> 246 2.2e-04 0.563 #> 247 2.2e-04 0.563 #> 248 2.2e-04 0.563 #> 249 2.2e-04 0.563 #> 250 2.2e-04 0.563 #> 251 2.2e-04 0.563 #> 252 2.2e-04 0.563 #> 253 2.2e-04 0.563 #> 254 2.2e-04 0.563 #> 255 2.0e-04 0.535 #> 256 2.0e-04 0.535 #> 257 2.0e-04 0.535 #> 258 2.0e-04 0.535 #> 259 2.0e-04 0.535 #> 260 2.0e-04 0.535 #> 261 2.0e-04 0.535 #> 262 2.0e-04 0.535 #> 263 2.0e-04 0.535 #> 264 2.0e-04 0.535 #> 265 2.0e-04 0.535 #> 266 2.0e-04 0.535 #> 267 2.0e-04 0.535 #> 268 2.0e-04 0.535 #> 269 2.0e-04 0.535 #> 270 2.0e-04 0.535 #> 271 2.0e-04 0.535 #> 272 1.9e-04 0.508 #> 273 1.9e-04 0.508 #> 274 1.9e-04 0.508 #> 275 1.9e-04 0.508 #> 276 1.9e-04 0.508 #> 277 1.9e-04 0.508 #> 278 1.9e-04 0.508 #> 279 1.9e-04 0.508 #> 280 1.9e-04 0.508 #> 281 1.9e-04 0.508 #> 282 1.9e-04 0.508 #> 283 1.9e-04 0.508 #> 284 1.9e-04 0.508 #> 
285 1.9e-04 0.508 #> 286 1.9e-04 0.508 #> 287 1.9e-04 0.508 #> 288 1.9e-04 0.508 #> 289 1.9e-04 0.508 #> 290 1.7e-04 0.482 #> 291 1.7e-04 0.482 #> 292 1.7e-04 0.482 #> 293 1.7e-04 0.482 #> 294 1.7e-04 0.482 #> 295 1.7e-04 0.482 #> 296 1.7e-04 0.482 #> 297 1.7e-04 0.482 #> 298 1.7e-04 0.482 #> 299 1.7e-04 0.482 #> 300 1.7e-04 0.482 #> 301 1.7e-04 0.482 #> 302 1.7e-04 0.482 #> 303 1.7e-04 0.482 #> 304 1.7e-04 0.482 #> 305 1.7e-04 0.482 #> 306 1.7e-04 0.482 #> 307 1.7e-04 0.482 #> 308 1.7e-04 0.482 #> 309 1.7e-04 0.482 #> 310 1.7e-04 0.482 #> 311 1.7e-04 0.482 #> 312 1.7e-04 0.482 #> 313 1.6e-04 0.457 #> 314 1.6e-04 0.457 #> 315 1.6e-04 0.457 #> 316 1.6e-04 0.457 #> 317 1.6e-04 0.457 #> 318 1.6e-04 0.457 #> 319 1.6e-04 0.457 #> 320 1.6e-04 0.457 #> 321 1.6e-04 0.457 #> 322 1.6e-04 0.457 #> 323 1.6e-04 0.457 #> 324 1.6e-04 0.457 #> 325 1.6e-04 0.457 #> 326 1.5e-04 0.434 #> 327 1.5e-04 0.434 #> 328 1.5e-04 0.434 #> 329 1.5e-04 0.434 #> 330 1.5e-04 0.434 #> 331 1.5e-04 0.434 #> 332 1.5e-04 0.434 #> 333 1.5e-04 0.434 #> 334 1.5e-04 0.434 #> 335 1.5e-04 0.434 #> 336 1.5e-04 0.434 #> 337 1.5e-04 0.434 #> 338 1.5e-04 0.434 #> 339 1.5e-04 0.434 #> 340 1.5e-04 0.434 #> 341 1.5e-04 0.434 #> 342 1.5e-04 0.434 #> 343 1.4e-04 0.411 #> 344 1.4e-04 0.411 #> 345 1.4e-04 0.411 #> 346 1.4e-04 0.411 #> 347 1.4e-04 0.411 #> 348 1.4e-04 0.411 #> 349 1.4e-04 0.411 #> 350 1.4e-04 0.411 #> 351 1.4e-04 0.411 #> 352 1.4e-04 0.411 #> 353 1.4e-04 0.411 #> 354 1.2e-04 0.390 #> 355 1.2e-04 0.390 #> 356 1.2e-04 0.390 #> 357 1.2e-04 0.390 #> 358 1.2e-04 0.390 #> 359 1.2e-04 0.390 #> 360 1.2e-04 0.390 #> 361 1.2e-04 0.390 #> 362 1.2e-04 0.390 #> 363 1.2e-04 0.390 #> 364 1.2e-04 0.390 #> 365 1.2e-04 0.390 #> 366 1.2e-04 0.390 #> 367 1.2e-04 0.390 #> 368 1.2e-04 0.390 #> 369 1.1e-04 0.370 #> 370 1.1e-04 0.370 #> 371 1.1e-04 0.370 #> 372 1.1e-04 0.370 #> 373 1.1e-04 0.370 #> 374 1.1e-04 0.370 #> 375 1.1e-04 0.370 #> 376 1.1e-04 0.370 #> 377 1.1e-04 0.370 #> 378 1.1e-04 0.370 #> 379 1.1e-04 0.370 #> 380 
1.1e-04 0.370 #> 381 1.1e-04 0.370 #> 382 1.1e-04 0.370 #> 383 1.1e-04 0.370 #> 384 1.0e-04 0.350 #> 385 1.0e-04 0.350 #> 386 1.0e-04 0.350 #> 387 1.0e-04 0.350 #> 388 1.0e-04 0.350 #> 389 1.0e-04 0.350 #> 390 1.0e-04 0.350 #> 391 1.0e-04 0.350 #> 392 1.0e-04 0.350 #> 393 9.3e-05 0.332 #> 394 9.3e-05 0.332 #> 395 9.3e-05 0.332 #> 396 9.3e-05 0.332 #> 397 9.3e-05 0.332 #> 398 9.3e-05 0.332 #> 399 9.3e-05 0.332 #> 400 9.3e-05 0.332 #> 401 9.3e-05 0.332 #> 402 9.3e-05 0.332 #> 403 9.3e-05 0.332 #> 404 9.3e-05 0.332 #> 405 9.3e-05 0.332 #> 406 8.3e-05 0.314 #> 407 8.3e-05 0.314 #> 408 8.3e-05 0.314 #> 409 8.3e-05 0.314 #> 410 8.3e-05 0.314 #> 411 8.3e-05 0.314 #> 412 8.3e-05 0.314 #> 413 8.3e-05 0.314 #> 414 8.3e-05 0.314 #> 415 8.3e-05 0.314 #> 416 8.3e-05 0.314 #> 417 8.3e-05 0.314 #> 418 7.4e-05 0.298 #> 419 7.4e-05 0.298 #> 420 7.4e-05 0.298 #> 421 7.4e-05 0.298 #> 422 7.4e-05 0.298 #> 423 7.4e-05 0.298 #> 424 7.4e-05 0.298 #> 425 7.4e-05 0.298 #> 426 7.4e-05 0.298 #> 427 7.4e-05 0.298 #> 428 7.4e-05 0.298 #> 429 7.4e-05 0.298 #> 430 7.4e-05 0.298 #> 431 7.4e-05 0.298 #> 432 7.4e-05 0.298 #> 433 7.4e-05 0.298 #> 434 6.6e-05 0.282 #> 435 6.6e-05 0.282 #> 436 6.6e-05 0.282 #> 437 6.6e-05 0.282 #> 438 6.6e-05 0.282 #> 439 6.6e-05 0.282 #> 440 6.6e-05 0.282 #> 441 6.6e-05 0.282 #> 442 6.6e-05 0.282 #> 443 6.6e-05 0.282 #> 444 6.6e-05 0.282 #> 445 6.6e-05 0.282 #> 446 6.6e-05 0.282 #> 447 6.6e-05 0.282 #> 448 5.8e-05 0.267 #> 449 5.8e-05 0.267 #> 450 5.8e-05 0.267 #> 451 5.8e-05 0.267 #> 452 5.8e-05 0.267 #> 453 5.8e-05 0.267 #> 454 5.8e-05 0.267 #> 455 5.8e-05 0.267 #> 456 5.8e-05 0.267 #> 457 5.8e-05 0.267 #> 458 5.1e-05 0.253 #> 459 5.1e-05 0.253 #> 460 5.1e-05 0.253 #> 461 5.1e-05 0.253 #> 462 5.1e-05 0.253 #> 463 5.1e-05 0.253 #> 464 5.1e-05 0.253 #> 465 5.1e-05 0.253 #> 466 5.1e-05 0.253 #> 467 4.5e-05 0.239 #> 468 4.5e-05 0.239 #> 469 4.5e-05 0.239 #> 470 4.5e-05 0.239 #> 471 4.5e-05 0.239 #> 472 4.0e-05 0.227 #> 473 4.0e-05 0.227 #> 474 4.0e-05 0.227 #> 475 
4.0e-05 0.227 #> 476 4.0e-05 0.227 #> 477 4.0e-05 0.227 #> 478 4.0e-05 0.227 #> 479 4.0e-05 0.227 #> 480 4.0e-05 0.227 #> 481 3.4e-05 0.214 #> 482 3.4e-05 0.214 #> 483 3.4e-05 0.214 #> 484 3.4e-05 0.214 #> 485 3.4e-05 0.214 #> 486 3.4e-05 0.214 #> 487 3.4e-05 0.214 #> 488 3.0e-05 0.203 #> 489 3.0e-05 0.203 #> 490 2.6e-05 0.192 #> 491 2.6e-05 0.192 #> 492 2.6e-05 0.192 #> 493 2.6e-05 0.192 #> 494 2.2e-05 0.182 #> 495 2.2e-05 0.182 #> 496 2.2e-05 0.182 #> 497 2.2e-05 0.182 #> 498 2.2e-05 0.182 #> 499 1.9e-05 0.172 #> 500 1.9e-05 0.172 #> 501 1.9e-05 0.172 #> 502 1.9e-05 0.172 #> 503 1.7e-05 0.163 #> 504 1.7e-05 0.163 #> 505 1.0e-05 0.138 #> 506 7.5e-06 0.124 #> 507 5.4e-06 0.111 #> 508 3.9e-06 0.099 #> 509 3.3e-06 0.094 #> 510 2.8e-06 0.089 #> 511 2.8e-06 0.089 #> 512 2.0e-06 0.079
```

All Cook's D values are small, so there are no influential observations.

---
# Prediction from the model

```r
newdata <- data.frame(Receiving.Points.Won=c(20, 50), won=c(NA, NA))
predict(tennis_glm, newdata, type="response")
#>    1    2 
#> 0.34 0.93
```

Interpret the response as the estimated probability of winning the match when 20, or 50, receiving points are won.

---
# Summary

Generalised linear models provide a systematic way to fit models for different types of response distributions.
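
---
# Aside: one function, three models

As a wrap-up sketch (simulated data; the variable names are invented for illustration): the models in this lecture differ only in the `family`/`link` pair handed to `glm()`.

```r
set.seed(1)
x <- runif(100)
y_norm <- 1 + 2 * x + rnorm(100, sd = 0.5)                 # normal response
y_pois <- rpois(100, lambda = exp(1 + 2 * x))              # count response
y_bern <- rbinom(100, size = 1, prob = plogis(1 + 2 * x))  # binary response

fits <- list(
  linear   = glm(y_norm ~ x, family = gaussian(link = "identity")),
  poisson  = glm(y_pois ~ x, family = poisson(link = "log")),
  logistic = glm(y_bern ~ x, family = binomial(link = "logit"))
)
sapply(fits, function(f) coef(f)["x"])   # each slope estimates the true 2
```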
---
# Resources

- [Beginners guide](https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/)
- [Introduction to GLMs](https://onlinecourses.science.psu.edu/stat504/node/216)
- [Quick-R GLMs](http://www.statmethods.net/advstats/glm.html)
- [The Analysis Factor, Generalized Linear Models Parts 1-4](http://www.theanalysisfactor.com/resources/by-topic/r/)
- [Wikipedia](https://en.wikipedia.org/wiki/Generalized_linear_model)
- [Do Smashes Win Matches?](http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2013.00665.x/full)

---
class: inverse middle
# Share and share alike

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.