Statistical Thinking using Randomisation and Simulation

class: center, middle, inverse, title-slide

# Statistical Thinking using Randomisation and Simulation
## Linear models: diagnostics
### W5.C1

---

# Modeling Olympic medal counts

We model the 2016 medal count using only the 2012 count as the explanatory variable, to illustrate influence diagnostics.

```
#>          term estimate std.error statistic p.value
#> 1 (Intercept)     0.72     0.526       1.4 1.7e-01
#> 2       M2012     0.94     0.026      36.5 4.5e-58
```

Giving the model,

`$$M_{2016} = 0.72 + 0.94\,M_{2012} + \varepsilon$$`

---
class: inverse middle 
# Your turn

- Should the model be re-fit with the intercept forced to ZERO?

--
[Interactive plot: M2016 (vertical axis) against M2012 (horizontal axis) for each country, with the fitted regression line; hovering shows the country name and both counts.]

---
# Model diagnostics

- Based on `leave-one-out` statistics
- For `$n$` observations, fit `$n$` models where each model has one observation removed. 
- Let's take a look at fitting the medal tallies, without the USA.

```
#>    all noUSA  estimate
#> 1 0.72  1.33 intercept
#> 2 0.94  0.84     slope
```

--
- Parameter estimates change a little: the intercept rises from 0.72 to 1.33 and the slope drops from 0.94 to 0.84
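
The leave-one-out idea can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the slides: `fit_line` and `leave_one_out` are hypothetical helper names, and the data are the M2012 and M2016 counts of the 98 countries in the scatterplot, with the United States as observation 0.

```python
# M2012 and M2016 medal counts for the 98 plotted countries
# (UnitedStates first, Kuwait last).
m2012 = [104, 65, 88, 82, 44, 38, 34, 28, 28, 35, 20, 17, 17, 17, 11, 12, 6, 14, 13, 18,
         4, 13, 8, 4, 12, 2, 4, 9, 8, 6, 20, 4, 10, 6, 3, 3, 4, 7, 10, 12,
         5, 3, 10, 7, 4, 2, 9, 1, 0, 2, 1, 0, 0, 0, 0, 0, 2, 2, 1, 2,
         7, 1, 5, 5, 2, 1, 6, 5, 0, 1, 0, 0, 2, 4, 2, 3, 0, 0, 2, 2,
         3, 1, 0, 1, 4, 0, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]
m2016 = [121, 67, 70, 55, 42, 41, 42, 21, 28, 29, 19, 15, 19, 17, 13, 11, 10, 11, 18, 22,
         13, 17, 8, 7, 8, 6, 4, 15, 11, 10, 11, 8, 11, 7, 6, 6, 4, 7, 18, 9,
         8, 4, 10, 8, 4, 3, 4, 2, 2, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 5,
         5, 2, 2, 4, 3, 3, 2, 2, 1, 1, 1, 1, 1, 4, 3, 3, 2, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = float(len(x))
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def leave_one_out(x, y):
    """Refit the line n times, dropping one observation each time."""
    return [fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]


fit_all = fit_line(m2012, m2016)
loo_fits = leave_one_out(m2012, m2016)
fit_no_usa = loo_fits[0]  # observation 0 is UnitedStates

print("all:    intercept %.2f, slope %.2f" % fit_all)
print("no USA: intercept %.2f, slope %.2f" % fit_no_usa)
```

The two printed lines should agree with the `all` and `noUSA` columns above; the other 97 entries of `loo_fits` show how little the line moves when a typical country is dropped.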

---
# Other model fit parameters

- deviance
- predicted values, residuals

|       | null.dev| deviance| fitted| resid|
|:------|--------:|--------:|------:|-----:|
|All    |    29766|     2002|     98|   5.9|
|No USA |    17298|     1260|     89|  15.3|

---
# What it could look like

---
# Leverage

Leverage `$h_{ii}$` is defined for each observation `$i = 1, \dots, n$`, and is the `$i^{th}$` diagonal element of the hat matrix:

`$$H=X(X^TX)^{-1}X^T$$`

where `$X$` is the design matrix, e.g. for `$\beta_0+\beta_1x$`,

`$$X=\left[ \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right]$$`

Intuitively, observations which are far from the mean of the explanatory variables will have higher leverage.

YOU CAN CALCULATE THIS WITHOUT FITTING ALL `$n$` MODELS!
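
A minimal Python sketch of that point, assuming simple regression with one explanatory variable: leverage needs only the `$x$` values, so no model has to be refitted. The function names are hypothetical; the data are the M2012 counts of the 98 plotted countries. The first function evaluates the diagonal of `$H$` through the explicit inverse of the 2 by 2 matrix `$X^TX$`, the second uses the equivalent closed form `$h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$`.

```python
# M2012 medal counts for the 98 plotted countries (UnitedStates first).
m2012 = [104, 65, 88, 82, 44, 38, 34, 28, 28, 35, 20, 17, 17, 17, 11, 12, 6, 14, 13, 18,
         4, 13, 8, 4, 12, 2, 4, 9, 8, 6, 20, 4, 10, 6, 3, 3, 4, 7, 10, 12,
         5, 3, 10, 7, 4, 2, 9, 1, 0, 2, 1, 0, 0, 0, 0, 0, 2, 2, 1, 2,
         7, 1, 5, 5, 2, 1, 6, 5, 0, 1, 0, 0, 2, 4, 2, 3, 0, 0, 2, 2,
         3, 1, 0, 1, 4, 0, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]


def hat_diagonal(x):
    """Diagonal of H = X (X'X)^{-1} X' for a design matrix with rows [1, x_i]."""
    n = len(x)
    sx = sum(x)
    sx2 = sum(xi * xi for xi in x)
    det = float(n * sx2 - sx * sx)  # determinant of X'X = [[n, sx], [sx, sx2]]
    # (X'X)^{-1} = [[sx2, -sx], [-sx, n]] / det, hence
    # h_ii = [1, x_i] (X'X)^{-1} [1, x_i]'
    return [(sx2 - 2 * xi * sx + n * xi * xi) / det for xi in x]


def hat_closed_form(x):
    """Same values from 1/n plus the squared distance from the mean of x."""
    n = float(len(x))
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


h_matrix = hat_diagonal(m2012)
h_closed = hat_closed_form(m2012)

for name, i in [("UnitedStates", 0), ("China", 2), ("RussianFed", 3)]:
    print("%-12s %.3f" % (name, h_matrix[i]))
print("sum of leverages: %.3f" % sum(h_matrix))
```

The leverages sum to the number of fitted coefficients (here 2), which is why cutoffs are usually stated as a multiple of the average leverage.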

---
# Highest leverage for medal tally model

```
#>                       Country  .hat
#> 1                UnitedStates 0.290
#> 2                       China 0.203
#> 3                  RussianFed 0.175
#> 4                GreatBritain 0.106
#> 5                     Germany 0.047
#> 6                       Japan 0.035
#> 7                   Australia 0.030
#> 8                      France 0.029
#> 9                  SouthKorea 0.021
#> 10                      Italy 0.021
#> 11                Netherlands 0.013
#> 12                    Ukraine 0.013
#> 13                    Vietnam 0.013
#> 14                 IvoryCoast 0.013
#> 15 IndependentOlympicAthletes 0.013
```

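Cutoff for high leverage is `$2p/n = 2 \times 2/98 \approx 0.041$`, where `$p = 2$` is the number of estimated coefficients (intercept and slope) and `$n = 98$` is the number of countries.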

---
# Plot of leverage

---
# Log-transform the counts

---

```
#>                       Country  .hat
#> 1                UnitedStates 0.081
#> 2                       China 0.074
#> 3                  RussianFed 0.070
#> 4                GreatBritain 0.061
#> 5                     Germany 0.047
#> 6                       Japan 0.042
#> 7                   Australia 0.040
#> 8                      France 0.039
#> 9                  SouthKorea 0.034
#> 10                      Italy 0.034
#> 11                    Vietnam 0.030
#> 12                 IvoryCoast 0.030
#> 13 IndependentOlympicAthletes 0.030
#> 14                       Fiji 0.030
#> 15                     Jordan 0.030
```

---

Transforming skewed variables reduces the influence of any one point, or of a few points. The leverage is spread more evenly, and the highest value drops from 0.290 to 0.081.
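
The effect can be checked with a short Python sketch. The slides do not state the exact transformation; `log(M2012 + 1)` is an assumption made here because some counts are zero, and it appears to be consistent with the hat values tabulated for the transformed model. `leverage` is a hypothetical helper name.

```python
import math

# M2012 medal counts for the 98 plotted countries (UnitedStates first).
m2012 = [104, 65, 88, 82, 44, 38, 34, 28, 28, 35, 20, 17, 17, 17, 11, 12, 6, 14, 13, 18,
         4, 13, 8, 4, 12, 2, 4, 9, 8, 6, 20, 4, 10, 6, 3, 3, 4, 7, 10, 12,
         5, 3, 10, 7, 4, 2, 9, 1, 0, 2, 1, 0, 0, 0, 0, 0, 2, 2, 1, 2,
         7, 1, 5, 5, 2, 1, 6, 5, 0, 1, 0, 0, 2, 4, 2, 3, 0, 0, 2, 2,
         3, 1, 0, 1, 4, 0, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]


def leverage(x):
    """Hat values for simple regression: 1/n + (x_i - mean)^2 / Sxx."""
    n = float(len(x))
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


raw = leverage(m2012)
# Assumed transformation: natural log of (count + 1), so zero counts are defined.
logged = leverage([math.log(v + 1) for v in m2012])

print("highest leverage, raw counts:  %.3f" % max(raw))
print("highest leverage, log(x + 1):  %.3f" % max(logged))
```

Leverage is unchanged by rescaling the explanatory variable, so the base of the logarithm does not matter; only the added constant does.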

---
# Hat values for simulated data

---
# Cook's D

Leverage takes no account of the response variable. The USA did not have a huge influence on the fit because its medal count in 2016 was similar to that in 2012, so it sits close to the trend. If for some reason its 2016 count had been 0, the fitted line would be pulled much more strongly towards the USA and away from the other countries.

Cook's D and DFFITS also use the response variable to assess influence.

`$$D_i = \frac{e_i^2}{p \, \mathrm{MSE}} \cdot \frac{h_{ii}}{(1-h_{ii})^2}$$`

where `$e_i$` is the `$i^{th}$` residual, `$p$` is the number of regression coefficients (including the intercept), and MSE is the mean squared error of the linear model.

By one rule of thumb, values greater than `$4/n$` are large; another rule of thumb flags only values greater than 1.
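
A minimal Python sketch of the calculation for the raw medal counts. The helper names are hypothetical, and `p = 2` (intercept and slope) is an assumption; it is the value that reproduces the 7.3 tabulated for the United States on the next slide. The second function computes the same quantity from its definition, by refitting without observation `i` and measuring how far all `n` fitted values move, which checks that the formula really does avoid the refits.

```python
# M2012 and M2016 medal counts for the 98 plotted countries (UnitedStates first).
m2012 = [104, 65, 88, 82, 44, 38, 34, 28, 28, 35, 20, 17, 17, 17, 11, 12, 6, 14, 13, 18,
         4, 13, 8, 4, 12, 2, 4, 9, 8, 6, 20, 4, 10, 6, 3, 3, 4, 7, 10, 12,
         5, 3, 10, 7, 4, 2, 9, 1, 0, 2, 1, 0, 0, 0, 0, 0, 2, 2, 1, 2,
         7, 1, 5, 5, 2, 1, 6, 5, 0, 1, 0, 0, 2, 4, 2, 3, 0, 0, 2, 2,
         3, 1, 0, 1, 4, 0, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]
m2016 = [121, 67, 70, 55, 42, 41, 42, 21, 28, 29, 19, 15, 19, 17, 13, 11, 10, 11, 18, 22,
         13, 17, 8, 7, 8, 6, 4, 15, 11, 10, 11, 8, 11, 7, 6, 6, 4, 7, 18, 9,
         8, 4, 10, 8, 4, 3, 4, 2, 2, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 5,
         5, 2, 2, 4, 3, 3, 2, 2, 1, 1, 1, 1, 1, 4, 3, 3, 2, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = float(len(x))
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def cooks_distance(x, y, p=2):
    """Cook's D from residuals and leverage; p = number of coefficients."""
    n = float(len(x))
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    hat = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, hat)]


def cooks_by_refit(x, y, i, p=2):
    """Cook's D for observation i from its definition (one extra fit)."""
    n = float(len(x))
    b0, b1 = fit_line(x, y)
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - p)
    c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    shift = sum(((b0 + b1 * xi) - (c0 + c1 * xi)) ** 2 for xi in x)
    return shift / (p * mse)


d = cooks_distance(m2012, m2016)
n_obs = len(d)
print("UnitedStates, formula: %.2f" % d[0])
print("UnitedStates, refit:   %.2f" % cooks_by_refit(m2012, m2016, 0))
print("values above 4/n:", sum(1 for v in d if v > 4.0 / n_obs))
print("values above 1:  ", sum(1 for v in d if v > 1.0))
```

The two rules of thumb disagree noticeably here: `4/n` flags more countries than the cutoff of 1, so the choice of rule matters.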

---
# Cook's D for Olympic medal tally

Left: raw counts. Right: log-transformed counts.

```
#>                       Country .cooksd                    Country .cooksd
#> 1                UnitedStates 7.3e+00                    Vietnam 4.6e-02
#> 2                  RussianFed 3.1e+00                 IvoryCoast 4.6e-02
#> 3                       China 1.3e+00 IndependentOlympicAthletes 4.6e-02
#> 4                GreatBritain 9.5e-02                     Israel 4.6e-02
#> 5                      France 6.5e-02               UnitedStates 3.3e-02
#> 6                     Ukraine 2.4e-02                     Latvia 3.3e-02
#> 7                       Japan 2.0e-02              Rep.ofMoldova 3.3e-02
#> 8                  Uzbekistan 2.0e-02                     Uganda 2.2e-02
#> 9                  SouthKorea 1.8e-02                   Botswana 2.2e-02
#> 10                 Azerbaijan 1.6e-02                     Cyprus 2.2e-02
#> 11                  Australia 1.5e-02                      Gabon 2.2e-02
#> 12                    Denmark 8.6e-03                  Guatemala 2.2e-02
#> 13                 NewZealand 6.7e-03                 Montenegro 2.2e-02
#> 14                    Romania 6.6e-03                Afghanistan 2.2e-02
#> 15                     Canada 5.9e-03              HongKongChina 2.2e-02
#> 16                      India 4.9e-03                SaudiArabia 2.2e-02
#> 17                 Kazakhstan 4.4e-03                     Kuwait 2.2e-02
#> 18                       Iran 4.0e-03                 Uzbekistan 1.8e-02
#> 19                    Croatia 3.5e-03             TrinidadTobago 1.8e-02
#> 20                SouthAfrica 3.5e-03                      India 1.5e-02
#> 21                     Greece 3.5e-03                       Fiji 1.3e-02
#> 22                     Serbia 3.5e-03                     Jordan 1.3e-02
#> 23             TrinidadTobago 3.3e-03                     Kosovo 1.3e-02
#> 24                    Ireland 3.1e-03                    Burundi 1.3e-02
#> 25                   Mongolia 3.1e-03                      Niger 1.3e-02
#> 26                    Belarus 2.2e-03                Philippines 1.3e-02
#> 27                       Cuba 2.1e-03                    Austria 1.3e-02
#> 28                     Latvia 2.0e-03                    Nigeria 1.3e-02
#> 29              Rep.ofMoldova 2.0e-03         UnitedArabEmirates 1.3e-02
#> 30                     Sweden 2.0e-03                     Greece 1.3e-02
#> 31                    Finland 1.8e-03                    Finland 1.1e-02
#> 32                     Turkey 1.8e-03                    Romania 1.1e-02
#> 33                Switzerland 1.8e-03                     France 1.1e-02
#> 34                    Belgium 1.7e-03                    Ireland 1.0e-02
#> 35                   Thailand 1.7e-03                   Mongolia 1.0e-02
#> 36                   Malaysia 1.7e-03                    Ukraine 9.9e-03
#> 37                     Brazil 1.6e-03                 Azerbaijan 9.7e-03
#> 38                     Mexico 1.3e-03                  Venezuela 9.6e-03
#> 39                      Kenya 9.9e-04               GreatBritain 9.0e-03
#> 40                     Uganda 8.5e-04                   Malaysia 8.1e-03
#> 41                   Botswana 8.5e-04                    Denmark 6.5e-03
#> 42                     Cyprus 8.5e-04                 PuertoRico 5.9e-03
#> 43                      Gabon 8.5e-04                  Singapore 5.9e-03
#> 44                  Guatemala 8.5e-04                      Qatar 5.9e-03
#> 45                 Montenegro 8.5e-04              DominicanRep. 5.9e-03
#> 46                Afghanistan 8.5e-04                    Estonia 5.9e-03
#> 47              HongKongChina 8.5e-04                     Serbia 5.6e-03
#> 48                SaudiArabia 8.5e-04                      Japan 5.4e-03
#> 49                     Kuwait 8.5e-04                 NewZealand 4.8e-03
#> 50                    Hungary 7.7e-04                    Belgium 4.8e-03
#> 51                 PuertoRico 7.6e-04                   Thailand 4.8e-03
#> 52                  Singapore 7.6e-04                    Croatia 4.1e-03
#> 53                      Qatar 7.6e-04                SouthAfrica 4.1e-03
#> 54              DominicanRep. 7.6e-04                     Canada 3.9e-03
#> 55                    Estonia 7.6e-04                 Kazakhstan 3.5e-03
#> 56                      Italy 5.9e-04                Switzerland 3.5e-03
#> 57                  Venezuela 5.6e-04                       Iran 3.0e-03
#> 58                    Vietnam 5.3e-04                     Turkey 2.8e-03
#> 59                 IvoryCoast 5.3e-04                 RussianFed 2.6e-03
#> 60 IndependentOlympicAthletes 5.3e-04                     Sweden 2.3e-03
#> 61                     Israel 5.3e-04                    Bahrain 2.2e-03
#> 62                  Lithuania 5.3e-04                    Bahamas 2.2e-03
#> 63                    Jamaica 2.3e-04                    Algeria 2.2e-03
#> 64                     Poland 2.1e-04                    Germany 2.1e-03
#> 65                   Ethiopia 1.4e-04                     Brazil 1.7e-03
#> 66                 Tajikistan 1.3e-04                     Mexico 1.7e-03
#> 67                    Grenada 1.3e-04                      Italy 1.4e-03
#> 68                    Morocco 1.3e-04                    Belarus 1.3e-03
#> 69                   Portugal 1.3e-04                      Kenya 1.3e-03
#> 70                 NorthKorea 1.2e-04                 SouthKorea 1.1e-03
#> 71                    Tunisia 8.0e-05                  Lithuania 8.2e-04
#> 72                Netherlands 6.4e-05                  Indonesia 8.2e-04
#> 73                    Armenia 6.4e-05              ChineseTaipei 8.2e-04
#> 74                  Argentina 6.0e-05                   Bulgaria 8.2e-04
#> 75                   Slovakia 6.0e-05                      Egypt 8.2e-04
#> 76                   Slovenia 6.0e-05                       Cuba 7.9e-04
#> 77                     Norway 6.0e-05                 Tajikistan 6.5e-04
#> 78                  Indonesia 4.9e-05                    Grenada 6.5e-04
#> 79              ChineseTaipei 4.9e-05                    Morocco 6.5e-04
#> 80                   Bulgaria 4.9e-05                   Portugal 6.5e-04
#> 81                      Egypt 4.9e-05                    Armenia 4.8e-04
#> 82                      Spain 3.9e-05                     Poland 4.7e-04
#> 83                    Bahrain 3.7e-05                      Spain 3.9e-04
#> 84                    Bahamas 3.7e-05                   Ethiopia 3.5e-04
#> 85                    Algeria 3.7e-05                 NorthKorea 3.3e-04
#> 86                       Fiji 2.5e-05                      China 2.9e-04
#> 87                     Jordan 2.5e-05                Netherlands 2.0e-04
#> 88                     Kosovo 2.5e-05                    Tunisia 9.1e-05
#> 89                    Burundi 2.5e-05              CzechRepublic 4.9e-05
#> 90                      Niger 2.5e-05                  Argentina 3.6e-05
#> 91                Philippines 2.5e-05                   Slovakia 3.6e-05
#> 92                    Austria 2.5e-05                   Slovenia 3.6e-05
#> 93                    Nigeria 2.5e-05                     Norway 3.6e-05
#> 94         UnitedArabEmirates 2.5e-05                    Hungary 2.1e-05
#> 95                    Georgia 1.9e-05                    Jamaica 1.3e-05
#> 96                    Germany 1.2e-05                  Australia 1.0e-05
#> 97                   Colombia 1.1e-05                   Colombia 9.8e-06
#> 98              CzechRepublic 1.6e-06                    Georgia 1.2e-06
```

---
# Cook's D for simulated data

The values are more spread out once the one extreme value is removed. No other points are influential.

---
# Solutions

- Remove influential observations and re-fit the model
- Transform explanatory variables to reduce influence
- Use weighted regression to downweight the influence of extreme observations
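
The first option can be sketched as follows. This is a minimal illustration in plain Python, not the code behind these slides: the data are hypothetical toy values, and `fit_line` and `cooks_distance` are helper names made up for this example. It computes Cook's D for a simple linear regression, drops the observation with the largest value, and re-fits.

```python
# Hypothetical toy data: nine points close to y = x plus one extreme point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2, 7.9, 9.1, 15.0]


def fit_line(x, y):
    """Return the least-squares (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def cooks_distance(x, y):
    """Cook's D for every observation in a simple linear regression."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e ** 2 / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]


d = cooks_distance(x, y)
worst = max(range(len(d)), key=d.__getitem__)  # index of the largest Cook's D

# Re-fit without the most influential observation and compare the slopes.
x_trim = [xi for i, xi in enumerate(x) if i != worst]
y_trim = [yi for i, yi in enumerate(y) if i != worst]
_, slope_all = fit_line(x, y)
_, slope_trim = fit_line(x_trim, y_trim)
print(f"most influential index: {worst}, Cook's D: {d[worst]:.2f}")
print(f"slope with all points: {slope_all:.2f}, after removal: {slope_trim:.2f}")
```

Removing a point changes the question the model answers, so it is worth reporting the fit both with and without the observation.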

---
class: inverse middle 
# Your turn

- What happens when there are two extreme points with virtually the same values?

---
# Collinearity

Population and GDP are standardised.

```
#>             term estimate std.error statistic p.value
#> 1    (Intercept)   1.8604   0.49070       3.8 2.9e-04
#> 2     Total_2012   0.7471   0.04108      18.2 1.6e-30
#> 3 Population_mil  -0.0260   0.00384      -6.8 1.7e-09
#> 4    GDP_PPP_bil   0.0024   0.00038       6.4 8.4e-09
```

Giving the model `$M_{2016}=$` 1.86 `$+$` 0.75 `$M_{2012}-$` 0.026 `$Pop+$` 0.0024 `$GDP+\varepsilon$`

---
# Plot the explanatory variables

---
# Explore countries

<div id="12d2f46e4740c" style="width:504px;height:396px;" class="plotly html-widget"></div>
<script type="application/json" data-for="12d2f46e4740c">{"x":{"data":[{"x":[321.4,64.1,1367.5,142.4,80.9,126.9,66.6,49.1,61.9,22.8,16.9,9.9,204.3,48.1,45.9,3,4.5,11,4.4,35.1,29.2,18.2,46.7,8.1,81.8,10.8,43.4,5.6,9.8,53.7,44.4,7.2,38.6,25,11.3,68,5.4,4.9,9.8,9.6,79.4,3.1,10.6,99.5,2,256,21.7,1.3,94.3,23.4,0.3,23.3,null,0.9,8.1,1.9,3.6,5.7,8.2,30.5,121.7,39.5,4.9,2.9,7.2,29.3,1251.7,3,10.7,0.1,18,101,2.2,5.2,88.5,11,8,8.7,10.5,1.3,5.5,33.3,3.5,181.6,10.8,1.2,5.8,null,null,null,null,null,null,null,null,null,null,null,null],"y":[104,65,88,82,44,38,34,28,28,35,20,17,17,17,11,12,6,14,13,18,4,13,8,4,12,2,4,9,8,6,20,4,10,6,3,3,4,7,10,12,5,3,10,7,4,2,9,1,0,2,1,0,0,0,0,0,2,2,1,2,7,1,5,5,2,1,6,5,0,1,0,0,2,4,2,3,0,0,2,2,3,1,0,0,1,4,0,2,1,1,1,1,1,1,2,1,1,1,1],"text":["Population_mil: 321.4 Total_2012: 104 Country: UnitedStates","Population_mil: 64.1 Total_2012: 65 Country: GreatBritain","Population_mil: 1367.5 Total_2012: 88 Country: China","Population_mil: 142.4 Total_2012: 82 Country: RussianFed","Population_mil: 80.9 Total_2012: 44 Country: Germany","Population_mil: 126.9 Total_2012: 38 Country: Japan","Population_mil: 66.6 Total_2012: 34 Country: France","Population_mil: 49.1 Total_2012: 28 Country: SouthKorea","Population_mil: 61.9 Total_2012: 28 Country: Italy","Population_mil: 22.8 Total_2012: 35 Country: Australia","Population_mil: 16.9 Total_2012: 20 Country: Netherlands","Population_mil: 9.9 Total_2012: 17 Country: Hungary","Population_mil: 204.3 Total_2012: 17 Country: Brazil","Population_mil: 48.1 Total_2012: 17 Country: Spain","Population_mil: 45.9 Total_2012: 11 Country: Kenya","Population_mil: 3.0 Total_2012: 12 Country: Jamaica","Population_mil: 4.5 Total_2012: 6 Country: Croatia","Population_mil: 11.0 Total_2012: 14 Country: Cuba","Population_mil: 4.4 Total_2012: 13 Country: NewZealand","Population_mil: 35.1 Total_2012: 18 Country: Canada","Population_mil: 29.2 Total_2012: 4 Country: Uzbekistan","Population_mil: 18.2 Total_2012: 13 Country: 
Kazakhstan","Population_mil: 46.7 Total_2012: 8 Country: Colombia","Population_mil: 8.1 Total_2012: 4 Country: Switzerland","Population_mil: 81.8 Total_2012: 12 Country: Iran","Population_mil: 10.8 Total_2012: 2 Country: Greece","Population_mil: 43.4 Total_2012: 4 Country: Argentina","Population_mil: 5.6 Total_2012: 9 Country: Denmark","Population_mil: 9.8 Total_2012: 8 Country: Sweden","Population_mil: 53.7 Total_2012: 6 Country: SouthAfrica","Population_mil: 44.4 Total_2012: 20 Country: Ukraine","Population_mil: 7.2 Total_2012: 4 Country: Serbia","Population_mil: 38.6 Total_2012: 10 Country: Poland","Population_mil: 25.0 Total_2012: 6 Country: NorthKorea","Population_mil: 11.3 Total_2012: 3 Country: Belgium","Population_mil: 68.0 Total_2012: 3 Country: Thailand","Population_mil: 5.4 Total_2012: 4 Country: Slovakia","Population_mil: 4.9 Total_2012: 7 Country: Georgia","Population_mil: 9.8 Total_2012: 10 Country: Azerbaijan","Population_mil: 9.6 Total_2012: 12 Country: Belarus","Population_mil: 79.4 Total_2012: 5 Country: Turkey","Population_mil: 3.1 Total_2012: 3 Country: Armenia","Population_mil: 10.6 Total_2012: 10 Country: CzechRepublic","Population_mil: 99.5 Total_2012: 7 Country: Ethiopia","Population_mil: 2.0 Total_2012: 4 Country: Slovenia","Population_mil: 256.0 Total_2012: 2 Country: Indonesia","Population_mil: 21.7 Total_2012: 9 Country: Romania","Population_mil: 1.3 Total_2012: 1 Country: Bahrain","Population_mil: 94.3 Total_2012: 0 Country: Vietnam","Population_mil: 23.4 Total_2012: 2 Country: ChineseTaipei","Population_mil: 0.3 Total_2012: 1 Country: Bahamas","Population_mil: 23.3 Total_2012: 0 Country: IvoryCoast","Population_mil: NA Total_2012: 0 Country: Independent","Population_mil: 0.9 Total_2012: 0 Country: Fiji","Population_mil: 8.1 Total_2012: 0 Country: Jordan","Population_mil: 1.9 Total_2012: 0 Country: Kosovo","Population_mil: 3.6 Total_2012: 2 Country: PuertoRico","Population_mil: 5.7 Total_2012: 2 Country: Singapore","Population_mil: 8.2 
Total_2012: 1 Country: Tajikistan","Population_mil: 30.5 Total_2012: 2 Country: Malaysia","Population_mil: 121.7 Total_2012: 7 Country: Mexico","Population_mil: 39.5 Total_2012: 1 Country: Algeria","Population_mil: 4.9 Total_2012: 5 Country: Ireland","Population_mil: 2.9 Total_2012: 5 Country: Lithuania","Population_mil: 7.2 Total_2012: 2 Country: Bulgaria","Population_mil: 29.3 Total_2012: 1 Country: Venezuela","Population_mil: 1251.7 Total_2012: 6 Country: India","Population_mil: 3.0 Total_2012: 5 Country: Mongolia","Population_mil: 10.7 Total_2012: 0 Country: Burundi","Population_mil: 0.1 Total_2012: 1 Country: Grenada","Population_mil: 18.0 Total_2012: 0 Country: Niger","Population_mil: 101.0 Total_2012: 0 Country: Philippines","Population_mil: 2.2 Total_2012: 2 Country: Qatar","Population_mil: 5.2 Total_2012: 4 Country: Norway","Population_mil: 88.5 Total_2012: 2 Country: Egypt","Population_mil: 11.0 Total_2012: 3 Country: Tunisia","Population_mil: 8.0 Total_2012: 0 Country: Israel","Population_mil: 8.7 Total_2012: 0 Country: Austria","Population_mil: 10.5 Total_2012: 2 Country: DominicanRep.","Population_mil: 1.3 Total_2012: 2 Country: Estonia","Population_mil: 5.5 Total_2012: 3 Country: Finland","Population_mil: 33.3 Total_2012: 1 Country: Morocco","Population_mil: 3.5 Total_2012: 0 Country: Moldova","Population_mil: 181.6 Total_2012: 0 Country: Nigeria","Population_mil: 10.8 Total_2012: 1 Country: Portugal","Population_mil: 1.2 Total_2012: 4 Country: TrinidadTobago","Population_mil: 5.8 Total_2012: 0 Country: U.A.E.","Population_mil: NA Total_2012: 2 Country: Latvia","Population_mil: NA Total_2012: 1 Country: Uganda","Population_mil: NA Total_2012: 1 Country: Botswana","Population_mil: NA Total_2012: 1 Country: Cyprus","Population_mil: NA Total_2012: 1 Country: Gabon","Population_mil: NA Total_2012: 1 Country: Guatemala","Population_mil: NA Total_2012: 1 Country: Montenegro","Population_mil: NA Total_2012: 2 Country: Rep.ofMoldova","Population_mil: NA 
Total_2012: 1 Country: Afghanistan","Population_mil: NA Total_2012: 1 Country: HongKongChina","Population_mil: NA Total_2012: 1 Country: SaudiArabia","Population_mil: NA Total_2012: 1 Country: Kuwait"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(0,0,0,1)","opacity":1,"size":5.66929133858268,"symbol":"circle","line":{"width":1.88976377952756,"color":"rgba(0,0,0,1)"}},"hoveron":"points","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null}],"layout":{"margin":{"t":25.2984640929846,"r":7.30593607305936,"b":39.252801992528,"l":43.1050228310502},"plot_bgcolor":"rgba(235,235,235,1)","paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":14.6118721461187},"xaxis":{"domain":[0,1],"type":"linear","autorange":false,"tickmode":"array","range":[-75,1575],"ticktext":["0","500","1000","1500"],"tickvals":[0,500,1000,1500],"ticks":"outside","tickcolor":"rgba(51,51,51,1)","ticklen":3.65296803652968,"tickwidth":0.66417600664176,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":11.689497716895},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(255,255,255,1)","gridwidth":0.66417600664176,"zeroline":false,"anchor":"y","title":"Population_mil","titlefont":{"color":"rgba(0,0,0,1)","family":"","size":14.6118721461187},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"type":"linear","autorange":false,"tickmode":"array","range":[-7.5,157.5],"ticktext":["0","50","100","150"],"tickvals":[0,50,100,150],"ticks":"outside","tickcolor":"rgba(51,51,51,1)","ticklen":3.65296803652968,"tickwidth":0.66417600664176,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":11.689497716895},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(255,255,255,1)","gridwidth":0.66417600664176,"zeroline":false,"anchor":"x","title":"Total_2012","titlefont":{"color":"rgba(0,0,0,1)","fam
ily":"","size":14.6118721461187},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":false,"legend":{"bgcolor":"rgba(255,255,255,1)","bordercolor":"transparent","borderwidth":1.88976377952756,"font":{"color":"rgba(0,0,0,1)","family":"","size":11.689497716895}},"hovermode":"closest"},"source":"A","attrs":{"12d2f753a4e34":{"x":{},"y":{},"label":{},"type":"ggplotly"}},"cur_data":"12d2f753a4e34","visdat":{"12d2f753a4e34":["function (y) ","x"]},"config":{"modeBarButtonsToAdd":[{"name":"Collaborate","icon":{"width":1000,"ascent":500,"descent":-50,"path":"M487 375c7-10 9-23 5-36l-79-259c-3-12-11-23-22-31-11-8-22-12-35-12l-263 0c-15 0-29 5-43 15-13 10-23 23-28 37-5 13-5 25-1 37 0 0 0 3 1 7 1 5 1 8 1 11 0 2 0 4-1 6 0 3-1 5-1 6 1 2 2 4 3 6 1 2 2 4 4 6 2 3 4 5 5 7 5 7 9 16 13 26 4 10 7 19 9 26 0 2 0 5 0 9-1 4-1 6 0 8 0 2 2 5 4 8 3 3 5 5 5 7 4 6 8 15 12 26 4 11 7 19 7 26 1 1 0 4 0 9-1 4-1 7 0 8 1 2 3 5 6 8 4 4 6 6 6 7 4 5 8 13 13 24 4 11 7 20 7 28 1 1 0 4 0 7-1 3-1 6-1 7 0 2 1 4 3 6 1 1 3 4 5 6 2 3 3 5 5 6 1 2 3 5 4 9 2 3 3 7 5 10 1 3 2 6 4 10 2 4 4 7 6 9 2 3 4 5 7 7 3 2 7 3 11 3 3 0 8 0 13-1l0-1c7 2 12 2 14 2l218 0c14 0 25-5 32-16 8-10 10-23 6-37l-79-259c-7-22-13-37-20-43-7-7-19-10-37-10l-248 0c-5 0-9-2-11-5-2-3-2-7 0-12 4-13 18-20 41-20l264 0c5 0 10 2 16 5 5 3 8 6 10 11l85 282c2 5 2 10 2 17 7-3 13-7 17-13z m-304 0c-1-3-1-5 0-7 1-1 3-2 6-2l174 0c2 0 4 1 7 2 2 2 4 4 5 7l6 18c0 3 0 5-1 7-1 1-3 2-6 2l-173 0c-3 0-5-1-8-2-2-2-4-4-4-7z m-24-73c-1-3-1-5 0-7 2-2 3-2 6-2l174 0c2 0 5 0 7 2 3 2 4 4 5 7l6 18c1 2 0 5-1 6-1 2-3 3-5 3l-174 0c-3 0-5-1-7-3-3-1-4-4-5-6z"},"click":"function(gd) { \n // is this being viewed in RStudio?\n if (location.search == '?viewer_pane=1') {\n alert('To learn about plotly for collaboration, visit:\\n https://cpsievert.github.io/plotly_book/plot-ly-for-collaboration.html');\n } else {\n 
window.open('https://cpsievert.github.io/plotly_book/plot-ly-for-collaboration.html', '_blank');\n }\n }"}],"cloud":false},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.2,"selected":{"opacity":1}},"base_url":"https://plot.ly"},"evals":["config.modeBarButtonsToAdd.0.click"],"jsHooks":{"render":[{"code":"function(el, x) { var ctConfig = crosstalk.var('plotlyCrosstalkOpts').set({\"on\":\"plotly_click\",\"persistent\":false,\"dynamic\":false,\"selectize\":false,\"opacityDim\":0.2,\"selected\":{\"opacity\":1}}); }","data":null}]}}</script>

---
# Variance inflation factor (VIF)

`$$VIF_j = \frac{1}{1-R_j^2}$$`

where `$R_j^2$` is computed by regressing explanatory variable `$j$` on all the other explanatory variables. VIF measures the collinearity among the explanatory variables. Values greater than 10 are considered to be high.

These are the VIFs for the Olympic medal tally data:

```
#>     Total_2012 Population_mil    GDP_PPP_bil 
#>            3.6            3.5            7.6
```
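
The definition can be turned into a short computation. The sketch below is plain Python with hypothetical toy data, not the Olympic data, and the helpers `solve`, `r_squared` and `vif` are names made up for this illustration. Each variable is regressed on the others by least squares, and `$1/(1-R_j^2)$` is returned for each.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, xs):
    """R^2 from regressing y on the columns in xs, with an intercept."""
    n = len(y)
    cols = [[1.0] * n] + xs
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) for each explanatory variable."""
    out = {}
    for name, col in columns.items():
        others = [c for other, c in columns.items() if other != name]
        out[name] = 1 / (1 - r_squared(col, others))
    return out


# Hypothetical toy data: x2 is almost a copy of x1, x3 is only weakly related.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1.1, 2.3, 2.8, 4.2, 4.9, 6.3, 6.8, 8.1],
    "x3": [3, 1, 4, 1, 5, 9, 2, 6],
}
result = vif(data)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

In this toy example the two near-duplicate variables get large VIFs while the third stays small, which is the same pattern to look for in real output.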

---
# Suppose we add 2008 counts as an explanatory variable

---
# Model

```
#>             term estimate std.error statistic p.value
#> 1    (Intercept)   2.2413   0.57642       3.9 2.3e-04
#> 2     Total_2012   0.9080   0.10146       8.9 3.7e-13
#> 3     Total_2008  -0.2025   0.10785      -1.9 6.5e-02
#> 4 Population_mil  -0.0278   0.00410      -6.8 3.3e-09
#> 5    GDP_PPP_bil   0.0028   0.00043       6.5 1.1e-08
```

Giving the model `$M_{2016}=$` 2.24 `$+$` 0.91 `$M_{2012}-$` 0.20 `$M_{2008}-$` 0.028 `$Pop+$` 0.0028 `$GDP+\varepsilon$`

---
# VIFs

```
#>     Total_2012     Total_2008 Population_mil    GDP_PPP_bil 
#>           18.7           21.7            3.6            8.5
```

Notice that the VIFs for both `Total_2008` and `Total_2012` are now well above 10.

---
class: inverse middle 
# Your turn

- Why is it called `Variance Inflation Factor`? Compare the standard errors of the estimates for the models with and without the 2008 counts.
- Why would multicollinearity inflate the variance of the estimates?

---
# Solutions

- Drop some variables
- Use principal component regression (more advanced courses)
- Partial regression: Fit the best variable. Regress the next explanatory variable on the first variable, and use the residuals from this fit as the second variable in the model. Continue with the other variables.
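
The residual step in partial regression can be sketched as follows. This is plain Python with hypothetical toy data, and `residualise` and `correlation` are helper names made up for this illustration. The second variable is regressed on the first, and the residuals, which are uncorrelated with the first variable by construction, replace it in the model.

```python
def residualise(target, given):
    """Residuals from the simple regression of target on given, with intercept."""
    n = len(target)
    gbar, tbar = sum(given) / n, sum(target) / n
    sgg = sum((g - gbar) ** 2 for g in given)
    sgt = sum((g - gbar) * (t - tbar) for g, t in zip(given, target))
    slope = sgt / sgg
    return [(t - tbar) - slope * (g - gbar) for g, t in zip(given, target)]


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    sab = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    saa = sum((ai - abar) ** 2 for ai in a)
    sbb = sum((bi - bbar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


# Hypothetical toy data: x2 is almost a copy of x1.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3, 6.8, 8.1]

# Keep only the part of x2 that x1 does not already explain.
x2_resid = residualise(x2, x1)

r_before = correlation(x1, x2)
r_after = correlation(x1, x2_resid)
print(f"correlation of x1 with x2: {r_before:.3f}")
print(f"correlation of x1 with residualised x2: {r_after:.3f}")
```

The trade-off is interpretability: the coefficient on the residualised variable describes only what that variable adds beyond the first one.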

---
# Resources

- [Regression Diagnostics: Identifying Influential Data and Sources of Collinearity](http://onlinelibrary.wiley.com/book/10.1002/0471725153)

---
class: inverse middle 
# Share and share alike

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a> This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.