class: center, middle, inverse, title-slide

# Statistical Thinking using Randomisation and Simulation

## Fitting Models

### Di Cook (dicook@monash.edu, @visnut)

### W4.C1

---

# Overview of this class

- Fitting a distribution for Olympic medal tallies

---

# Olympic medals, 2012 London

<img src="week4.class1_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

# Data

- Extracted from [https://www.olympic.org/london-2012](https://www.olympic.org/london-2012)
- Now it is easier to pull data from [Wikipedia](https://en.wikipedia.org/wiki/2012_Summer_Olympics_medal_table)
- 204 countries participated; only the 85 countries that won a medal are listed in the medal table

---

# Medal tally

- Examine the distribution of medal counts
- Need to add 119 zeros, to account for participating countries that did not win a medal
- Distribution is heavily right-skewed and unimodal
- Use maximum likelihood to estimate parameters for plausible distributions

---

# Fit distribution using Poisson

```
#>  lambda
#>    4.72
#>  (0.15)
```

<img src="week4.class1_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

# Try lognormal

```
#>   meanlog    sdlog
#>     0.779    1.137
#>   (0.080)  (0.056)
```

<img src="week4.class1_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

# Try Weibull

```
#>     shape    scale
#>     0.707    4.106
#>   (0.033)  (0.434)
```

<img src="week4.class1_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

# Try Pareto

```
#>       c
#>    1.28
#>  (0.09)
```

<img src="week4.class1_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

# Optimization actually fails

<img src="week4.class1_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

# Manually

Setting the parameter by hand, using `\(c=0.96\)`.

<img src="week4.class1_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

# Predict largest medal count

Using this model, what is the probability of observing a tally of more than 50 medals for a country?
`\(P(X>50)\)`

```r
# Pareto CDF with minimum value 1 and shape parameter c
ppareto <- function(q, c) {
  if (c <= 0) stop("c must be positive")
  ifelse(q < 1, 0, 1 - 1/q^c)
}
1 - ppareto(50, 0.96)
#> [1] 0.023
```

---

# How many would we expect?

If there are 204 countries, how many of them would we expect to earn more than 50 medals, assuming the `\(Pareto(0.96)\)` model?

```r
204 * (1 - ppareto(50, 0.96))
#> [1] 4.8
```

How does this compare to the observed number?

```r
library(dplyr)
df %>% filter(Total > 50)
#>   Total
#> 1    65
#> 2    82
#> 3    88
#> 4   104
```

---

# How well does this fit 2008 medal tally?

<img src="week4.class1_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

# And 2004?

<img src="week4.class1_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

# Doping in sports - finding anomalies

![](athletics-women.png)

Source: FT research, image extracted from [http://blogs.ft.com/ftdata/2015/11/16/doping-in-athletics/](http://blogs.ft.com/ftdata/2015/11/16/doping-in-athletics/)

---

# YOUR TURN: How could we improve the model?

---

#

- What dependencies are there in the medal tallies?
- What varies among Olympic years?
- What factors might affect the medal counts?

---

# Resources

- [2012 Medal tally](https://en.wikipedia.org/wiki/2012_Summer_Olympics_medal_table)
- [2008 Medal tally](https://en.wikipedia.org/wiki/2008_Summer_Olympics_medal_table)
- [2004 Medal tally](https://en.wikipedia.org/wiki/2004_Summer_Olympics_medal_table)
- [http://blogs.ft.com/ftdata/2015/11/16/doping-in-athletics/](http://blogs.ft.com/ftdata/2015/11/16/doping-in-athletics/)

---

class: inverse middle

# Share and share alike

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.
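
---

# Worked check: the Pareto tail probability

A rough hand calculation of the two numbers reported in the prediction slides. It assumes, as the `ppareto()` code implies, a Pareto distribution with minimum value 1 and shape `\(c\)`, so that `\(F(x) = 1 - x^{-c}\)` for `\(x \geq 1\)`. This is a sketch under that assumption, not a statement about how the fitted model was originally specified.

`\(P(X>50) = 1 - F(50) = 1 - (1 - 50^{-c}) = 50^{-c}\)`

`\(50^{-0.96} = e^{-0.96 \ln 50} \approx e^{-3.76} \approx 0.023\)`

`\(204 \times 50^{-0.96} \approx 4.8\)`

The expected count of about 4.8 is close to the 4 countries observed with more than 50 medals in 2012.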