class: center, middle, inverse, title-slide

# Statistical Thinking using Randomisation and Simulation
## Introduction and motivation
### Di Cook (
)
### W1.C1

---
# Overview of the class

- Topics
- Assessment
- Resources
- Instructors, tutors

---
# Topics

- Topic 1: Simulation of games for decision strategies (2 weeks)
- Topic 2: Statistical distributions for decision theory (1.5 weeks)
- Topic 3: Linear models for credibility theory (1.5 weeks)
- Topic 4: Compiling data to problem solve (2 weeks)
- Topic 5: Bayesian statistical thinking (1.5 weeks)
- Topic 6: Temporal data and time series models (1.5 weeks)
- Topic 7: Modeling risk and loss, with data and using randomization to assess uncertainty (2 weeks)

---
# Assessment

- Final exam: 60%
- Tutorials/labs: 30%, weekly reports due Monday noon after the lab
- Quizzes: 10%
- ETC5242 students: Labs 15%, Project report and presentation 15%

---
# Resources

- Web site: [https://st.netlify.com](https://st.netlify.com)
- Moodle
- [Statistics online textbook](https://www.openintro.org/stat/textbook.php?stat_book=isrs)
- [Actuarial online curriculum/exam material](https://www.actuaries.org.uk/studying/plan-my-study-route/fellowshipassociateship/core-technical-subjects/ct6-statistical-methods)
- Software: [R](https://cran.r-project.org), [RStudio Desktop](https://www.rstudio.com/products/rstudio/download2/)

---
# Instructors

- Instructors:
    - Professor Di Cook, Menzies 762A
- Tutors:
    - Stuart Lee (working with Di on PhD)
    - Dilini Talagala (working with Rob Hyndman on PhD)
    - Thiyanga Talagala (working with Rob Hyndman on PhD)
    - Nathaniel Tomasetti (worked with Di for Honors, working with Dr Catherine Forbes on PhD)
    - Earo Wang (working with Di on PhD)

---
# What is randomness?

- Coin flip
- Die roll
- Your sporting team wins
- Gender of a baby
- Rain tomorrow
- Stock price an hour from now
- Lightning strike
- Pipe burst

---
class: inverse middle

# Your turn

We are going to play a game of "Stump the Professor". Flip a coin. If it shows up tails, do A first; if it shows up heads, do B first.

`A. 
Write down a sequence of heads and tails that you might expect to come from TWENTY flips of a coin`

`B. Now flip a coin TWENTY times, and write down the outcomes`

- Enter these in the [online sheet](https://docs.google.com/forms/d/155fP-mdd0HevqNYEVUngEBVWHXmxYi-B5zPzKjikEb0/edit). (Remember whether you entered the coin flip sequence first or the made-up sequence.)
- Now I am going to look at what you entered, and guess whether each sequence was made up or came from actual coin flips.
- You record how many times I get it right.

---
# Example: a look at the Australian electoral distribution

- Results of the 2013 election from the Australian Electoral Commission web site
- 2011 Census data from the Australian Bureau of Statistics
- Combined demographics of each electorate with its political representation
- Interactive application, in the R package `eechidna`

---
# How to use randomization to understand probability

![](week1.class1_files/figure-html/unnamed-chunk-1-1.png)

---
class: inverse middle

# Your turn

- What is the difference (roughly) in population between the biggest and smallest electorates?
- What is the relative worth of a voter in the electorate with the largest population, compared to a voter in the electorate with the smallest population?

---
# Politics

- Ideally all electorates have exactly the same number of people.
- Geography can interfere with this, e.g. an electorate cannot be partly in Tasmania and partly in Victoria.
- The Australian Electoral Commission adjusts geographic boundaries before each election to account for population changes as measured in the most recent Census.

---
# Compute averages

```
#> # A tibble: 2 x 3
#>     PartyGp      m     s
#>       <chr>  <dbl> <dbl>
#> 1       ALP 149245 20167
#> 2 Coalition 139537 12337
```

---
# Statistical thinking

- The means are different
- How big is this difference?
- How likely is this difference to have arisen by chance?

We could use a two-sample t-test to answer these, but here is how to do the equivalent by randomization.
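As a rough illustration of what "doing the equivalent by randomization" can look like in R, here is a minimal sketch using base R only. It is not the code behind these slides: the data frame `electorates`, its group sizes, and the simulated populations are all hypothetical stand-ins, and the helper `mean_diff()` is a name made up for this example.

```r
# Simulated stand-in data: these are NOT the real electorate populations.
# Group sizes, means and standard deviations are hypothetical values.
set.seed(1)
electorates <- data.frame(
  PartyGp = rep(c("ALP", "Coalition"), times = c(50, 80)),
  Population = c(rnorm(50, mean = 149000, sd = 20000),
                 rnorm(80, mean = 140000, sd = 12000))
)

# Hypothetical helper: absolute difference between the two group means
mean_diff <- function(population, party) {
  group_means <- tapply(population, party, mean)
  unname(abs(group_means["ALP"] - group_means["Coalition"]))
}

# Statistic for the data as given
observed <- mean_diff(electorates$Population, electorates$PartyGp)

# Shuffle the party labels many times, recomputing the statistic each time
n_shuffles <- 1000
shuffled <- replicate(
  n_shuffles,
  mean_diff(electorates$Population, sample(electorates$PartyGp))
)

# How often does a shuffle give a difference at least as large as the data?
n_as_large <- sum(shuffled >= observed)
p_value <- n_as_large / n_shuffles
```

Here `sample()` with no extra arguments permutes the labels, which breaks any link between party and population while keeping the group sizes fixed. To try this on real data, swap in a data frame with the same two columns.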
---
# Procedure

1. Compute the statistic for the data (e.g. absolute value of the mean difference)
2. Shuffle the group labels (e.g. put the MP party names into a hat, mix them around, draw them and assign each to a new electorate)
3. Compute the statistic for this shuffled data
4. Repeat steps 2 and 3 many times
5. Examine how often the shuffled statistic is as large as, or larger than, the value of the data statistic

---
# Let's do it

<img src="week1.class1_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

Let's also count the number of times that we see a bigger difference by chance. It is 1.

---
# What does this mean?

If we observe a difference this large in only 1 out of 1000 random shuffles, is it likely that this electorate distribution arose by chance?

---
# Caveats

Let's wait until the next Census results are in (after August this year), along with the latest election results, to compare populations of electorates again.

---
class: inverse middle

# Share and share alike

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.