---
title: 'Statistical Thinking using Randomisation and Simulation'
subtitle: "Hypothesis testing"
author: Di Cook (dicook@monash.edu, @visnut)
date: "W2.C2"
output:
xaringan::moon_reader:
css: ["default", "myremark.css"]
self_contained: false
nature:
highlightStyle: github
highlightLines: true
countIncrementalSlides: false
---
```{r setup, include = FALSE}
library(knitr)
opts_chunk$set(
message = FALSE,
cache = FALSE,
fig.height = 5,
fig.width = 7,
fig.align = "center",
collapse = TRUE,
comment = "#>"
)
options(digits=2)
library(tidyverse)
```
# Review of hypothesis testing
- Terminology: null and alternative hypothesis, test statistic, significance value, p-value, critical value, decision, power, one-sided or two-sided test
- Common basic tests: one sample t-test, two sample t-test, paired sample t-test, chisquared test (for association), analysis of variance
---
# Hypotheses
- The two hypotheses need to cover all possibilities.
- Alternative hypothesis is usually the interesting outcome, sometimes its easier to write this first. It is usually stated as an inequality.
- Null hypothesis is the usually stated as a parameter is `=` to some specified value. It represents a historical value, or a conservative state of business as usual where we would only rule against it if there is overwhelming evidence.
---
# Using permutation to examine association
Example from DBCR:
*The participants in this study were 48 male bank supervisors attending a management institute at the University of North Carolina in 1972. They were asked to assume the role of the personnel director of a bank and were given a personnel file to judge whether the person should be promoted to a branch manager position. The files given to the participants were identical, except that half of them indicated the candidate was male and the other half indicated the candidate was female. These files were randomly assigned to the subjects.*
---
# Experiments vs observational study
- Researchers perform an observational study when they collect data in a way that does not directly interfere with how the data arise.
--
- When researchers want to investigate the possibility of a causal connection, they conduct an experiment. Researchers will collect a sample of individuals and split them into groups. The individuals in each group are assigned a treatment. When individuals are randomly assigned to a group, the experiment is called a randomized experiment.
--
---
class: inverse
# Your turn
- __Is the bank data from an observational study or an experiment?__
--
- __How does the type of study impact what can be inferred from the results?__
---
class: inverse
# Your turn
- What is the null hypothesis?
--
- $H_o:$ promotion chances are not related to gender
--
- What is the alternative hypothesis?
--
- $H_a:$ females are promoted at a lower rate
--
- Write these in statistical notation
--
- $H_o: p_M = p_F$, $H_a: p_M > p_F$
---
# Look at the data
```{r echo=FALSE}
bank <- data.frame(gender=c(rep("male", 24), rep("female", 24)),
decision=c(rep("promoted", 21), rep("not promoted", 3),
rep("promoted", 14), rep("not promoted", 10)))
```
```{r echo=FALSE, eval=FALSE}
bank %>% group_by(gender, decision) %>%
tally() %>%
spread(decision, n)
```
```{r echo=FALSE}
addmargins(table(bank$gender, bank$decision))
prop.table(table(bank$gender, bank$decision), 1)
```
---
# How likely is this if there really is no gender bias in promotion?
We can do this using permutation.
- __Generate samples that are consistent with $H_o$.__ Break any association between gender and decision.
- __Re-tabulate__
- Compute proportion of women promoted
- Repeat these steps many, many times
---
# Data table looks like this
Original:
```{r echo=FALSE}
set.seed(2)
x <- bank %>% sample_n(10) %>%
arrange(gender, decision)
x
```
---
# Scramble one column
Permuted decision:
```{r echo=FALSE}
set.seed(6)
x_new <- x %>% mutate(decision=sample(decision))
x_new
```
---
# Let's do it
```{r}
prop <- bank %>% group_by(gender, decision) %>%
tally() %>%
ungroup() %>%
group_by(gender) %>%
mutate(p = n/sum(n))
```
The difference between men and women promoted is `r prop$p[4]-prop$p[2]`.
---
# Permute 100 times and re-calculate
```{r echo=FALSE}
set.seed(1)
pprop <- NULL
for (i in 1:100) {
x_new <- bank %>% mutate(decision=sample(decision)) %>%
group_by(gender, decision) %>%
tally() %>%
ungroup() %>%
group_by(gender) %>%
mutate(p = n/sum(n))
pprop <- c(pprop, x_new$p[4]-x_new$p[2])
}
pprop <- data.frame(p=pprop)
ggplot(pprop, aes(x=p)) + geom_dotplot(stackratio=0.7, alpha=0.8) +
geom_vline(xintercept=prop$p[4]-prop$p[2], colour="red")
```
There are `r length(pprop$p[pprop$p>=(prop$p[4]-prop$p[2])])` null values larger than or equl to the difference in the data set, `r prop$p[4]-prop$p[2]`. This gives a $p$-value equal to `r length(pprop$p[pprop$p>=(prop$p[4]-prop$p[2])])/length(pprop$p)` which is small. It is unlikely to see a difference between promotion rates as large as this purely by chance.
---
# Hypothesis testing process
- Null hypothesis
- Alternative hypothesis
- Assumptions
- Significance level
- Sampling distribution under the null hypothesis Test statistic
- P-value
- Decision
---
# Null and alternative hypothesis
- The null hypothesis ( $H_0$ ) is often a statement that no effect or no difference is present. For example, the value of a population parameter (e.g. mean) is equal to some claimed value
- The alternative hypothesis ( $H_a$ ) represents an alternative claim under consideration and is often represented by a range of possible values for the value of interest.
- Failing to find strong evidence for the alternative hypothesis is not equivalent to providing evidence that the null hypothesis is true.
---
The hypotheses are stated in terms of population parameters.
We observe a sample from the population
- Mean $\mu$
- Proportion $\pi$
- Difference between means $\mu_2-\mu_1$
- Difference in proportions $\pi_2-\pi_1$
- Correlation $\rho$
Sample statistics are used to compute the test statistic with the sample
- Sample mean $\bar{X}$
- Sample Proportion $p$
- Difference between sample means $\bar{X}_2 - \bar{X}_1$
- Difference in sample proportions $p_2-p_1$
- Sample correlation $r$
---
- Directional tests (one-sided)
- one sample: $H_0 : \mu=\mu_0$ and $H_a : \mu > (or <) \mu_0$
- two samples: $H_0 :\mu_1=\mu_2$ and $H_a :\mu_1 >(or <)\mu_2$
- Keywords in the problem description: reduce, improve, higher, lower, greater than
- Non-directional tests (two-sided):
- one sample: $H_0 :\mu=\mu_0$ and $H_a : \mu \neq \mu_0$
- two samples: $H_0 : \mu_1 = \mu_2$ and $H_a : \mu_1 \neq \mu_2$
- Keywords in the problem description: change, different from
---
# Significance level and decision errors
- $\alpha$: Probability of Type I Error
- The defendant is innocent ($ H_0 $ true) but wrongly convicted
- $\beta$: Probability of Type II Error
- The court failed to reject $H_0$ (i.e. failed to convict the person) when he was in fact guilty ($ H_a$ true).
- $1 − \beta$: Power of the test
The significance level selected for a test should reflect the real-world consequences associated with making a Type 1 or Type 2 Error.
---
# Test statistic
- A test statistic is a numerical summary of the data.
- The sampling distribution of the test statistic is the probability distribution of the test statistic when the null hypothesis is true.
- E.g. the t-test for $H_0: \mu = \mu_0$ has a test statistic of
$$t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$$
and it follows a t-distribution with d.f. $n-1$.
---
# The p-value
The $p$-value is the probability of observing the test statistic value or one more extreme, if the null hypothesis were true.
---
# The p-value and decision
We say that the data provide statistically significant evidence against the null hypothesis if the $p$-value is less than the significance level $\alpha$ (this result would rarely occur just by chance).
---
class: inverse middle
# Share and share alike

This work is licensed under a Creative Commons Attribution 4.0 International License.