---
title: "ETC 2420/5242 Lab 3 2017"
author: "Di Cook"
date: "SOLUTION"
output: pdf_document
---
```{r echo = FALSE, message = FALSE, warning = FALSE, warning = FALSE}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
error = FALSE,
echo = FALSE,
collapse = TRUE,
comment = "#",
fig.height = 4,
fig.width = 8,
fig.align = "center",
cache = FALSE
)
```
## Purpose
This lab is to examine testing hypotheses using permutation.
## Background
Read Sections 2.3, and the exercises in 2.9.3, of the online textbook "IntroStat with Randomization and Simulation".
## Problem description
Is yawning contagious? An experiment conducted by the MythBusters, a science entertainment TV program on the Discovery Channel, tested if a person can be subconsciously influenced into yawning if another person near them yawns. 50 people were randomly assigned to two groups: 34 to a group where a person near them yawned (treatment) and 16 to a group where there wasnâ€™t a person yawning near them (control). The following table shows the results of this experiment.
```{r}
yawn_expt <- data.frame(group=c(rep("treatment", 34), rep("control", 16)),
yawn=c(rep("yes", 10), rep("no", 24), rep("yes", 4), rep("no", 12)))
library(dplyr)
library(tidyr)
library(knitr)
yawn_expt %>%
group_by(group, yawn) %>%
tally() %>%
ungroup() %>%
spread(yawn, n, fill=0) %>%
mutate(total = rowSums(.[-1])) %>%
kable()
```
## Question 1 (5 pts)
a. How many subjects participated in the experiment? `50`
b. How were participants assigned to treatment and control groups? `Randomization`
c. What are the two variables that describe the experiment? `Group, yawn`
d. Compute the proportion of the treatment and control groups who yawned. Add this to the table. `r round(4/16, 3)`
e. Compute the difference in proportions between the two groups. `Control-Treatment is` `r round(4/16-10/34, 3)`
## Question 2 (3 pts)
The null hypothesis for the experiment is
$$H_o: p_{control} = p_{treatment}$$
a. Write the null hypothesis as an English sentence. `Yawning is NOT contagious`
b. What would be the alternative hypothesis tested by MythBusters? $H_a: p_{control} < p_{treatment}$
c. Explain your reasoning. `The original question, or the common belief is that yawning is contagious, which would correspond to a higher proportion of people yawning in the group with the yawning near them.`
## Question 3 (4 pts)
Write a function that permutes the `yawn` variable, and computes the difference between proportions of treatment and control groups.
```{r echo=TRUE}
prop_dif <- function(dat) {
dtbl <- dat %>%
mutate(yawn=sample(yawn)) %>%
group_by(group, yawn) %>%
tally() %>%
ungroup() %>%
spread(yawn, n, fill=0) %>%
mutate(total = rowSums(.[-1])) %>%
mutate(p = yes/total)
return(pdif=dtbl$p[2]-dtbl$p[1])
}
```
## Question 4 (4 pts)
a. Run the function 10000 times, saving the result.
```{r echo=TRUE}
set.seed(444)
pdif <- NULL
for (i in 1:10000)
pdif <- c(pdif, prop_dif(yawn_expt))
```
b. Make a histogram (or a dotplot) of the results.
```{r echo=TRUE}
library(ggplot2)
pdif <- data.frame(pdif)
ggplot(pdif, aes(x=pdif)) + geom_histogram(binwidth=0.025)
```
c. Draw a vertical line on the plot that represents the difference for the actual data.
```{r echo=TRUE}
ggplot(pdif, aes(x=pdif)) + geom_histogram(binwidth=0.025) +
geom_vline(xintercept=0.0441176, colour="red")
```
d. Compute the proportion of times that the permuted data yields a difference larger than the difference of the actual data.
```{r echo=TRUE}
length(pdif[pdif>0.0441176])/10000
```
## Question 5 (4 pts)
a. Compute the (permutation) p-value for testing the null hypothesis. `r length(pdif[pdif>0.0441176])/10000`
b. Based on your p-value, what is your decision about the null hypothesis? `Fail to reject the null`
c. Write a sentence stating your conclusion. `There is no difference between the proportion of people yawning in the treatment and control groups.`
d. Finally, based on these experimental results how would you answer "Is yawning contagious?" `There is no evidence from this study to suggest that yawning is contagious.`