class: center, middle, inverse, title-slide # Bootstrap Confidence Intervals ### Thiyanga Talagala --- ## Data Model building will be done using: - Response: `science` (standardised) - Explanatory variables: `ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA`. ```r library(tidyverse) load("pisa_au.rda") pisa_au <- pisa_au %>% mutate(science = (PV1SCIE+PV2SCIE+PV3SCIE+PV4SCIE+ PV5SCIE+PV6SCIE+PV7SCIE+PV8SCIE+ PV9SCIE+PV10SCIE)/10) %>% select(science, ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA, SENWT) ``` --- class: inverse middle # Your turn - remove cases with missing values - change gender to be a factor - standardise the science scores --- ## Model building ```r aus_glm <- glm(science_std~gender+JOYSCIE+nbooks+ntvs, data=aus_nomiss, weights=SENWT) summary(aus_glm) ``` ``` ## ## Call: ## glm(formula = science_std ~ gender + JOYSCIE + nbooks + ntvs, ## data = aus_nomiss, weights = SENWT) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.42754 -0.29052 -0.00837 0.28447 1.97050 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.324459 0.043067 -7.534 5.27e-14 *** ## genderm 0.052947 0.015299 3.461 0.00054 *** ## JOYSCIE 0.277368 0.006645 41.744 < 2e-16 *** ## nbooks 0.216462 0.005469 39.576 < 2e-16 *** ## ntvs -0.113736 0.010720 -10.610 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for gaussian family taken to be 0.2542255) ## ## Null deviance: 4231.3 on 12355 degrees of freedom ## Residual deviance: 3139.9 on 12351 degrees of freedom ## AIC: 35006 ## ## Number of Fisher Scoring iterations: 2 ``` --- ## Classical confidence interval `$$\hat{\beta}_{4} \pm t_{\alpha/2, n - (k + 1)} \times \mathrm{S.E.}(\hat{\beta}_{4})$$` ```r coef <- summary(aus_glm)$coefficients n <- nrow(aus_nomiss)# no. of observations beta_4 <- coef[4, 1] # coefficient se_4 <- coef[4, 2] # standard error df <- n - 5 # degree of freedom = n - (k + 1) t_crit <- qt(0.975, df) # t critical value c(beta_4 - t_crit * se_4, beta_4 + t_crit * se_4) # (lower, upper) ``` ``` ## [1] 0.2057405 0.2271826 ``` --- ## Confidence intervals via bootsrap Step 1: - `boot` package can generate bootsrap samples for weighted data. - To use the `boot` function for drawing samples, you need a function to compute the statistic of interest. - Write the function to return the slope for `nbooks` after fitting a `glm` to a bootstrap sample. The skeleton of the function: ```r calc_stat <- function(d, i) { # d-data, i - vector of indices of the bootstrap sample x <- d[i,] mod <- FILL IN THE NECESSARY CODE stat <- FILL IN THE NECESSARY CODE return(stat) } ``` --- ## Confidence intervals via bootsrap (cont.) Step 2: - Make a 95% bootsrap confidence interval ```r library(boot) stat <- boot(aus_nomiss, statistic=calc_stat, R=1000, weights=aus_nomiss$SENWT) stat ``` --- ## Confidence intervals via bootsrap (cont.) ``` ## ## WEIGHTED BOOTSTRAP ## ## ## Call: ## boot(data = aus_nomiss, statistic = calc_stat, R = 1000, weights = aus_nomiss$SENWT) ## ## ## Bootstrap Statistics : ## original bias std. error mean(t*) ## t1* 0.2164616 -0.001094138 0.006303779 0.2153674 ``` The 95% bootsrap confidence interval is ```r c(sort(stat$t)[25],sort(stat$t)[975]) ``` ``` ## [1] 0.2023871 0.2271530 ``` --- ## Bootstrap confidence interval for predicted value Step 1: - Write a function to calculate the predicted value for a girl, enjoys science (JOYSCIE=1.0) two TVs and 26-100 books in the home. The weight for a student like this is 0.2. The skeleton of the function: ```r calc_pred <- function(d, i, newd) { x <- d[i,] mod <- FILL IN THE NECESSARY CODE pred <- FILL IN THE NECESSARY CODE return(pred) } ``` Step 02: - Make a 95% bootsrap confidence interval - Be sure to convert the values back into the actual math score range --- ## Prediction Interval - Compute the residuals from the fitted model - Bootstrap the residuals - Find the desired quantiles of the residuals - Compute prediction intervals by adding residual quantiles to fitted value ```r calc_res <- function(d, i) { x <- d[i,] mod <- FILL IN THE NECESSARY CODE res <- FILL IN THE NECESSARY CODE return(c(l=min(res), u=max(res))) } ```