class: center, middle, inverse, title-slide # Statistical distribution ### Earo Wang --- class: inverse middle center # Statistical distributions --- ## Compute the probabilities for `\(N(3.3, 1.1)\)` <img src="figure/ex-plot-1.svg" style="display: block; margin: auto;" /> --- * `\(\Pr(X < 1.3)\)` ```r pnorm(1.3, mean = 3.3, sd = 1.1) ``` ``` ## [1] 0.03451817 ``` * `\(\Pr(X > 1.9) = 1 - \Pr(X \le 1.9)\)` ```r pnorm(1.9, mean = 3.3, sd = 1.1, lower.tail = FALSE) ``` ``` ## [1] 0.8984426 ``` ```r 1 - pnorm(1.9, mean = 3.3, sd = 1.1) ``` ``` ## [1] 0.8984426 ``` * `\(\Pr(1.8 < X < 2.2) = \Pr(X < 2.2) - \Pr(X < 1.8)\)` ```r pnorm(2.2, mean = 3.3, sd = 1.1) - pnorm(1.8, mean = 3.3, sd = 1.1) ``` ``` ## [1] 0.07231423 ``` --- ## Compute the quantile values for `\(N(-10, 4)\)` ```r qnorm(c(0.53, 0.12, 0.84, 1.2), mean = -10, sd = 4) ``` ``` ## [1] -9.698921 -14.699947 -6.022168 NaN ``` ## Compute the densities for `\(N(12, 5)\)` ```r dnorm(c(13, 4, 20), mean = 12, sd = 5) ``` ``` ## [1] 0.07820854 0.02218417 0.02218417 ``` --- ## Plot the density curves for Weibull ```r library(tidyverse) xgrid <- seq(0, 5, 0.05) df <- data.frame( x = xgrid, d1 = dweibull(xgrid, shape = 3, scale = 1.5), d2 = dweibull(xgrid, shape = 2, scale = 2), d3 = dweibull(xgrid, shape = 1, scale = 1) ) df <- df %>% gather(dist, density, d1:d3) %>% mutate(dist = factor( dist, levels = c("d1", "d2", "d3"), labels = c("Weibull(3, 1.5)", "Weibull(2, 2)", "Weibull(1,1)") )) ggplot(df, aes(x=x, y = density, colour = dist)) + geom_line() ``` --- <img src="figure/ex-weibull-1.svg" style="display: block; margin: auto;" /> --- class: inverse middle center # QQ-plot --- ## Simulate sample of size `\(n = 500\)` from `\(\Gamma(4, 2)\)` ```r set.seed(123) n <- 500 alpha <- 4 beta <- 2 df_gamma <- data.frame( xgamma = rgamma(n, shape = alpha, rate = beta) ) head(df_gamma) ``` ``` ## xgamma ## 1 1.2649897 ## 2 3.0404103 ## 3 0.5263902 ## 4 1.8730270 ## 5 3.7219774 ## 6 2.2077031 ``` --- ## QQ-Plot computation a. Sort and standardise the sample values from low to high b. Theoretical quantiles, `\(n=\)` sample size `$$\begin{align}1 - 0.5^{(1/n)} \quad & i=1 \\ \frac{i - 0.3175}{n + 0.365} \quad & i=2, ..., n-1 \\ 0.5^{(1/n)} \quad & i=n \end{align}$$` ```r df_gamma$xq <- qgamma( c(1 - 0.5^(1/n), # i = 1 (2:(n-1) - 0.3175) / (n + 0.365), # i = 2, ... , n - 1 0.5^(1/n)), # i = n alpha, beta ) # theoretical quantiles head(df_gamma) ``` ``` ## xgamma xq ## 1 1.0926928 0.1474485 ## 2 2.9469747 0.1985390 ## 3 0.3749567 0.2328525 ## 4 1.7156590 0.2598571 ## 5 3.6785302 0.2826566 ## 6 2.0649844 0.3026649 ``` --- c. Plot the theoretical vs sample quantiles ```r ggplot(data = df_gamma, aes(x = sort(xgamma), y = xq)) + geom_point() + geom_abline(intercept = 0, slope = 1) + # X = Y line xlab("Sample quantiles") + ylab("Theoretical quantiles") + coord_equal() ``` <img src="figure/q1-c-1.svg" style="display: block; margin: auto;" /> --- class: inverse middle center # Likelihood function --- ## Histogram for the sample ```r set.seed(123) X2 <- data.frame(x = rgamma(n = 544, 3.2, 1.7)) ``` **Your turn**: write `ggplot2` code to produce the histogram below: <img src="figure/q2-hist-1.svg" style="display: block; margin: auto;" /> --- ## Compute the likelihood function `$$\begin{aligned} L(\alpha, \beta) &=\Pr(X_1=x_1,X_2=x_2, ... ,X_n=x_n~\big\vert~\alpha, \beta) \\ &=f(x_1\big\vert\alpha, \beta)f(x_2\big\vert\alpha, \beta)\cdots f(x_n\big\vert\alpha, \beta) \\ &=\prod_{i=1}^n f(x_i;\alpha, \beta) \end{aligned}$$` ```r nmle <- function(x, alpha, beta) { # L(alpha, beta) = product of joint density functions } ``` ```r alpha <- seq(2.7, 3.8, 0.01) beta <- seq(1.45, 2.2, 0.01) # all possible combinations of alpha and beta g <- expand.grid(x = alpha, y = beta) g$l <- 0 for (i in 1:nrow(g)) { g$l[i] <- nmle(X2$x, g$x[i], g$y[i]) } ``` --- ## Plot the likelihood function for a range of values of `\(\alpha\)` and `\(\beta\)` <img src="figure/q2-plot2-1.svg" style="display: block; margin: auto;" /> --- ## Interactive 3D surface plot using `plotly`
--- ## Interactive 3D surface plot using `plotly` If you're curious about how to produce the 3D plot, here's the code: ```r m <- matrix(g$l, nrow = length(alpha)) library(plotly) plot_ly(x = ~ beta, y = ~ alpha, z = m) %>% add_surface() ``` --- ## Find the MLE estimates for `\(\alpha\)`, `\(\beta\)` ```r library(MASS) fitdistr(X2$x, "gamma") ``` ``` ## shape rate ## 3.3879099 1.8832870 ## (0.1961500) (0.1175305) ``` p.s. if you have trouble in installing the `CASdatasets` package, the following code may solve the issue. ```r # install.packages("xts") # install.packages("sp") # install.packages("CASdatasets", # repos = "http://cas.uqam.ca/pub/R/", type = "source") library(CASdatasets) data(usworkcomp) ``` Hint: rescale before the `fitdistr` function to avoid numerical problems: ```r (usworkcomp$LOSS + 100) / 100000 ``` --- ## Online R resources * [A list of distribution functions in R](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html) * [Cookbook for R](http://www.cookbook-r.com/Graphs/)