class: center, middle, inverse, title-slide

# Melbourne Pedestrian Traffic

---

### What do we want to do?

* Investigate how weather impacts foot traffic around Melbourne.

### How are we doing this?

* Collect and combine data from pedestrian sensors and weather stations
* Explore the data a little bit
* Build and evaluate a Poisson regression model

We'll need a few packages to help us out:

```r
library(tidyverse)
library(lubridate)  # provides wday() and month(), used later in the deck
library(ggmap)
library(gridExtra)
library(readr)
library(knitr)
library(broom)
library(rwalkr)
```

---

class:center

### A Map of Melbourne

![](index_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---

### A subset of the weather data is contained in 'melb_ghcn.csv'.

```r
melb_ghcn <- read_csv("melb_ghcn.csv")
melb_ghcn %>%
  select(stn_id, date, variable, value) %>%
  head(4)
```

```
## # A tibble: 4 x 4
##        stn_id     date variable value
##         <chr>    <int>    <chr> <int>
## 1 ASN00086071 20130101     TMAX   253
## 2 ASN00086071 20130101     TMIN   154
## 3 ASN00086071 20130101     PRCP     0
## 4 ASN00086071 20130102     TMAX   222
```

What do the variable and value columns mean?

The data is then manipulated; here are a few important lines:

```r
# Excerpt only: the full manipulation pipeline is not shown
mutate(value = value/10,
       high_prcp = ifelse(PRCP>5, "rain", "none"),
       high_tmp = ifelse(TMAX>33, "hot", "not"),
       low_tmp = ifelse(TMIN<6, "cold", "not"))
melb_ghcn_wide$PRCP[is.na(melb_ghcn_wide$PRCP)] <- 0
```

---

### Temperature over Time

![](index_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

### Choose a sensor for your group.
```r
# Read sensor counts
ped_sub <- read_csv("pedestrian_counts_sub.csv")
ped_sub <- ped_sub %>%
  filter(year < 2015) %>%
  dplyr::arrange(sensor_id, date, time)
unique(ped_sub$sensor_name)
```

```
##  [1] "Bourke Street Mall (South)"
##  [2] "Melbourne Central"
##  [3] "Town Hall (West)"
##  [4] "Princes Bridge"
##  [5] "Flinders Street Station Underpass"
##  [6] "Webb Bridge"
##  [7] "Southern Cross Station"
##  [8] "Victoria Point"
##  [9] "Waterfront City"
## [10] "Flagstaff Station"
## [11] "Sandridge Bridge"
```

---

### We have to make a few changes to the code

```r
ped_run <- run_melb(year = 2013:2014, sensor = "Bourke Street Mall (North)")
ped_run <- ped_run %>%
  rename(count = Count, time = Time, date = Date) %>%
  mutate(day = wday(date, label=TRUE),
         month = month(date, label=TRUE))
ped_weather <- left_join(ped_run, melb_ghcn_wide, by="date")
ped_weather <- ped_weather %>%
  mutate(time = factor(ped_weather$time),
         high_prcp = factor(ped_weather$high_prcp, levels=c("none", "rain")),
         high_tmp = factor(ped_weather$high_tmp, levels=c("not", "hot")),
         low_tmp = factor(ped_weather$low_tmp, levels=c("not", "cold")))
```

---

### Poisson Regression

`$$\log(\mu_i) = \beta_0 + \sum_{j=1}^K \beta_j x_{i, j}$$`

`$$y_i = \mu_i + e_i$$`

The dependent variable has a *Poisson* distribution with mean `\(\mu_i\)`.

We want three-way interactions between day of the week, month of the year, and hour of the day:

`$$\log(\mu_i) = \mbox{Standard Variables} + \beta_{44}(\mbox{Day = Monday AND Time = 00:00}) + \dots$$`

`$$+ \beta_{??} (\mbox{Day = Thursday AND Time = 14:00 AND Month = June}) + \dots$$`

In R:

```r
glm(count~day*time*month+high_tmp+low_tmp+high_prcp,
    data=ped_weather,
    family=poisson(link="log"))
```

There are 2019 coefficients to estimate, so be patient.

---

### Predictions for the Poisson Regression

Many of the variables are factors, so make sure to convert your prediction set to the right format.
```r
newdat <- data.frame() #Something needs to go here
newdat$time <- factor(newdat$time, levels=0:23)
newdat$high_tmp <- factor(newdat$high_tmp, levels=c("not", "hot"))
newdat$low_tmp <- factor(newdat$low_tmp, levels=c("not", "cold"))
newdat$high_prcp <- factor(newdat$high_prcp, levels=c("none", "rain"))
newdat$day <- factor(newdat$day,
                     levels=c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"))
newdat$month <- factor(newdat$month,
                       levels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
```
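
To see how a correctly formatted prediction set feeds into `predict()`, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: `toy`, `fit` and `pred` are hypothetical names, the counts are generated with `rpois()` rather than read from a sensor, and the formula is cut down to `day * time + high_prcp` so that it fits in a moment. It is not the model from the previous slide.

```r
# Simulated stand-in for ped_weather (NOT real sensor data)
set.seed(1)
day_levels <- c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun")
toy <- expand.grid(time = 0:23,
                   day = day_levels,
                   high_prcp = c("none", "rain"),
                   stringsAsFactors = FALSE)
toy <- toy[rep(seq_len(nrow(toy)), each = 3), ]  # 3 observations per cell
toy$time <- factor(toy$time, levels = 0:23)
toy$day <- factor(toy$day, levels = day_levels)
toy$high_prcp <- factor(toy$high_prcp, levels = c("none", "rain"))

# Assumed effect: rain lowers the mean count by 20%
toy$count <- rpois(nrow(toy),
                   lambda = 200 * ifelse(toy$high_prcp == "rain", 0.8, 1))

# Reduced Poisson regression (two-way interaction only, to keep it quick)
fit <- glm(count ~ day * time + high_prcp,
           data = toy, family = poisson(link = "log"))

# Prediction set: Thursday at 14:00, with and without rain
newdat <- data.frame(time = 14,
                     day = "Thurs",
                     high_prcp = c("none", "rain"),
                     stringsAsFactors = FALSE)

# Convert each column to a factor with the same levels used for fitting
newdat$time <- factor(newdat$time, levels = 0:23)
newdat$day <- factor(newdat$day, levels = day_levels)
newdat$high_prcp <- factor(newdat$high_prcp, levels = c("none", "rain"))

# type = "response" returns expected counts rather than log-means
pred <- predict(fit, newdata = newdat, type = "response")
pred
```

Setting `type = "response"` matters here: the default, `type = "link"`, returns predictions on the log scale, which would need `exp()` to be read as pedestrian counts.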