1. Overview

In this take-home exercise, we study the data from VAST Challenge 2022. Doing so lets us characterize the distinct areas of the city and identify the busiest areas in Engagement.

2. Data Preparation

2.1 Package installation

This study mainly uses the readr, sf, and tmap packages of R. The code chunk below installs any listed package that is missing and then loads it, so that everything is available before we start the data preparation.

packages = c('sf', 'tmap', 'tidyverse', 
             'lubridate', 'clock', 
             'sftime', 'rmarkdown')

# Install each package only if it cannot be loaded, then attach it.
for(p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

2.2 Import Data

In the code chunk below, we will use read_sf() to parse the data files into R as sf data.frames.

schools <- read_sf("data/wkt/Schools.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")  
pubs <- read_sf("data/wkt/Pubs.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/wkt/Apartments.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/wkt/Buildings.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/wkt/Employers.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/wkt/Restaurants.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
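
To illustrate what the GEOM_POSSIBLE_NAMES option does, the following minimal sketch writes a tiny CSV with a WKT column to a temporary file and reads it back. The file contents and the names toy_csv and toy_sf are made up for illustration; only read_sf() and the option string are taken from the code above.

```r
library(sf)

# Hypothetical two-row CSV whose `location` column holds WKT text.
toy_csv <- tempfile(fileext = ".csv")
writeLines(c("id,location",
             "1,POINT (10 20)",
             "2,POINT (30 40)"),
           toy_csv)

# GEOM_POSSIBLE_NAMES names the column that should be parsed as geometry.
toy_sf <- read_sf(toy_csv, options = "GEOM_POSSIBLE_NAMES=location")

class(toy_sf)             # includes "sf"
st_geometry_type(toy_sf)  # geometry type of each row
```

Without the option, the WKT column would most likely be read as plain text, and the result would have no usable geometry for mapping.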

2.3 Overview of Data

The following code chunk gives us an overview of the building details.

print(buildings)

Simple feature collection with 1042 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -4762.191 ymin: -30.08359 xmax: 2650 ymax: 7850.037
CRS:           NA
# A tibble: 1,042 x 5
   buildingId                       location buildingType maxOccupancy
   <chr>                           <POLYGON> <chr>        <chr>       
 1 1          ((350.0639 4595.666, 390.0633~ Commercial   ""          
 2 2          ((-1926.973 2725.611, -1948.1~ Residental   "12"        
 3 3          ((685.6846 1552.131, 645.9985~ Commercial   ""          
 4 4          ((-976.7845 4542.382, -1053.2~ Commercial   ""          
 5 5          ((1259.306 3572.727, 1299.255~ Residental   "2"         
 6 6          ((478.8969 1082.484, 473.6596~ Commercial   ""          
 7 7          ((-1920.823 615.7447, -1960.8~ Residental   ""          
 8 8          ((-3302.657 5394.354, -3301.5~ Commercial   ""          
 9 9          ((-600.5789 4429.228, -495.95~ Commercial   ""          
10 10         ((-68.75908 5379.924, -28.782~ Residental   "5"         
# ... with 1,032 more rows, and 1 more variable: units <chr>

3. Composite map

The code chunk below builds a composite map by drawing the building polygons and overlaying the other layers at their given locations. Apartments, employers, pubs, restaurants, and schools are highlighted in different colors on the map.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(employers) +
  tm_dots(col = "red") +
tm_shape(apartments) +
  tm_dots(col = "lightblue") +
tm_shape(pubs) +
  tm_dots(col = "green") +
tm_shape(restaurants) +
  tm_dots(col = "blue") +
tm_shape(schools) +
  tm_dots(col = "yellow")

tmap_mode("plot")

To find the busiest locations, we will create a hexagon binning map.

In the code chunk below, we use st_make_grid() to create the hexagons.

hex <- st_make_grid(buildings, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id')
plot(hex)
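
The cellsize argument controls how coarse the binning is. As a minimal sketch, assuming a made-up 1000 x 1000 study area (toy_area, coarse, and fine are illustrative names, not part of the exercise data), halving the cell size produces noticeably more hexagons:

```r
library(sf)

# Hypothetical square study area; the coordinates are invented for illustration.
toy_area <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)))))

# square = FALSE requests hexagonal cells instead of square ones.
coarse <- st_make_grid(toy_area, cellsize = 200, square = FALSE)
fine   <- st_make_grid(toy_area, cellsize = 100, square = FALSE)

length(coarse)  # fewer, larger hexagons
length(fine)    # more, smaller hexagons
```

Smaller cells show more local detail but leave fewer points per hexagon, so the value of 100 used above is a trade-off rather than a fixed rule.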

We selected a 15-day record of participants’ activities to identify the locations with the most participants.

write_rds(logs_selected, "data/rds/logs_selected.rds")

logs_selected <-read_rds("data/rds/logs_selected.rds")

The code chunk below performs a point-in-polygon overlay by using st_join().

points_in_hex <- st_join(logs_selected, 
                         hex, 
                         join=st_within)
#plot(points_in_hex, pch='.')

The st_join() result is then stripped of its geometry, and count() tallies the number of event points in each hexagon.

points_in_hex <- st_join(logs_selected, 
                        hex, 
                        join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id)
head(points_in_hex)

# A tibble: 6 x 2
  hex_id pointCount
   <int>      <int>
1    169         35
2    212         56
3    225         21
4    226         94
5    227         22
6    228         45

Here we join the two tables, using hex_id as the join key. Hexagons without any points are given a count of 0.

hex_combined <- hex %>%
  left_join(points_in_hex, 
            by = 'hex_id') %>%
  replace(is.na(.), 0)
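
The overlay, count, and join steps above can be checked on a toy data set. The sketch below uses five invented points on an invented 300 x 300 area (all toy_* names and coordinates are hypothetical), and fills empty hexagons with coalesce() as one possible alternative to replace(); the counts over all hexagons should add up to the number of points.

```r
library(sf)
library(dplyr)
library(tibble)

# Hypothetical study area and hexagon grid.
toy_area <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(300, 0), c(300, 300), c(0, 300), c(0, 0)))))
toy_hex <- st_make_grid(toy_area, cellsize = 100, square = FALSE) %>%
  st_sf() %>%
  rowid_to_column("hex_id")

# Five made-up event points; two pairs lie close together.
toy_points <- st_as_sf(
  data.frame(x = c(37.3, 41.9, 152.6, 251.2, 248.7),
             y = c(41.7, 44.1, 163.4, 77.5, 80.3)),
  coords = c("x", "y"))

# Point-in-polygon overlay, then one row per hexagon with its point count.
toy_counts <- st_join(toy_points, toy_hex, join = st_within) %>%
  st_drop_geometry() %>%
  count(hex_id, name = "pointCount")

# Attach the counts to the grid; hexagons without points get 0.
toy_combined <- toy_hex %>%
  left_join(toy_counts, by = "hex_id") %>%
  mutate(pointCount = coalesce(pointCount, 0L))

sum(toy_combined$pointCount)  # should equal nrow(toy_points)
```

Comparing the total count with the number of input points is a quick sanity check that no point was dropped or counted twice during the overlay.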

On the resulting map, areas with a darker color are busier.

tm_shape(hex_combined %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1)
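
With n = 8 and style = "quantile", the counts are split into eight classes that each hold roughly the same number of hexagons. The base R sketch below approximates that idea on made-up counts (toy_values is hypothetical, and tmap's own break computation may differ in detail):

```r
# Made-up counts standing in for the positive pointCount values.
toy_values <- c(3, 8, 12, 21, 22, 35, 45, 56, 94, 120, 310, 950)

# Nine break points delimit eight quantile classes.
breaks <- quantile(toy_values, probs = seq(0, 1, length.out = 9))

# Assign each count to its class and tabulate the class sizes.
classes <- cut(toy_values, breaks = breaks, include.lowest = TRUE)
table(classes)
```

Because the classes are based on ranks rather than equal value ranges, a few very busy hexagons do not compress the rest of the map into a single color.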

Take home exercise 5