In this take-home exercise, we study data from VAST Challenge 2022 in order to characterize the distinct areas of the city and identify the busiest areas in Engagement.
This study uses the readr, sf, and tmap packages of R (readr is loaded as part of tidyverse), so we need to install and load them before starting data preparation.
# Install any missing packages, then load them all
packages <- c('sf', 'tmap', 'tidyverse',
              'lubridate', 'clock',
              'sftime', 'rmarkdown')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
In the code chunk below, we will use read_sf() to parse the data files into R as sf data.frames.
schools <- read_sf("data/wkt/Schools.csv",
                   options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/wkt/Pubs.csv",
                options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/wkt/Apartments.csv",
                      options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/wkt/Buildings.csv",
                     options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/wkt/Employers.csv",
                     options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/wkt/Restaurants.csv",
                       options = "GEOM_POSSIBLE_NAMES=location")
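To make the role of GEOM_POSSIBLE_NAMES concrete, here is a minimal sketch on a made-up two-row CSV. The temporary file, its id column, and the object name toy are hypothetical and used only for illustration; only the location column name follows the files above. The option tells the GDAL CSV driver which column to try to parse as WKT geometry.

```r
library(sf)

# Hypothetical toy file: one attribute column and one WKT column named "location"
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,location",
  "1,POINT (10 20)",
  "2,POINT (30 40)"
), tmp)

# Ask the CSV driver to parse the "location" column as geometry
toy <- read_sf(tmp, options = "GEOM_POSSIBLE_NAMES=location")

# Inspect the result: an sf object whose geometries are points
print(toy)
st_geometry_type(toy)
```

Without the option, the same file would typically be read as a plain attribute table with no geometry, which is why every read_sf() call above passes it.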
The following code chunk gives us an overview of the building details.
print(buildings)
Simple feature collection with 1042 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -4762.191 ymin: -30.08359 xmax: 2650 ymax: 7850.037
CRS: NA
# A tibble: 1,042 x 5
buildingId location buildingType maxOccupancy
<chr> <POLYGON> <chr> <chr>
1 1 ((350.0639 4595.666, 390.0633~ Commercial ""
2 2 ((-1926.973 2725.611, -1948.1~ Residental "12"
3 3 ((685.6846 1552.131, 645.9985~ Commercial ""
4 4 ((-976.7845 4542.382, -1053.2~ Commercial ""
5 5 ((1259.306 3572.727, 1299.255~ Residental "2"
6 6 ((478.8969 1082.484, 473.6596~ Commercial ""
7 7 ((-1920.823 615.7447, -1960.8~ Residental ""
8 8 ((-3302.657 5394.354, -3301.5~ Commercial ""
9 9 ((-600.5789 4429.228, -495.95~ Commercial ""
10 10 ((-68.75908 5379.924, -28.782~ Residental "5"
# ... with 1,032 more rows, and 1 more variable: units <chr>
The code chunk below builds a composite map by overlaying the point locations on the buildings layer. Apartments, employers, pubs, restaurants, and schools are highlighted in different colors on the map.
tmap_mode("plot")
tm_shape(buildings) +
  tm_polygons(col = "grey60",
              size = 1,
              border.col = "black",
              border.lwd = 1) +
  tm_shape(employers) +
  tm_dots(col = "red") +
  tm_shape(apartments) +
  tm_dots(col = "lightblue") +
  tm_shape(pubs) +
  tm_dots(col = "green") +
  tm_shape(restaurants) +
  tm_dots(col = "blue") +
  tm_shape(schools) +
  tm_dots(col = "yellow")
tmap_mode("plot")
To find the busiest locations, we will create a hexagon binning map in R.
In the code chunk below, we use st_make_grid() to create the hexagons.
hex <- st_make_grid(buildings,
                    cellsize = 100,
                    square = FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id')
plot(hex)
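As a rough check on what cellsize = 100 means for hexagons, the sketch below builds a grid over a hypothetical 1000 x 1000 square (toy_area and toy_hex are made-up names, not objects from this exercise). The sf documentation describes cellsize for hexagonal cells as the distance between opposite edges, which would imply an area of (sqrt(3) / 2) * cellsize^2 per cell; the code compares that figure with the areas sf reports.

```r
library(sf)

# Hypothetical study area: a 1000 x 1000 square with no CRS, like the city data
toy_area <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)
))))

# Same call pattern as above, on the toy area
toy_hex <- st_sf(geometry = st_make_grid(toy_area,
                                         cellsize = 100,
                                         square = FALSE))

# Base-R alternative to rowid_to_column() for adding an id
toy_hex$hex_id <- seq_len(nrow(toy_hex))

# Area implied by "distance between opposite edges = cellsize"
expected_area <- sqrt(3) / 2 * 100^2

# Compare with the areas computed by sf
range(as.numeric(st_area(toy_hex)))
expected_area
```

A smaller cellsize gives finer spatial detail but fewer points per hexagon, so the value of 100 used above is a trade-off rather than a fixed rule.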
We selected 15 days of participants' activity records to identify the most populated locations. The selected records are saved in RDS format and read back in as follows.
write_rds(logs_selected, "data/rds/logs_selected.rds")
logs_selected <- read_rds("data/rds/logs_selected.rds")
The code chunk below performs a point-in-polygon overlay using st_join(), which tags each event point with the hexagon it falls within.
points_in_hex <- st_join(logs_selected,
                         hex,
                         join = st_within)
#plot(points_in_hex, pch='.')
After the st_join() overlay, the geometry is dropped and count() is used to count the number of event points in each hexagon.
points_in_hex <- st_join(logs_selected,
                         hex,
                         join = st_within) %>%
  st_set_geometry(NULL) %>%
  count(hex_id, name = 'pointCount')
head(points_in_hex)
# A tibble: 6 x 2
hex_id pointCount
<int> <int>
1 169 35
2 212 56
3 225 21
4 226 94
5 227 22
6 228 45
Here we join the two tables, using hex_id as the join key.
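The join code itself is not shown here, so the following is only a sketch of one plausible approach, run on toy stand-ins rather than the real data. It assumes a dplyr left join that keeps every hexagon and fills empty ones with zero; the objects toy_area, toy_hex, toy_points, toy_counts, and toy_combined are hypothetical.

```r
library(sf)
library(dplyr)

# Toy stand-ins (hypothetical): a 300 x 300 area and five event points
toy_area <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(300, 0), c(300, 300), c(0, 300), c(0, 0)
))))
toy_hex <- st_sf(geometry = st_make_grid(toy_area,
                                         cellsize = 100,
                                         square = FALSE))
toy_hex$hex_id <- seq_len(nrow(toy_hex))

toy_points <- st_as_sf(
  data.frame(x = c(37.2, 41.9, 44.5, 203.3, 251.8),
             y = c(41.9, 44.5, 47.1, 198.6, 121.4)),
  coords = c("x", "y")
)

# Tag each point with its hexagon, drop geometry, count points per hexagon
toy_counts <- st_join(toy_points, toy_hex, join = st_within) %>%
  st_set_geometry(NULL) %>%
  count(hex_id, name = "pointCount")

# Left join keeps every hexagon; hexagons without points get NA, set to 0 here
toy_combined <- toy_hex %>%
  left_join(toy_counts, by = "hex_id") %>%
  mutate(pointCount = coalesce(pointCount, 0L))

toy_combined
```

Under the same assumption, the equivalent call on the objects in this exercise would look like hex_combined <- hex %>% left_join(points_in_hex, by = "hex_id"), followed by replacing the missing counts with 0.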
On the map, the areas with darker colors are the busier ones. Hexagons with no recorded points are filtered out before plotting.
tm_shape(hex_combined %>%
           filter(pointCount > 0)) +
  tm_fill("pointCount",
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1)