Configure your environment and select any Indonesian city. Every code example, dataset download, and analysis in this course will adapt to your choice.
All tutorials, coordinates, EPSG codes, and data downloads will automatically adapt.
Everything you need in one ZIP — pre-configured for your selected city. Includes 37 Jupyter Notebooks, 37 source scripts, SQL files, requirements, and a ready-to-use city_config.json.
city_config.json for Semarang, requirements.txt, and README. Just unzip, install, and run.city_config.json for Semarang — swap this file to re-run the entire pipeline for a different city.pip install -r requirements.txt inside your conda environment.unzip urban_analytics_semarang.zip
cd urban_analytics
conda create -n urban-geo python=3.11 -y && conda activate urban-geo
pip install -r requirements.txt
jupyter lab
The ZIP includes these notebooks and scripts:
Each module produces artifacts the next module consumes. By the end, you'll have a live dashboard integrating every layer.
Indonesian cities are growing faster than the tools planners use to understand them. Manual GIS workflows can't keep pace with the scale of land-use change, transport demand, and coastal risk across Java, Sulawesi, Kalimantan, and Sumatra.
This course teaches you to build an automated, reproducible urban analytics pipeline — from raw open data to an interactive dashboard — using only free, open-source tools. Pick any city; the same code works everywhere.
Every module is organized into four modes, based on the Diátaxis framework:
Download and install Miniconda from the official site. This gives you the conda package manager without bloat.
# Linux / macOS
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Or macOS ARM (Apple Silicon)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
bash Miniconda3-latest-MacOSX-arm64.shconda create -n urban-geo python=3.11 -y
conda activate urban-geopip install pandas geopandas matplotlib seaborn shapely fiona pyproj \
osmnx networkx h3 requests beautifulsoup4 \
psycopg2-binary sqlalchemy geoalchemy2 \
earthengine-api geemap \
streamlit pydeck plotly foliumSuccessfully installed 50+ packages. No errors.
PostGIS needs PostgreSQL as a base. Install both:
# Ubuntu/Debian
sudo apt install postgresql postgis -y
# macOS (Homebrew)
brew install postgresql@16 postgis
# Start the service
sudo systemctl start postgresql # Linux
brew services start postgresql@16 # macOSsudo -u postgres createdb urban_semarang
sudo -u postgres psql -d urban_semarang -c "CREATE EXTENSION postgis; CREATE EXTENSION h3;"CREATE EXTENSION CREATE EXTENSION
If h3 extension is not available, install h3-pg from source or via pgxn install h3. The h3 extension is optional — you can compute H3 in Python instead.
mkdir -p urban_analytics/{data/raw,data/processed,output,notebooks,scripts}
cd urban_analyticsurban_analytics/ ├── data/ │ ├── raw/ # Downloaded shapefiles, CSVs, GeoJSON │ └── processed/ # Cleaned, reprojected, H3-indexed ├── output/ # GeoPackage, GeoTIFF, exports ├── notebooks/ # Jupyter notebooks (one per module) └── scripts/ # Reusable .py and .sql files
earthengine authenticateFollow the browser prompt to authorize. This is needed for Modules 6-7.
import pandas as pd
import geopandas as gpd
import osmnx as ox
import h3
import ee
ee.Initialize()
print("pandas", pd.__version__)
print("geopandas", gpd.__version__)
print("osmnx", ox.__version__)
print("h3", h3.__version__)
print("GEE OK")
print("\n✓ All systems go. Ready for Module 1.")pandas 2.x.x geopandas 1.x.x osmnx 2.x.x h3 4.x.x GEE OK ✓ All systems go. Ready for Module 1.
urban_analytics/ folder structure • urban_semarang PostGIS database • urban-geo conda environment
These values update automatically when you change the city selector above.
| Parameter | Value |
|---|---|
| City Name | Semarang |
| Province | Jawa Tengah |
| OSMnx Query | Semarang, Indonesia |
| Center Latitude | -6.97 |
| Center Longitude | 110.42 |
| UTM EPSG | EPSG:32749 |
| UTM Zone | 49S |
| BPS Kota Code | 3374 |
| Coastal | Yes |
| Bounding Box | [110.28, -7.12, 110.53, -6.88] |
| Source | Data Types | Coverage | Access |
|---|---|---|---|
| OpenStreetMap / Geofabrik | Roads, buildings, POIs, landuse | All cities | Free, bulk download or API |
| BPS (bps.go.id) | Population, economics, housing census | All kabupaten/kota | Free, web + API |
| InaGeoportal | Admin boundaries, RDTR zoning, land parcels | Varies by city | Free, download |
| Google Earth Engine | Landsat, Sentinel-2, SRTM DEM, land cover | Global | Free (research/education) |
| HERE Traffic API | Real-time traffic flow, incidents, routing | Major roads globally | Freemium (250k calls/mo free) |
| BMKG | Weather, climate, earthquake, tsunami | National | Free, web + API |
Core Python, pandas, and matplotlib — building the data-handling toolkit you'll use in every subsequent module.
Python is the glue language. It connects spatial libraries (GeoPandas), network analysis (NetworkX), databases (psycopg2), satellite imagery (GEE API), and dashboards (Streamlit) in a single workflow. No other language has this breadth.
In this course, Python isn't the destination — it's the vehicle. You'll learn just enough to drive it confidently through 7 modules of increasingly powerful geospatial tools.
.groupby(), .merge(), and .apply() and you can handle 90% of urban data tasks.urban-geo conda environment activeurban_analytics/ folder structure createdOpen JupyterLab and create notebooks/01_python_foundations.ipynb. This notebook will produce a reusable city config that every future module imports.
# Cell 1 — City Configuration
# Change this ONE dict to switch the entire pipeline to a new city
CITY = {
"name": "Semarang",
"province": "Jawa Tengah",
"osm_query": "Semarang, Indonesia",
"lat": -6.97,
"lon": 110.42,
"epsg_utm": 32749,
"bps_code": "3374",
"coastal": True,
"bbox": [110.28, -7.12, 110.53, -6.88],
}
print(f"Working with: {CITY['name']}, {CITY['province']}")Working with: Semarang, Jawa Tengah
Build a DataFrame of sample districts. We'll replace this with real BPS data in Module 2.
# Cell 2 — Practice DataFrames
import pandas as pd
# Sample district data — we'll fetch real data in Module 2
districts = pd.DataFrame({
"kecamatan": ["Kec A", "Kec B", "Kec C", "Kec D", "Kec E"],
"population": [85000, 120000, 65000, 95000, 110000],
"area_km2": [12.5, 8.3, 22.1, 15.6, 9.8],
"elevation_m": [3, 85, 210, 45, 5],
})
# Derived columns
districts["density"] = (districts["population"] / districts["area_km2"]).round(0)
districts["flood_risk"] = districts["elevation_m"].apply(
lambda e: "high" if e < 5 else ("medium" if e < 20 else "low")
)
print(districts.to_string(index=False))kecamatan population area_km2 elevation_m density flood_risk
Kec A 85000 12.5 3 6800.0 high
Kec B 120000 8.3 85 14458.0 low
Kec C 65000 22.1 210 2941.0 low
Kec D 95000 15.6 45 6090.0 low
Kec E 110000 9.8 5 11224.0 medium# Cell 3 — Summary by risk category
summary = districts.groupby("flood_risk").agg(
n_kecamatan=("kecamatan", "count"),
total_pop=("population", "sum"),
avg_density=("density", "mean"),
).round(0)
print(summary)# Cell 4 — Functions we'll reuse throughout the course
import math
def haversine(lat1, lon1, lat2, lon2):
"""Distance in km between two GPS points."""
R = 6371
dlat = math.radians(lat2 - lat1)
dlon = math.radians(lon2 - lon1)
a = math.sin(dlat/2)**2 + math.cos(math.radians(lat1)) * \
math.cos(math.radians(lat2)) * math.sin(dlon/2)**2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
def calculate_kdb(footprint_m2, land_area_m2):
"""KDB = building footprint / land parcel area."""
return round(footprint_m2 / land_area_m2, 3) if land_area_m2 > 0 else 0
def classify_density(pop_per_km2):
"""Indonesian urban density classification."""
if pop_per_km2 > 15000: return "very_high"
if pop_per_km2 > 10000: return "high"
if pop_per_km2 > 5000: return "medium"
return "low"
# Test
print(f"Distance city center to bbox corner: "
f"{haversine(CITY['lat'], CITY['lon'], CITY['bbox'][1], CITY['bbox'][0]):.1f} km")
print(f"KDB of 250m² on 400m² lot: {calculate_kdb(250, 400)}")
print(f"14,000 pop/km²: {classify_density(14000)}")Export the CITY dict as JSON so every future notebook can import it.
# Cell 5 — Save config
import json
from pathlib import Path
config_path = Path("../data/city_config.json")
config_path.write_text(json.dumps(CITY, indent=2, ensure_ascii=False))
print(f"Saved to {config_path.resolve()}")data/city_config.json • notebooks/01_python_foundations.ipynb • Helper functions: haversine(), calculate_kdb(), classify_density()
df = pd.read_csv("data.csv", encoding="utf-8", sep=";") # BPS often uses semicolonshigh_risk = df[(df["elevation_m"] < 5) & (df["population"] > 50000)]pivot = pd.pivot_table(df, values="population", index="kecamatan",
columns="year", aggfunc="sum")import matplotlib.pyplot as plt
districts.sort_values("density").plot.barh(x="kecamatan", y="density", color="#2A7B9B")
plt.xlabel("Population Density (per km²)")
plt.title(f"Population Density by Kecamatan — {CITY['name']}")
plt.tight_layout()
plt.savefig("../output/density_chart.png", dpi=150)
plt.show()| Operation | Code | Returns |
|---|---|---|
| Load CSV | pd.read_csv(path) | DataFrame |
| Shape | df.shape | (rows, cols) tuple |
| Column types | df.dtypes | Series of dtypes |
| Summary stats | df.describe() | count, mean, std, min, max |
| Filter rows | df[df["col"] > val] | Filtered DataFrame |
| New column | df["new"] = expr | Modified in place |
| Group + aggregate | df.groupby("col").agg(...) | Aggregated DataFrame |
| Merge tables | pd.merge(a, b, on="key") | Joined DataFrame |
| Apply function | df["col"].apply(fn) | Transformed Series |
| Sort | df.sort_values("col") | Sorted DataFrame |
| Drop NaN | df.dropna(subset=["col"]) | Cleaned DataFrame |
| Save CSV | df.to_csv(path, index=False) | File on disk |
df["density"] = df["population"] / df["area_km2"] do that a for-loop doesn't?Download real open data — administrative boundaries, buildings, roads, and census statistics — for Semarang using APIs, web scraping, and bulk downloads.
Indonesia has rich — but scattered — geospatial data. The trick is knowing where to look and how to extract it programmatically so your pipeline stays reproducible.
data/city_config.json existsurban-geo environment activeCreate notebooks/02_data_acquisition.ipynb.
import json, requests
import pandas as pd
import geopandas as gpd
from pathlib import Path
CITY = json.loads(Path("../data/city_config.json").read_text())
print(f"Acquiring data for: {CITY['name']}")OSMnx can download the boundary polygon of any named place in OpenStreetMap.
import osmnx as ox
# Download city boundary
boundary = ox.geocode_to_gdf(CITY["osm_query"])
boundary.to_file("../data/raw/admin_boundary.geojson", driver="GeoJSON")
print(f"Boundary area: {boundary.to_crs(epsg=CITY['epsg_utm']).area.iloc[0] / 1e6:.1f} km²")Boundary area: ~373.7 km² (varies by city)
The Overpass API lets you query OpenStreetMap for any feature type. We fetch all building footprints within the city's bounding box.
# Download building footprints via OSMnx
tags = {"building": True}
buildings = ox.features_from_place(CITY["osm_query"], tags=tags)
# Keep only polygons (discard nodes)
buildings = buildings[buildings.geometry.type.isin(["Polygon", "MultiPolygon"])].copy()
# Calculate footprint area in m²
buildings_utm = buildings.to_crs(epsg=CITY["epsg_utm"])
buildings["footprint_m2"] = buildings_utm.geometry.area.round(1)
# Save
cols = ["name", "building", "building:levels", "footprint_m2", "geometry"]
cols = [c for c in cols if c in buildings.columns]
buildings[cols].to_file("../data/raw/buildings.geojson", driver="GeoJSON")
print(f"Downloaded {len(buildings):,} building footprints")Downloaded 45,000–200,000 building footprints (varies by city OSM coverage)
For Jakarta or Surabaya, this may take several minutes and produce large files. Consider filtering by a smaller area first (pass bbox instead of place name).
# Street network — we'll analyze this deeply in Module 4
G = ox.graph_from_place(CITY["osm_query"], network_type="drive")
ox.save_graphml(G, "../data/raw/street_network.graphml")
nodes, edges = ox.graph_to_gdfs(G)
print(f"Network: {len(nodes):,} intersections, {len(edges):,} road segments")BPS provides an API for statistical tables. We fetch population by kecamatan.
# BPS API — population by sub-district
# Note: you need a free API key from webapi.bps.go.id
BPS_KEY = "YOUR_BPS_API_KEY" # Register at webapi.bps.go.id
# Alternative: scrape the BPS website table
url = f"https://semarangkota.bps.go.id/indicator/12/71/1/jumlah-penduduk.html"
try:
tables = pd.read_html(url)
population = tables[0]
population.to_csv("../data/raw/population_bps.csv", index=False)
print(f"Scraped {len(population)} rows from BPS")
except Exception as e:
print(f"BPS scraping failed: {e}")
print("Tip: manually download from the BPS website and save as CSV")BPS website structure varies by city. If scraping fails, visit semarangkota.bps.go.id manually, find the population table, and save as CSV to data/raw/population_bps.csv. The pipeline continues either way.
# Key urban facilities
poi_tags = {
"amenity": ["hospital", "school", "marketplace"],
"shop": True,
}
pois = ox.features_from_place(CITY["osm_query"], tags=poi_tags)
pois_points = pois.copy()
pois_points["geometry"] = pois_points.geometry.centroid
pois_points[["name", "amenity", "shop", "geometry"]].dropna(
subset=["name"]
).to_file("../data/raw/pois.geojson", driver="GeoJSON")
print(f"Downloaded {len(pois_points):,} POIs")# Quick inventory check
raw = Path("../data/raw")
for f in sorted(raw.glob("*")):
size = f.stat().st_size / 1024
unit = "KB"
if size > 1024:
size /= 1024; unit = "MB"
print(f" {f.name:<35} {size:>8.1f} {unit}")admin_boundary.geojson 45.2 KB buildings.geojson 12.8 MB pois.geojson 285.3 KB population_bps.csv 3.2 KB street_network.graphml 8.5 MB
data/raw/admin_boundary.geojson • data/raw/buildings.geojson • data/raw/street_network.graphml • data/raw/pois.geojson • data/raw/population_bps.csv
buildings = ox.features_from_bbox(
bbox=(north, south, east, west), # use CITY["bbox"] values
tags={"building": True}
)query = f"""
[out:json][timeout:120];
area["name"="{CITY['name']}"]["admin_level"="5"]->.city;
(way["building"="commercial"](area.city););
out body; >; out skel qt;
"""
response = requests.get("https://overpass-api.de/api/interpreter",
params={"data": query})wget https://download.geofabrik.de/asia/indonesia-latest-free.shp.zip
unzip indonesia-latest-free.shp.zip -d data/raw/geofabrik/| Source | Function / URL | Returns | Coverage |
|---|---|---|---|
| OSMnx geocode | ox.geocode_to_gdf(query) | City boundary GeoDataFrame | All cities |
| OSMnx features | ox.features_from_place(query, tags) | GeoDataFrame of tagged features | All cities |
| OSMnx graph | ox.graph_from_place(query) | NetworkX MultiDiGraph | All cities |
| BPS API | webapi.bps.go.id/v1/api/list | JSON statistical tables | All kabupaten/kota |
| InaGeoportal | tanahair.indonesia.go.id | SHP / GeoJSON download | Varies |
| Geofabrik | download.geofabrik.de/asia/indonesia | SHP / PBF bulk extract | All Indonesia |
| SRTM DEM | via GEE: USGS/SRTMGL1_003 | 30m elevation raster | Global |
GeoPandas spatial analysis, KDB/KLB zoning compliance, and Uber H3 hexagonal grids — turning raw downloads into structured urban layers for Semarang.
A GeoDataFrame is a pandas DataFrame with a geometry column. Every row has both tabular attributes (name, area, population) and a spatial shape (point, line, polygon). This means you can filter, group, and merge spatially — not just by column values.
Uber's H3 divides Earth into hexagons at 16 resolution levels. Assign every dataset an H3 cell ID and you can join anything to anything — no complex spatial intersection needed.
| Resolution | Avg Hex Area | Best For |
|---|---|---|
7 | ~5.16 km² | District-level summaries |
8 | ~0.74 km² | Neighborhood / OD flows |
9 | ~0.11 km² | Block-level traffic analysis |
10 | ~0.015 km² | Individual parcel scale |
data/raw/buildings.geojson and data/raw/admin_boundary.geojson existdata/city_config.json availableCreate notebooks/03_geopandas_h3.ipynb.
import json
import geopandas as gpd
import pandas as pd
import h3
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
from pathlib import Path
CITY = json.loads(Path("../data/city_config.json").read_text())
# Load Module 2 outputs
buildings = gpd.read_file("../data/raw/buildings.geojson")
boundary = gpd.read_file("../data/raw/admin_boundary.geojson")
print(f"Buildings: {len(buildings):,} features, CRS: {buildings.crs}")
print(f"Boundary CRS: {boundary.crs}")# Reproject to UTM for metric calculations
buildings_utm = buildings.to_crs(epsg=CITY["epsg_utm"])
# Building footprint area (m²)
buildings["footprint_m2"] = buildings_utm.geometry.area.round(1)
# Estimate floors if available from OSM tags
if "building:levels" in buildings.columns:
buildings["floors"] = pd.to_numeric(
buildings["building:levels"], errors="coerce"
).fillna(1).astype(int)
else:
buildings["floors"] = 1
buildings["floor_area_m2"] = buildings["footprint_m2"] * buildings["floors"]
print(buildings[["footprint_m2", "floors", "floor_area_m2"]].describe().round(1))# Convert building centroids to H3 cells at resolution 9
H3_RES = 9
centroids = buildings.geometry.centroid.to_crs(epsg=4326)
buildings["h3_cell"] = [
h3.latlng_to_cell(pt.y, pt.x, H3_RES) for pt in centroids
]
print(f"Unique H3 cells: {buildings['h3_cell'].nunique():,}")
print(f"Sample cell: {buildings['h3_cell'].iloc[0]}")Unique H3 cells: ~1,200–3,000 Sample cell: 89600c1a2ffffff
# Aggregate per hexagon
h3_stats = buildings.groupby("h3_cell").agg(
building_count=("footprint_m2", "count"),
total_footprint=("footprint_m2", "sum"),
avg_footprint=("footprint_m2", "mean"),
max_floors=("floors", "max"),
total_floor_area=("floor_area_m2", "sum"),
).round(1)
# Create H3 hexagon geometries for mapping
def h3_to_polygon(cell_id):
boundary = h3.cell_to_boundary(cell_id)
return Polygon([(lng, lat) for lat, lng in boundary])
h3_grid = gpd.GeoDataFrame(
h3_stats,
geometry=[h3_to_polygon(c) for c in h3_stats.index],
crs="EPSG:4326"
)
h3_grid.index.name = "h3_cell"
print(f"H3 grid: {len(h3_grid)} hexagons")
print(h3_grid.head())Without official parcel boundaries, we approximate KDB at the H3 cell level: total building footprint / hexagon area.
# H3 res-9 hex area in m² (approximate)
HEX_AREA_M2 = 105000 # ~0.105 km²
h3_grid["kdb_approx"] = (h3_grid["total_footprint"] / HEX_AREA_M2).round(3)
h3_grid["density_class"] = pd.cut(
h3_grid["kdb_approx"],
bins=[0, 0.1, 0.3, 0.5, 1.0, float("inf")],
labels=["sparse", "low", "medium", "high", "very_high"]
)
print(h3_grid["density_class"].value_counts())# Map: building density by H3 cell
fig, ax = plt.subplots(figsize=(12, 10))
boundary.to_crs(epsg=4326).boundary.plot(ax=ax, color="black", linewidth=1)
h3_grid.plot(column="kdb_approx", cmap="YlOrRd", legend=True,
ax=ax, alpha=0.7, edgecolor="grey", linewidth=0.2)
ax.set_title(f"Building Density (KDB approx) — {CITY['name']}")
plt.savefig("../output/h3_building_density.png", dpi=150, bbox_inches="tight")
plt.show()
# Save processed data
buildings.to_file("../data/processed/buildings_processed.gpkg", driver="GPKG")
h3_grid.reset_index().to_file("../data/processed/h3_grid.gpkg", driver="GPKG")
print("Saved: buildings_processed.gpkg, h3_grid.gpkg")data/processed/buildings_processed.gpkg • data/processed/h3_grid.gpkg • output/h3_building_density.png
gdf_utm = gdf.to_crs(epsg=CITY["epsg_utm"]) # meters for area/distancejoined = gpd.sjoin(buildings, zoning, how="left", predicate="within")coast_buffer = coastline.to_crs(epsg=CITY["epsg_utm"]).buffer(500) # 500m# Average metric over a cell and its 6 neighbors
def kring_smooth(row, df, col, k=1):
neighbors = h3.grid_disk(row.name, k)
vals = df.loc[df.index.isin(neighbors), col]
return vals.mean()
h3_grid["density_smooth"] = h3_grid.apply(
kring_smooth, axis=1, df=h3_grid, col="building_count"
)| Operation | Code |
|---|---|
| Read file | gpd.read_file("path.gpkg") |
| Reproject | gdf.to_crs(epsg=32749) |
| Area (m²) | gdf_utm.geometry.area |
| Centroid | gdf.geometry.centroid |
| Buffer | gdf_utm.buffer(500) (meters) |
| Spatial join | gpd.sjoin(left, right, predicate="within") |
| Overlay | gpd.overlay(a, b, how="intersection") |
| Save GeoPackage | gdf.to_file("out.gpkg", driver="GPKG") |
| H3: point to cell | h3.latlng_to_cell(lat, lon, res) |
| H3: cell boundary | h3.cell_to_boundary(cell_id) |
| H3: neighbors | h3.grid_disk(cell_id, k) |
| H3: resolution | h3.get_resolution(cell_id) |
OSMnx and NetworkX turn street maps into mathematical graphs — revealing chokepoints, accessibility gaps, and optimal facility locations in Semarang.
Think of your city's street network like a social network. Intersections are people (nodes). Roads are friendships (edges). Some intersections are on the shortest path between everyone — they're the bottlenecks. Others sit in dead-end corners.
data/raw/street_network.graphml existsdata/processed/h3_grid.gpkg existsCreate notebooks/04_network_analysis.ipynb.
import json
import osmnx as ox
import networkx as nx
import geopandas as gpd
import matplotlib.pyplot as plt
from pathlib import Path
CITY = json.loads(Path("../data/city_config.json").read_text())
# Load the graph saved in Module 2
G = ox.load_graphml("../data/raw/street_network.graphml")
G_undir = ox.convert.to_undirected(G)
nodes, edges = ox.graph_to_gdfs(G_undir)
print(f"Network: {len(nodes):,} nodes, {len(edges):,} edges")This takes 2-10 minutes depending on city size. For large networks, sample a subgraph first.
# Betweenness centrality (may take several minutes for large cities)
print("Calculating betweenness centrality...")
bc = nx.betweenness_centrality(G_undir, weight="length")
print("Calculating closeness centrality...")
cc = nx.closeness_centrality(G_undir, distance="length")
# Attach to nodes GeoDataFrame
nodes["betweenness"] = nodes.index.map(bc)
nodes["closeness"] = nodes.index.map(cc)
print(f"Top betweenness: {nodes['betweenness'].max():.6f}")
print(f"Top closeness: {nodes['closeness'].max():.6f}")For 50k+ nodes, use nx.betweenness_centrality(G, k=500) to sample 500 random source nodes. This gives a good approximation in seconds instead of hours.
An isochrone shows all locations reachable within a time limit from a starting point. We'll generate 5, 10, and 15-minute walking isochrones from the city center.
# Walk network for isochrones
G_walk = ox.graph_from_place(CITY["osm_query"], network_type="walk")
# Find nearest node to city center
center_node = ox.nearest_nodes(G_walk, CITY["lon"], CITY["lat"])
# Walking speed ~ 4.5 km/h = 1.25 m/s
WALK_SPEED = 1.25
for u, v, data in G_walk.edges(data=True):
data["time"] = data["length"] / WALK_SPEED
# Isochrone polygons
isochrones = []
for minutes in [5, 10, 15]:
subgraph = nx.ego_graph(G_walk, center_node,
radius=minutes * 60, distance="time")
iso_nodes = ox.graph_to_gdfs(subgraph, edges=False)
hull = iso_nodes.unary_union.convex_hull
isochrones.append({"minutes": minutes, "geometry": hull})
iso_gdf = gpd.GeoDataFrame(isochrones, crs="EPSG:4326")
iso_gdf.to_file("../data/processed/isochrones.gpkg", driver="GPKG")
print(f"Generated {len(iso_gdf)} isochrones")# Betweenness centrality map
fig, ax = plt.subplots(figsize=(14, 10))
edges_gdf = ox.graph_to_gdfs(G_undir, nodes=False)
edges_gdf.plot(ax=ax, color="#ddd", linewidth=0.3)
nodes.plot(ax=ax, column="betweenness", cmap="hot_r",
markersize=nodes["betweenness"] * 50000,
alpha=0.7, legend=True)
ax.set_title(f"Betweenness Centrality — {CITY['name']}")
ax.set_axis_off()
plt.savefig("../output/betweenness_map.png", dpi=150, bbox_inches="tight")
plt.show()
# Save centrality nodes
nodes.to_file("../data/processed/centrality_nodes.gpkg", driver="GPKG")
print("Saved: centrality_nodes.gpkg")data/processed/centrality_nodes.gpkg • data/processed/isochrones.gpkg • output/betweenness_map.png
orig = ox.nearest_nodes(G, lon1, lat1)
dest = ox.nearest_nodes(G, lon2, lat2)
route = ox.shortest_path(G, orig, dest, weight="length")
ox.plot_graph_route(G, route)stats = ox.basic_stats(G, area=boundary_area_m2)
print(f"Street density: {stats['street_density_km']:.1f} km/km²")
print(f"Avg edge length: {stats['street_length_avg']:.0f} m")bc_approx = nx.betweenness_centrality(G, k=500, weight="length") # fast approximation| Function | Purpose |
|---|---|
ox.graph_from_place(q) | Download street graph for a named place |
ox.graph_from_bbox(n,s,e,w) | Download by bounding box |
ox.save_graphml(G, path) | Save graph to disk |
ox.load_graphml(path) | Reload saved graph |
ox.graph_to_gdfs(G) | Convert to GeoDataFrames (nodes, edges) |
ox.nearest_nodes(G, lon, lat) | Find nearest node to a coordinate |
ox.shortest_path(G, o, d) | Find shortest path between two nodes |
nx.betweenness_centrality(G) | How often a node lies on shortest paths |
nx.closeness_centrality(G) | Average inverse distance to all other nodes |
nx.ego_graph(G, n, radius=r) | Subgraph within radius of node n |
PostGIS turns PostgreSQL into a spatial database — permanent storage for millions of records with lightning-fast spatial queries via SQL.
GeoPandas loads everything into RAM. Fine for 50,000 buildings. But when you have 400,000 buildings across 30 years, multiple team members, and queries like "find every building within 500m of the coast that violates KDB" — you need a database.
When a city doubles in population, some metrics scale sublinearly (infrastructure — economies of scale), linearly (housing), or superlinearly (innovation, congestion). The exponent β reveals the pattern.
urban_semarang createddata/processed/ogr2ogr (part of GDAL) bulk-loads spatial files into PostGIS tables.
# Load buildings
ogr2ogr -f "PostgreSQL" \
PG:"dbname=urban_semarang user=postgres" \
data/processed/buildings_processed.gpkg \
-nln buildings -overwrite -lco GEOMETRY_NAME=geom
# Load H3 grid
ogr2ogr -f "PostgreSQL" \
PG:"dbname=urban_semarang user=postgres" \
data/processed/h3_grid.gpkg \
-nln h3_grid -overwrite -lco GEOMETRY_NAME=geom
# Load centrality nodes
ogr2ogr -f "PostgreSQL" \
PG:"dbname=urban_semarang user=postgres" \
data/processed/centrality_nodes.gpkg \
-nln centrality_nodes -overwrite -lco GEOMETRY_NAME=geom
# Load admin boundary
ogr2ogr -f "PostgreSQL" \
PG:"dbname=urban_semarang user=postgres" \
data/raw/admin_boundary.geojson \
-nln admin_boundary -overwrite -lco GEOMETRY_NAME=geom-- Connect: psql -d urban_semarang
CREATE INDEX IF NOT EXISTS idx_buildings_geom ON buildings USING GIST(geom);
CREATE INDEX IF NOT EXISTS idx_h3_grid_geom ON h3_grid USING GIST(geom);
CREATE INDEX IF NOT EXISTS idx_centrality_geom ON centrality_nodes USING GIST(geom);
-- Verify tables loaded
SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::text))
FROM pg_tables
WHERE schemaname = 'public';tablename | pg_size_pretty ------------------+---------------- buildings | 45 MB h3_grid | 512 KB centrality_nodes | 8 MB admin_boundary | 64 KB
-- Find buildings within 500m of a coastal point
SELECT count(*) AS coastal_buildings,
round(avg(footprint_m2)::numeric, 1) AS avg_footprint
FROM buildings
WHERE ST_DWithin(
geom::geography,
ST_SetSRID(ST_MakePoint(110.42, -6.97), 4326)::geography,
500 -- meters
);-- Join buildings to H3 hexagons and aggregate
CREATE MATERIALIZED VIEW h3_building_summary AS
SELECT
h.h3_cell,
h.geom,
count(b.*) AS building_count,
coalesce(sum(b.footprint_m2), 0) AS total_footprint,
coalesce(avg(b.footprint_m2), 0)::numeric(10,1) AS avg_footprint
FROM h3_grid h
LEFT JOIN buildings b ON ST_Contains(h.geom, b.geom)
GROUP BY h.h3_cell, h.geom;
CREATE INDEX idx_h3_summary_geom ON h3_building_summary USING GIST(geom);
-- Check result
SELECT count(*) AS hexagons,
sum(building_count) AS total_buildings
FROM h3_building_summary;Create notebooks/05_postgis.ipynb and test querying from Python.
import geopandas as gpd
from sqlalchemy import create_engine
engine = create_engine("postgresql://postgres@localhost/urban_semarang")
# Load H3 building summary as GeoDataFrame
h3_summary = gpd.read_postgis(
"SELECT * FROM h3_building_summary",
engine, geom_col="geom"
)
print(f"Loaded {len(h3_summary)} hexagons from PostGIS")
print(h3_summary[["h3_cell", "building_count", "total_footprint"]].head())PostGIS database urban_semarang with tables: buildings, h3_grid, centrality_nodes, admin_boundary • Materialized view: h3_building_summary
SELECT * FROM buildings WHERE ST_Within(geom, (SELECT geom FROM zoning WHERE zone_code = 'K-1'));SELECT count(*) FROM buildings
WHERE ST_DWithin(geom::geography,
(SELECT ST_Boundary(geom)::geography FROM admin_boundary LIMIT 1), 1000);REFRESH MATERIALIZED VIEW h3_building_summary;SELECT REGR_SLOPE(ln(built_area), ln(population)) AS beta
FROM city_stats
WHERE population > 100000;| Function | Purpose |
|---|---|
ST_Contains(a, b) | True if geometry a fully contains b |
ST_Within(a, b) | True if a is fully within b |
ST_Intersects(a, b) | True if a and b share any space |
ST_DWithin(a, b, d) | True if a and b are within distance d |
ST_Buffer(geom, d) | Create buffer zone of distance d around geometry |
ST_Area(geom) | Area in CRS units (use geography for m²) |
ST_Distance(a, b) | Distance between geometries |
ST_Transform(geom, srid) | Reproject to a different CRS |
ST_SetSRID(geom, srid) | Assign CRS without transforming coordinates |
ST_MakePoint(lon, lat) | Create point from coordinates |
ST_Contains(z.geom, b.geom) check?Google Earth Engine processes petabytes of satellite imagery in the cloud — land-use classification, change detection, and coastal risk modeling for Semarang.
GEE gives you access to decades of Landsat and Sentinel imagery, processed on Google's servers. The course uses it to answer two questions: "How has land use changed?" and "Which areas face the highest flood risk?"
Toggle risk dimensions to see how the composite score changes.
Simulated score for a low-elevation coastal kelurahan
earthengine authenticate completeddata/raw/admin_boundary.geojson for city extentCreate notebooks/06_earth_engine.ipynb.
import json, ee
import geemap
from pathlib import Path
ee.Initialize()
CITY = json.loads(Path("../data/city_config.json").read_text())
# Define study area from bounding box
bbox = CITY["bbox"]
aoi = ee.Geometry.Rectangle([bbox[0], bbox[1], bbox[2], bbox[3]])
print(f"Study area: {CITY['name']}")
print(f"Bounding box: {bbox}")def get_composite(year, aoi):
"""Cloud-masked median composite from Landsat 8/9."""
collection = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
.merge(ee.ImageCollection("LANDSAT/LC09/C02/T1_L2"))
.filterBounds(aoi)
.filterDate(f"{year}-01-01", f"{year}-12-31")
.filter(ee.Filter.lt("CLOUD_COVER", 20)))
def mask_clouds(img):
qa = img.select("QA_PIXEL")
mask = qa.bitwiseAnd(1 << 3).eq(0).And(qa.bitwiseAnd(1 << 4).eq(0))
return img.updateMask(mask)
return collection.map(mask_clouds).median().clip(aoi)
composite_2015 = get_composite(2015, aoi)
composite_2023 = get_composite(2023, aoi)
print("Composites ready")def add_indices(img):
"""Add NDVI, NDBI, MNDWI to a Landsat image."""
ndvi = img.normalizedDifference(["SR_B5", "SR_B4"]).rename("NDVI")
ndbi = img.normalizedDifference(["SR_B6", "SR_B5"]).rename("NDBI")
mndwi = img.normalizedDifference(["SR_B3", "SR_B6"]).rename("MNDWI")
return img.addBands([ndvi, ndbi, mndwi])
img_2015 = add_indices(composite_2015)
img_2023 = add_indices(composite_2023)
# Visualize NDVI
Map = geemap.Map(center=[CITY["lat"], CITY["lon"]], zoom=11)
Map.addLayer(img_2023.select("NDVI"), {"min": -0.2, "max": 0.8, "palette": ["red", "yellow", "green"]}, "NDVI 2023")
Map# Training points (replace with your own labeled samples)
# Classes: 0=water, 1=vegetation, 2=built-up, 3=bare_soil
bands = ["SR_B2","SR_B3","SR_B4","SR_B5","SR_B6","SR_B7","NDVI","NDBI","MNDWI"]
# Use a GEE training dataset or create your own FeatureCollection
# For reproducibility, use ESA WorldCover as pseudo-labels
worldcover = ee.Image("ESA/WorldCover/v200").clip(aoi)
# Sample training points from WorldCover
training = img_2023.select(bands).addBands(worldcover.rename("label")).sample(
region=aoi, scale=30, numPixels=5000, seed=42
)
# Train Random Forest
classifier = ee.Classifier.smileRandomForest(50).train(
features=training, classProperty="label", inputProperties=bands
)
# Classify both years
classified_2015 = img_2015.select(bands).classify(classifier)
classified_2023 = img_2023.select(bands).classify(classifier)
print("Classification complete")# Elevation from SRTM
dem = ee.Image("USGS/SRTMGL1_003").clip(aoi)
elevation = dem.select("elevation")
slope = ee.Terrain.slope(dem)
# Normalize to 0-1 (higher value = more hazardous)
h_elev = elevation.multiply(-1).add(50).divide(50).clamp(0, 1) # low elevation = high risk
h_slope = slope.multiply(-1).add(30).divide(30).clamp(0, 1)
# Built-up density as exposure proxy
ndbi_norm = img_2023.select("NDBI").add(1).divide(2).clamp(0, 1)
# Green cover as vulnerability proxy (less green = more vulnerable)
ndvi_inv = img_2023.select("NDVI").multiply(-1).add(1).divide(2).clamp(0, 1)
# Composite
risk = (h_elev.multiply(0.40)
.add(ndbi_norm.multiply(0.35))
.add(ndvi_inv.multiply(0.25)))
# Visualize
Map = geemap.Map(center=[CITY["lat"], CITY["lon"]], zoom=11)
Map.addLayer(risk, {"min": 0, "max": 1, "palette": ["green", "yellow", "red"]}, "Risk")
Map# Export risk raster to Google Drive (then download to output/)
task = ee.batch.Export.image.toDrive(
image=risk,
description=f"risk_composite_{CITY['name'].lower()}",
folder="urban_analytics",
region=aoi,
scale=30,
maxPixels=1e9
)
task.start()
print(f"Export started. Check Google Drive > urban_analytics/")
# Also export the change detection
change = classified_2023.subtract(classified_2015).rename("change")
task2 = ee.batch.Export.image.toDrive(
image=change, description=f"land_change_{CITY['name'].lower()}",
folder="urban_analytics", region=aoi, scale=30, maxPixels=1e9
)
task2.start()
print("Change detection export started")Download the GeoTIFFs from Google Drive to output/risk_composite.tif and output/land_change.tif. These feed into the dashboard in Module 7.
output/risk_composite.tif • output/land_change.tif (via Google Drive)
point = ee.Geometry.Point([CITY["lon"], CITY["lat"]])
ts = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
.filterBounds(point).select("SR_B5", "SR_B4")
.map(lambda img: img.normalizedDifference(["SR_B5", "SR_B4"]).rename("NDVI")
.set("date", img.date().format())))flood_level = 2 # meters above sea level
flooded = dem.select("elevation").lte(flood_level)
Map.addLayer(flooded.selfMask(), {"palette": ["blue"]}, f"Flood {flood_level}m")geemap.ee_export_image(risk, filename="../output/risk.tif", scale=30, region=aoi)| Index | Formula | Measures |
|---|---|---|
| NDVI | (NIR − Red) / (NIR + Red) | Vegetation density/health |
| NDBI | (SWIR1 − NIR) / (SWIR1 + NIR) | Built-up/impervious surface |
| MNDWI | (Green − SWIR1) / (Green + SWIR1) | Water bodies |
| Collection | ID | Resolution |
|---|---|---|
| Landsat 8/9 SR | LANDSAT/LC08/C02/T1_L2 | 30m, 16-day |
| Sentinel-2 SR | COPERNICUS/S2_SR_HARMONIZED | 10m, 5-day |
| SRTM DEM | USGS/SRTMGL1_003 | 30m, static |
| ESA WorldCover | ESA/WorldCover/v200 | 10m, 2021 |
Streamlit, pydeck, and HERE Traffic — assembling everything into an interactive digital twin of Semarang that decision-makers can explore.
A digital twin is a living, data-driven mirror of a physical system. The dashboard brings together everything from the previous 6 modules into one explorable interface.
st.map(), st.slider().urban_semarang with all tables loadedoutput/risk_composite.tif downloaded from GEEdata/processed/h3_grid.gpkgCreate scripts/dashboard.py.
# scripts/dashboard.py
import json
import streamlit as st
import pandas as pd
import geopandas as gpd
import pydeck as pdk
import h3
from sqlalchemy import create_engine
from pathlib import Path
# ─── Config ───
CITY = json.loads(Path("data/city_config.json").read_text())
DB_URL = f"postgresql://postgres@localhost/urban_{CITY['name'].lower()}"
engine = create_engine(DB_URL)
st.set_page_config(page_title=f"{CITY['name']} Urban Dashboard", layout="wide")
st.title(f"🏙️ {CITY['name']} Urban Analytics Dashboard")# ─── Data Loading (cached) ───
@st.cache_data
def load_h3_data():
df = gpd.read_postgis(
"SELECT * FROM h3_building_summary",
engine, geom_col="geom"
)
# Convert H3 geometry to lat/lon for pydeck
df["lat"] = df.geometry.centroid.y
df["lon"] = df.geometry.centroid.x
return df
h3_data = load_h3_data()
st.sidebar.metric("Total Hexagons", len(h3_data))
st.sidebar.metric("Total Buildings", f"{h3_data['building_count'].sum():,}")# ─── H3 Building Density Map ───
st.subheader("Building Density by H3 Cell")
metric = st.selectbox("Color by", ["building_count", "total_footprint", "avg_footprint"])
layer = pdk.Layer(
"H3HexagonLayer",
data=h3_data,
get_hexagon="h3_cell",
get_fill_color=f"[255, (1 - {metric} / {h3_data[metric].quantile(0.95)}) * 255, 0, 160]",
get_elevation=metric,
elevation_scale=2,
extruded=True,
pickable=True,
)
view = pdk.ViewState(
latitude=CITY["lat"], longitude=CITY["lon"],
zoom=11, pitch=45
)
st.pydeck_chart(pdk.Deck(layers=[layer], initial_view_state=view,
tooltip={"text": f"H3: {{h3_cell}}\\n{metric}: {{{metric}}}"}))# ─── HERE Traffic (optional — needs API key) ───
import requests
HERE_KEY = st.sidebar.text_input("HERE API Key", type="password")
if HERE_KEY:
bbox_str = f"{CITY['bbox'][1]},{CITY['bbox'][0]},{CITY['bbox'][3]},{CITY['bbox'][2]}"
url = f"https://data.traffic.hereapi.com/v7/flow?in=bbox:{bbox_str}&apiKey={HERE_KEY}"
@st.cache_data(ttl=300)
def fetch_traffic():
resp = requests.get(url)
if resp.status_code == 200:
results = resp.json().get("results", [])
rows = []
for r in results:
loc = r.get("location", {})
flow = r.get("currentFlow", {})
if "shape" in loc and "speed" in flow:
coords = loc["shape"]["links"][0]["points"]
mid = coords[len(coords)//2]
rows.append({
"lat": mid["lat"], "lon": mid["lng"],
"speed": flow["speed"],
"jam_factor": flow.get("jamFactor", 0),
})
return pd.DataFrame(rows)
return pd.DataFrame()
traffic = fetch_traffic()
if not traffic.empty:
st.subheader(f"Live Traffic — {len(traffic)} segments")
traffic["h3_cell"] = [h3.latlng_to_cell(r.lat, r.lon, 9) for _, r in traffic.iterrows()]
st.dataframe(traffic.head(10))cd urban_analytics
streamlit run scripts/dashboard.pyLocal URL: http://localhost:8501 Network URL: http://192.168.x.x:8501
Running Streamlit dashboard at localhost:8501 integrating PostGIS, H3, GEE risk maps, and (optionally) live HERE Traffic data for Semarang.
arc_layer = pdk.Layer("ArcLayer", data=od_df,
get_source_position=["origin_lon", "origin_lat"],
get_target_position=["dest_lon", "dest_lat"],
get_source_color=[0, 128, 200],
get_target_color=[200, 0, 80],
get_width="trip_count", width_scale=0.5)min_buildings = st.sidebar.slider("Min buildings per hex", 0, 500, 10)
filtered = h3_data[h3_data["building_count"] >= min_buildings]@st.cache_data(ttl=600) # cache for 10 minutes
def query_buildings(zone_code):
return gpd.read_postgis(f"SELECT * FROM buildings WHERE zone = '{zone_code}'", engine)| Component | Code |
|---|---|
| Text | st.title() / st.header() / st.write() |
| DataFrame | st.dataframe(df) |
| Chart | st.bar_chart(df) / st.line_chart(df) |
| Map | st.pydeck_chart(deck) |
| Slider | st.slider("Label", min, max, default) |
| Select | st.selectbox("Label", options) |
| Sidebar | st.sidebar.slider(...) |
| Caching | @st.cache_data(ttl=N) |
| H3 Layer | pdk.Layer("H3HexagonLayer", ...) |
| Arc Layer | pdk.Layer("ArcLayer", ...) |
| Scatter Layer | pdk.Layer("ScatterplotLayer", ...) |
| Endpoint | URL Pattern | Returns |
|---|---|---|
| Flow | /v7/flow?in=bbox:S,W,N,E | Speed, jamFactor per road segment |
| Incidents | /v7/incidents?in=bbox:S,W,N,E | Accidents, road closures, construction |
You've built an end-to-end urban analytics pipeline — from raw open data to a living dashboard — reproducible for any Indonesian city. Change the city in city_config.json and re-run every notebook. The same code works everywhere.