REEPS Biodiversity Workshop — Part 1 of 4

Spatial Analysis for
Wildlife Conservation

An interactive, hands-on course covering the full pipeline — from raw field data to publication-ready maps — for a ~110 km² tropical landscape in West Java.

📚 8 Modules 🐦 11 Focal Species 🌱 West Java, Indonesia ⏱ ~4 Hours Total
01
Module 1

The Wildlife Detective's Toolkit

What does this pipeline do, and why does it matter? We are building a system to understand where endangered species live across a tropical landscape — and which areas need protection most urgently.

The Big Picture

Imagine you have years of wildlife survey data — GPS points from camera traps, ranger observations, interview records, and digitized maps. Each piece tells a small story. Our pipeline stitches those stories together into a single, spatially explicit picture of biodiversity across the AOI (Area of Interest): roughly 110 km² around the Upper Cisokan Pumped Storage Hydroelectric Power Plant (REEPS) in West Java.

The Pipeline

📋 Field Data
🗃 Database
H3 Grid
📈 Analysis
🗺 Maps & Figures
📋

Field Data

KMZ files, Excel spreadsheets, GPS logs — raw records from 2009–2026 covering multiple survey campaigns.

🗃

Unified Database

Species harmonization, coordinate validation, and integration into a single GeoPackage.

Hexagonal Grid

Uber's H3 system tiles the landscape into equal-area hexagons for fair spatial comparison.

📈

Biodiversity Analysis

Shannon entropy, Simpson index, Chao1 richness, and a composite Conservation Priority Index.

🗺

Maps & Figures

Interactive Folium maps plus publication-quality matplotlib figures ready for peer review.

∙ ∙ ∙

Meet the Focal Species

Eleven species — mostly IUCN-listed — form the core of our analysis. This is the cast of characters you will track through every module.

🐆

Javan Leopard

Panthera pardus melas — Critically Endangered apex predator, fewer than 250 adults remain.

🐵

Javan Slow Loris

Nycticebus javanicus — Critically Endangered nocturnal primate with venomous bite.

🦡

Sunda Pangolin

Manis javanica — Critically Endangered, world's most trafficked mammal.

🐒

Javan Gibbon

Hylobates moloch — Endangered; iconic loud calls echo across intact canopy.

🐒

Grizzled Langur

Presbytis comata — Endangered arboreal leaf-eater endemic to West Java.

🐒

Javan Langur

Trachypithecus auratus — Vulnerable; both melanistic and golden morphs present.

🦅

Bartels's Hawk-eagle

Nisaetus bartelsi — Endangered raptor endemic to Java's forests.

🦦

Asian Small-clawed Otter

Aonyx cinereus — Vulnerable; inhabits forested streams and rivers.

🐎

Lesser Mouse-deer

Tragulus kanchil — Least Concern but indicator of understorey health.

🐈

Leopard Cat

Prionailurus bengalensis — Least Concern small felid; highly adaptable.

🐹

Common Palm Civet

Paradoxurus hermaphroditus — Least Concern; a generalist frugivore.

∙ ∙ ∙

Synthetic Data Generator

Before touching real data, let's generate a synthetic dataset with realistic parameters. Click the button to produce ~400 random occurrence records.

∙ ∙ ∙

🎓 Check Your Understanding

A ranger reports a Sunda Pangolin sighting with GPS coordinates recorded from a handheld device. Before this record enters our spatial analysis pipeline, what is the first processing step?

Calculate the Shannon diversity index for the grid cell
Plot the point on a Folium interactive map
Validate and integrate the record into the unified database with harmonized species names and standardized coordinates
Assign the record to an H3 hexagonal grid cell
02
Module 2

From Field to Database

Raw survey data arrives in many formats. This module covers how we parse KMZ files, harmonize species names, and build a clean, unified database.

Parsing KMZ Files

A KMZ file is simply a zipped archive containing one or more KML files. KML (Keyhole Markup Language) is an XML format that stores geographic features — points, lines, polygons — along with metadata. Our surveys often export sighting locations as KMZ from Google Earth.

💡
Think of it this way: A KMZ is like a .zip that contains a .kml map file. Python's zipfile module opens the outer container; then xml.etree.ElementTree reads the XML map inside.

Code ↔ English

Python
with zipfile.ZipFile(kmz_path) as z:
    for name in z.namelist():
        if name.endswith('.kml'):
            content = z.open(name).read().decode('utf-8')
            root = ET.fromstring(content)
            for pm in root.iter('{http://www.opengis.net/kml/2.2}Placemark'):
                point = pm.find('.//{http://www.opengis.net/kml/2.2}coordinates')
                coords = point.text.strip().split(',')
                lon, lat = float(coords[0]), float(coords[1])
  1. Open the KMZ archive as a zip file.
  2. Loop through all files inside the archive.
  3. Find KML files (the map data).
  4. Read & decode the KML content as UTF-8 text.
  5. Parse the XML tree structure.
  6. Iterate over each Placemark (a sighting location).
  7. Extract the coordinates text and split by comma.
  8. Parse longitude & latitude as floating-point numbers.
∙ ∙ ∙

Species Harmonization

The same animal may appear under different names across datasets: local Indonesian names, outdated scientific names, or variant spellings. Species harmonization maps all variants to a single canonical scientific name.

💡
Why harmonize? The same animal might be recorded as "Kukang", "Nycticebus javanicus", or "Nycticebus coucang" — three names for one species. Without harmonization, our diversity metrics would count them as separate species, inflating richness estimates.
Python
SPECIES_MAP = {
    'Kukang': 'Nycticebus javanicus',
    'Lutung': 'Trachypithecus auratus',
    'Owa Jawa': 'Hylobates moloch',
    'Kucing Hutan': 'Prionailurus bengalensis',
    'Trenggiling': 'Manis javanica',
}
df['Species'] = df['Species'].map(lambda x: SPECIES_MAP.get(x, x))
∙ ∙ ∙

Interactive: Match Local Names to Scientific Names

Drag each local Indonesian name to its correct scientific name.

Local Names

Kukang
Lutung
Owa Jawa
Kucing Hutan
Trenggiling

Scientific Names

Manis javanica
Hylobates moloch
Nycticebus javanicus
Prionailurus bengalensis
Trachypithecus auratus
∙ ∙ ∙

🎓 Check Your Understanding

A dataset contains records labelled "Kucing Hutan" and other records labelled "Prionailurus bengalensis". Are these duplicates?

No — they are completely different species
Yes — "Kucing Hutan" is the local Indonesian name for Leopard Cat (Prionailurus bengalensis), so they refer to the same species and must be harmonized
It depends on the GPS coordinates
Only if they were recorded in the same year
03
Module 3

Mapping Nature's Grid

Why hexagons? How Uber's H3 system tiles a landscape into equal-area cells — and why resolution choice can make or break your analysis.

Why Hexagons?

Before we grid our study area, we need to pick a shape. Squares seem obvious, but hexagons are the gold standard for ecological spatial analysis. Here is why:

Square Grid

8 neighbors at two different distances
(diagonal = 1.41× farther)

Hexagonal Grid

6 neighbors all at the same distance
(no diagonal bias)

Equal-area cells

Every hexagon covers the same ground area, so counts per cell are directly comparable — no correction needed.

🆔

Consistent 6 neighbors

Each cell touches exactly 6 others at a uniform distance. Squares have 4 edge + 4 diagonal neighbors at different distances.

🐝

Less edge bias

Hexagons minimize the perimeter-to-area ratio, reducing edge effects in spatial statistics and habitat analysis.

🍯
Bees figured this out millions of years ago. A honeycomb uses hexagons because they tile a plane with zero gaps while minimizing material (wax). For the same reason, hexagons minimize edge effects in spatial sampling. Nature's engineering is our blueprint.
∙ ∙ ∙

H3 in Action

Uber's H3 library assigns every point on Earth to a hexagonal cell at a chosen resolution. Here is how we apply it to our wildlife records:

Code ↔ English

Python
import h3

# Assign each wildlife sighting to its
# enclosing hexagonal cell
df['h3_index'] = df.apply(
    lambda r: h3.latlng_to_cell(
        r['Latitude'],
        r['Longitude'], 8), axis=1)

# Convert H3 cell to polygon for mapping
boundary = h3.cell_to_boundary(cell)
polygon = Polygon(
    [(lng, lat) for lat, lng in boundary])
  1. Import H3 — Uber's hexagonal spatial index library.
  2. For each row in our wildlife DataFrame, take its Latitude and Longitude.
  3. Find the enclosing hexagon at resolution 8 (~0.74 km² each).
  4. Store the hex ID (a 15-character string like 88426d5e5ffffff).
  5. Convert hex ID → polygon boundary coordinates for drawing on a map.
  6. Swap lat/lng order to match GeoJSON convention (longitude first).
∙ ∙ ∙

Resolution Matters

H3 supports 16 resolution levels (0–15). Too coarse and you mix habitats; too fine and most cells are empty. The sweet spot depends on your data density and the organisms' home ranges.

🔳

Resolution 7 — ~5.2 km²

Too coarse

Mixes forest, agriculture, and riparian zones into single cells. Masks fine-scale habitat differences.

Resolution 8 — ~0.74 km²

Just right ← SELECTED

Matches typical mammal home ranges. Each cell gets enough records for meaningful diversity statistics. ~150 cells cover our 110 km² study area.

🔴

Resolution 9 — ~0.11 km²

Too fine

Creates ~1,000 cells but most contain 0–1 records. Shannon and Simpson indices become meaningless with so few observations per cell.

∙ ∙ ∙

Interactive: Explore Hex Neighbors

Click any hexagon to select it. Its K1 ring (immediate neighbors, like h3.grid_disk(cell, 1)) lights up in green, and the K2 ring (neighbors-of-neighbors) appears in lighter green. This is the basis for spatial autocorrelation analysis.

Click a hexagon to highlight its K1 and K2 rings.

∙ ∙ ∙
💡
150 hexagons cover the entire study area. Only 39 contain wildlife records — the other 111 are unsurveyed, not empty. This distinction matters! Absence of evidence is not evidence of absence. When we calculate species richness, unsurveyed cells get no value, not zero.
∙ ∙ ∙

🎓 Check Your Understanding

You have 500 wildlife records spread across a 110 km² study area. Why would resolution 9 (~0.11 km² per cell, creating ~1,000 cells) be a poor choice?

Resolution 9 hexagons are not supported by the H3 library
Hexagons at resolution 9 overlap and cause double-counting
With ~1,000 cells but only 500 records, most cells would have 0–1 observations — far too few to calculate meaningful diversity indices per cell
Resolution 9 cells are too large to distinguish between habitat types
04
Module 4

Measuring Biodiversity

From raw species counts to robust diversity metrics — Shannon entropy, Simpson's index, Pielou evenness, and Chao1 richness estimation.

The Diversity Question

Consider two hexagonal cells in our study area. Which one is more biodiverse?

🅏

Cell A

10 Javan Leopards + 1 Javan Gibbon
11 individuals, 2 species — heavily dominated by one species

🅐

Cell B

5 Javan Leopards + 6 Javan Gibbons
11 individuals, 2 species — evenly distributed

💡
Same species richness (S = 2), but Cell B is more diverse. Why? Because diversity is not just how many species you have — it is also how evenly individuals are spread among them. A cell dominated by one species is less functionally diverse. The metrics below capture this distinction mathematically.
∙ ∙ ∙

Shannon Entropy (H′)

Borrowed from information theory, Shannon entropy measures the "surprise" or uncertainty in a community. Higher H′ means more species and/or more even distribution — a key measure of alpha diversity.

Code ↔ English

Python
from math import log
from collections import Counter

counts = Counter(cell_species)
# e.g., {'Leopard': 10, 'Gibbon': 5,
#        'Pangolin': 3}
N = sum(counts.values())  # total = 18

H = -sum(
    (n/N) * log(n/N)
    for n in counts.values()
    if n > 0
)
  1. Count how many of each species appear in this cell.
  2. Sum all individuals to get the total N.
  3. For each species, calculate its proportion p = n/N.
  4. Multiply p × ln(p) — this is always negative.
  5. Sum across species and flip the sign.
  6. Higher H′ = more evenly diverse. Ranges from 0 (one species) to ln(S) (perfect evenness).
∙ ∙ ∙

Simpson's Index (D) & Pielou Evenness (J)

Two complementary metrics that capture different facets of diversity. Simpson asks "what is the probability two random individuals are different species?" while Pielou normalizes Shannon by its theoretical maximum.

📈

Simpson's D

# Probability that two random
# individuals are different species
D = 1 - sum(
    (n/N) ** 2
    for n in counts.values()
)

Ranges from 0 (one species dominates) to nearly 1 (many equally abundant species). Less sensitive to rare species than Shannon.

Pielou's J

# Shannon normalized by max
# possible Shannon (ln S)
J = H / log(S) if S > 1 else 0

Ranges from 0 (complete dominance) to 1 (perfect evenness). Answers: "How close is this community to having equal abundances?"

∙ ∙ ∙

Interactive: Diversity Calculator

Adjust the sliders to change species counts and watch H′, D, and J update in real time. Try making one species dominant, then redistribute evenly.

Total N
20
Shannon H′
1.386
Simpson D
0.750
Pielou J
1.000

Perfect evenness — all species equally abundant.

∙ ∙ ∙

Chao1 — How Many Species Are Hiding?

Observed species richness is always an undercount. Chao1 uses the frequency of rare species — singletons (species seen once) and doubletons (seen twice) — to estimate the true species pool.

Code ↔ English

Python
S_obs = len(counts)  # observed richness

# Count singletons and doubletons
f1 = sum(1 for v in counts.values()
         if v == 1)  # singletons
f2 = sum(1 for v in counts.values()
         if v == 2)  # doubletons

if f2 > 0:
    chao1 = S_obs + (f1 ** 2) / (2 * f2)
elif f1 > 0:
    chao1 = S_obs + f1 * (f1 - 1) / 2

completeness = S_obs / chao1 * 100
  1. Count observed species (S_obs) in the cell.
  2. Singletons (f1): species seen only once — these hint at more unseen species.
  3. Doubletons (f2): species seen exactly twice — anchor the estimate.
  4. Chao1 formula: S_obs + f1² / (2 × f2). More singletons relative to doubletons = more hidden species.
  5. Edge case: if no doubletons, use an alternative formula.
  6. Completeness: what percentage of the estimated total have we actually found?
🔍
Chao1 said 11.0 species — we found 11. That means we have sampled 100% of the detectable species pool in our study area. But this only covers species detectable by our survey methods (camera traps, visual observations, signs). Cryptic species (bats, small reptiles, insects) would need fundamentally different survey methods — ultrasound recorders, pitfall traps, eDNA sampling — to be counted.
∙ ∙ ∙

🎓 Check Your Understanding

Cell A has 50 records of 8 species with 0 singletons. Cell B has 12 records of 6 species with 4 singletons. Which cell likely has more undiscovered species?

Cell A — it has more total species observed
Cell B — 4 singletons out of 6 species means many species are barely detected, suggesting more are hiding
Neither — both cells have been fully sampled
Cell A — more records always means better sampling
05
Module 5

Conservation Priority Index

Not all biodiversity is equal. A composite score — weighted by PCA — lets us rank every hexagonal cell from "low concern" to "critical priority."

Why Not Just Count Species?

Imagine two cells. Cell A has 3 common species — Leopard Cat, Palm Civet, Mouse-deer — all Least Concern. Cell B has 2 critically endangered species — Javan Leopard and Sunda Pangolin. Which cell matters more for conservation?

🍃

Cell A — 3 Species

Leopard Cat, Palm Civet, Mouse-deer. All Least Concern. High richness, but low conservation urgency.

🚨

Cell B — 2 Species

Javan Leopard (CR), Sunda Pangolin (CR). Lower richness, but irreplaceable conservation value.

💡
We need a composite score that weighs multiple dimensions — not just "how many?" but also "how threatened?", "how connected?", "how diverse?", and "how many species co-occur?" That composite is the Conservation Priority Index (CPI).
∙ ∙ ∙

Five CPI Components

PCA (Principal Component Analysis) revealed which dimensions carry the most variance. The resulting weights tell us how much each component matters in the final score.

📊

Richness 28%

How many species are present in this cell? The most fundamental measure of biodiversity.

🔗

Connectivity 22%

How connected is this cell to its occupied neighbors? Isolated cells are more vulnerable.

🚨

Threatened Species 20%

How many CR/EN/VU species inhabit this cell? Presence of threatened species elevates priority.

Shannon Diversity 18%

How evenly distributed are species? A cell with 5 balanced species beats one dominated by 1.

🤝

Co-occurrence 12%

How many species pairs coexist in this cell? High co-occurrence signals intact community structure.

∙ ∙ ∙

Min-Max Normalization

Before combining scores, each component must be on the same 0–1 scale. Otherwise, a metric measured in hundreds would drown out one measured in single digits.

Code ↔ English

Python
for col in ['Richness', 'Diversity',
            'Connectivity', 'Co_occurrence',
            'Threatened']:
    vmin, vmax = df[col].min(), df[col].max()
    if vmax > vmin:
        df[f'n_{col}'] = (df[col] - vmin) / (vmax - vmin)
    else:
        df[f'n_{col}'] = 0
  1. Loop through each of the 5 CPI components.
  2. Find the min and max values across all cells.
  3. Squeeze each score to 0–1: the worst cell gets 0, the best gets 1, everyone else lands proportionally in between.
  4. Edge case: if all cells have the same value (max = min), assign 0 to avoid division by zero.
∙ ∙ ∙

Weighted Sum

Once each component is normalized, we multiply by its PCA-derived weight and add them up. The result: one number per cell.

Code ↔ English

Python
weights = {
    'Richness': 0.28,
    'Diversity': 0.18,
    'Connectivity': 0.22,
    'Co_occurrence': 0.12,
    'Threatened': 0.20
}

df['Priority_Index'] = sum(
    df[f'n_{k}'] * v
    for k, v in weights.items()
)
  1. Define the weight dictionary — percentages from PCA that sum to 1.00.
  2. For each component, multiply its normalized value by its weight.
  3. Sum the products — the result is a single Priority Index number between 0 and 1.
  4. Higher = more important for conservation action.
∙ ∙ ∙

Priority Tiers

The continuous CPI score is binned into four actionable tiers, each with a clear management implication.

CRITICAL
≥ 0.70
Immediate protection needed
HIGH
0.50 – 0.69
Priority monitoring zone
MEDIUM
0.30 – 0.49
Regular survey schedule
LOW
< 0.30
Maintain current land use
∙ ∙ ∙

Interactive: CPI Calculator

Adjust each component on a 0–10 scale. The calculator applies min-max normalization (treating 0 as min, 10 as max) and the PCA weights to produce a CPI score and tier classification.

CPI Score
0.500
Priority Tier
HIGH

∙ ∙ ∙
🔍
PCA revealed correlations and independence. Richness, connectivity, threatened species, diversity, and co-occurrence are all correlated (r = 0.54–0.80), confirming they measure overlapping but distinct facets of conservation value. However, habitat permeability is independent — confirming it captures a different dimension entirely, which is why we analyze it separately in Module 6.
∙ ∙ ∙

🎓 Check Your Understanding

Cell X has richness=8, connectivity=2, threatened=5, diversity=high, co-occurrence=moderate.
Cell Y has richness=4, connectivity=6, threatened=3, diversity=low, co-occurrence=high.
Without calculating, which cell likely gets a higher CPI?

Cell X — it dominates on the two heaviest-weighted components (richness 28% and threatened 20%)
Cell Y — its high connectivity and co-occurrence outweigh Cell X's advantages
They would score roughly the same
Cannot determine without knowing exact normalized values
06
Module 6

Connectivity & Corridors

Even high-priority cells are vulnerable if they are isolated. This module uses NetworkX graph theory and resistance surfaces to map how wildlife can — or cannot — move across the landscape.

The Airport Hub Metaphor

Think of occupied hex cells as airports. Each airport has flights (edges) to neighboring airports. Empty cells between clusters are like missing routes. Some empty cells — if "built" — could connect two isolated airport networks. That empty cell is a stepping stone.

Occupied Cells = Airports

Cells with species records are nodes in the network. They are confirmed habitat.

🚧

Empty Cells = Potential Routes

Unoccupied cells that could serve as habitat bridges if restored or protected.

🎡

Stepping Stones = Key Bridges

Empty cells that, if activated, would connect otherwise isolated habitat patches.

∙ ∙ ∙

Building the Network

We use NetworkX, Python's graph theory library, to turn the hex grid into a network where each cell is a node and touching cells share an edge.

Code ↔ English

Python
import networkx as nx

G = nx.Graph()
for cell in all_h3:
    G.add_node(cell)
for cell in all_h3:
    for nbr in h3.grid_disk(cell, 1):
        if nbr != cell and nbr in G:
            G.add_edge(cell, nbr)
  1. Import NetworkX — Python's go-to library for graph analysis.
  2. Create an empty graph G.
  3. Add every hex cell as a node in the graph.
  4. For each cell, find its K1 neighbors using H3's grid_disk.
  5. If a neighbor exists in the graph, draw an edge between them.
  6. Result: a network map of which cells can "talk to" their neighbors.
∙ ∙ ∙

Connectivity Metrics

Each cell's position in the network reveals its ecological role. Interior cells are safe havens; keystone cells are structurally critical.

🛡

Interior Cells

All 6 neighbors occupied. These are the safest cells — buffered on all sides by other habitat. Core of the contiguous patch.

Edge Cells

1–5 neighbors occupied. Exposed to non-habitat on at least one side. Vulnerable to encroachment and edge effects.

🚫

Isolated Cells

0 occupied neighbors. Completely disconnected from the main habitat network. Species here face local extinction risk.

🔑

Keystone Cells

Removal disconnects the patch. These cells are structural bottlenecks — losing one splits the habitat into isolated fragments.

∙ ∙ ∙

Stepping Stone Analysis

Empty cells are scored by how effectively they could bridge gaps between occupied habitat. Betweenness centrality measures how many shortest paths pass through a node — high betweenness means the cell is a critical bridge.

Code ↔ English

Python
betweenness = nx.betweenness_centrality(G)

score = (
    len(k1_occ) * 3
    + k2_richness * 0.5
    + betweenness.get(cell, 0) * 10
)
tier = ('High' if score > 5
        else ('Medium' if score > 2
        else 'Low'))
  1. Calculate betweenness centrality for every node in the graph.
  2. For each empty cell, score it based on three factors:
  3. K1 occupied neighbors × 3 — immediate habitat contact counts most.
  4. K2 species richness × 0.5 — nearby biodiversity adds value.
  5. Betweenness × 10 — cells that bridge distant patches score highest.
  6. Classify into tiers: High (>5), Medium (2–5), or Low (<2).
∙ ∙ ∙

Resistance Surface

Not all land is equally permeable. A resistance surface assigns a movement cost to each land cover type. Low resistance = easy passage; high resistance = major barrier.

Land CoverResistanceAnalogyImplication
Natural Forest1HighwayUnrestricted movement for all species
Agroforest5Good roadSome species can traverse; canopy-dependent ones struggle
Cropland40Dirt pathOnly generalists (Leopard Cat, Civet) cross; primates avoid
Built-up100Brick wallVirtually impassable for all focal species
🔍
Resistance values are species-specific in practice. A Javan Leopard (large home range, ground-dwelling) tolerates agroforest better than a Javan Gibbon (strictly arboreal, needs continuous canopy). Here we use a consensus resistance for the full focal species assemblage.
∙ ∙ ∙

Interactive: Stepping Stone Explorer

This simplified 7×5 hex grid shows occupied cells (green) and empty cells (gray). Click or tap any empty cell to calculate its stepping stone score based on occupied neighbors and estimated betweenness. The cell will be colored by its corridor tier.

Click an empty (gray) cell to calculate its stepping stone score.
Occupied Empty High corridor Medium corridor Low corridor
∙ ∙ ∙
💡
All 39 occupied cells form ONE contiguous patch — no isolated populations! But the patch is thin: 33 edge cells vs only 6 interior cells. This makes the network fragile; losing even a few edge cells could fragment the patch into disconnected pieces.
∙ ∙ ∙

🎓 Check Your Understanding

An empty cell sits between two CRITICAL-priority occupied cells. It has 3 occupied K1 neighbors and high betweenness centrality. What corridor tier should it get?

Low — it is currently empty so it has no conservation value
Medium — 3 neighbors is moderate connectivity
High — 3 occupied neighbors (×3 = 9) plus high betweenness easily exceeds the threshold of 5
Cannot determine without knowing the exact betweenness value
07
Module 7

Interactive Maps with Folium

All our diversity scores, priority tiers, and corridor networks live in tables. Now we turn them into zoomable, clickable web maps using Folium — think “Google Maps but for your biodiversity data.”

Why Interactive Maps?

Static images are fine for a report, but they cannot zoom into a single hex cell, toggle layers on and off, or show a tooltip when you hover. Folium wraps the powerful Leaflet.js mapping library in pure Python — one function call produces a self-contained .html file you can open in any browser.

📷

Static Image

Fixed zoom, no interactivity. Good for print, but you cannot explore details.

🌎

Interactive Web Map

Zoom, pan, click, toggle layers, hover for tooltips. Share as a single HTML file — no server needed.

∙ ∙ ∙

Building a Base Map

Every Folium map starts with a center point and a basemap tile layer.

Code ↔ English

Python
import folium
import branca.colormap as cm

center = [h3_all.geometry.centroid.y.mean(),
          h3_all.geometry.centroid.x.mean()]
m = folium.Map(location=center,
               zoom_start=13,
               tiles='CartoDB positron')
  1. Import Folium (map generation) and branca (color scales).
  2. Calculate the center of all hex cell centroids — the average latitude and longitude.
  3. Create a web map centered on that point, zoomed to neighborhood level (13).
  4. Use a clean light basemap (CartoDB positron) so colored hexagons stand out clearly.
∙ ∙ ∙

Adding Hex Layers with Styling

Each hex cell is drawn as a GeoJSON polygon, colored by its metric value. Hovering reveals a tooltip with the cell’s details.

Code ↔ English

Python
cmap_r = cm.LinearColormap(
    ['#ffffcc', '#fd8d3c', '#bd0026'],
    vmin=0, vmax=max_r,
    caption='Species Richness')

folium.GeoJson(
    h3_div.__geo_interface__,
    name='Species Richness',
    style_function=lambda f: {
        'fillColor': cmap_r(
            f['properties'].get('Richness__S', 0)),
        'color': '#333333',
        'weight': 1,
        'fillOpacity': 0.6},
    tooltip=folium.GeoJsonTooltip(
        fields=['h3_index', 'Richness__S',
                'Shannon__H'],
        aliases=['Cell', 'Richness',
                 "Shannon H'"])
).add_to(m)
  1. Create a yellow → orange → red color scale calibrated from 0 to the maximum richness value.
  2. Load the hex layer as GeoJSON from the diversity GeoDataFrame.
  3. Color each polygon by its species richness score using the color ramp.
  4. Style borders: thin dark-gray outlines with 60% fill opacity so the basemap shows through.
  5. Add a hover tooltip showing the cell ID, richness count, and Shannon H′ score.
∙ ∙ ∙

Layer Control

Folium’s built-in LayerControl adds a toggle panel to the map. Each GeoJSON overlay gets a checkbox — users can switch between diversity, priority, and corridor layers without reloading.

💡
Layer tip: Add each thematic layer with a descriptive name= parameter. Then call folium.LayerControl().add_to(m) once at the end. The map will display a floating checkbox panel in the top-right corner.
∙ ∙ ∙

The 7 Maps Produced

The pipeline generates seven interactive HTML maps, each focused on a different analytical theme.

🌈

Diversity Map

Richness + Shannon H′ per hex cell. Yellow → red color ramp.

Priority Map

CPI tiers (Critical / High / Medium / Low) with survey gap overlays highlighting under-sampled cells.

🔗

Connectivity Map

Ecological connectivity scores per cell — interior, edge, and isolated classifications.

🌳

Corridor Map

Stepping stone cells color-coded by corridor priority tier.

Temporal Map

Species turnover trends — cells colored by how species composition changes across survey years.

H3 Summary Map

Record counts and temporal coverage per hex — a quick “data density” overview of the entire grid.

🤝

Co-occurrence Map

Species pair hotspots — cells where multiple focal species share habitat simultaneously.

∙ ∙ ∙
📦
Each map is a single .html file — no server needed. Share it via email, put it on a USB drive, open it on any computer or phone with a browser. That is the power of self-contained web maps: the data, the styling, and the Leaflet.js engine are all embedded in one file.
∙ ∙ ∙

🎓 Check Your Understanding

Your map shows a hex cell colored dark red (high richness) but the tooltip says “Years surveyed: 1”. Should you trust this cell’s richness score?

Yes — dark red means high richness, and the color scale does not lie
Yes — even one year of intensive surveying can capture full richness
No — one year of data could be a single lucky survey; without repeat visits across years, richness is likely an artefact of uneven sampling effort
No — dark red always indicates an error in the color mapping function
08
Module 8

Publication Figures with Matplotlib

Interactive maps are great for exploration, but journals need 300 DPI PDFs and reviewers expect polished static figures. Matplotlib generates vector graphics you can drop straight into a manuscript.

Why Static Figures?

Interactive maps cannot be printed at 300 DPI. Peer reviewers expect precise axis labels, scale bars, and north arrows. Matplotlib produces publication-quality vector graphics (PDF, SVG) and high-resolution raster images (PNG) — exactly what journals require.

🌎

Interactive (Folium)

Best for exploration, presentations, stakeholder meetings. Zoomable and clickable.

📄

Static (Matplotlib)

Best for journal papers, printed reports, archival. Fixed layout at 300+ DPI.

∙ ∙ ∙

Loading a Satellite Basemap

Our figures use a Sentinel-2 satellite image as a basemap. Rasterio reads the GeoTIFF and reprojects it from UTM to WGS 84 (EPSG:4326).

Code ↔ English

Python
import rasterio
from rasterio.warp import (
    reproject, Resampling,
    calculate_default_transform)

with rasterio.open(S2_TIF) as src:
    transform, width, height = \
        calculate_default_transform(
            src.crs, "EPSG:4326",
            src.width, src.height,
            *src.bounds)
    rgb = np.zeros((3, height, width),
                   dtype=np.uint8)
    for band in range(1, 4):
        reproject(
            source=rasterio.band(src, band),
            destination=rgb[band - 1],
            dst_transform=transform,
            dst_crs="EPSG:4326",
            resampling=Resampling.bilinear)
  1. Import rasterio — Python’s library for reading raster geospatial data (satellite images, DEMs).
  2. Open the Sentinel-2 GeoTIFF file.
  3. Calculate the transform needed to reproject from the image’s native CRS (UTM) to latitude/longitude (EPSG:4326).
  4. Create an empty RGB array to hold the reprojected red, green, and blue bands.
  5. Reproject each band using bilinear resampling — smoothly resizes pixels to the target grid.
  6. Result: a clean RGB satellite image aligned to the same coordinate system as our hex data.
∙ ∙ ∙

Building the Figure

With the satellite image loaded, we layer hex polygons on top and save at print resolution.

Code ↔ English

Python
fig = plt.figure(figsize=(7.0, 5.4),
                 facecolor="white")
ax = fig.add_axes([0.02, 0.02,
                   0.78, 0.96])

ax.imshow(s2_rgba, extent=s2_extent,
         origin="upper",
         interpolation="bilinear")

gdf_unocc.plot(ax=ax, color="#E8E8E8",
    alpha=0.22, edgecolor="#555",
    linewidth=0.35)

gdf_occ.plot(ax=ax, color=colors,
    alpha=0.65, edgecolor="#555",
    linewidth=0.35)

fig.savefig("richness.png",
    dpi=300, bbox_inches="tight")
fig.savefig("richness.pdf",
    bbox_inches="tight")
  1. Create a figure sized for a single journal column (7 × 5.4 inches), white background.
  2. Add an axes panel that fills most of the figure, leaving room for a colorbar.
  3. Layer the satellite image underneath everything, smoothly interpolated.
  4. Draw faint “ghost” hexagons for empty cells — light gray at 22% opacity so the satellite shows through.
  5. Draw colored hexagons for occupied cells at 65% opacity, colored by the metric value.
  6. Save at 300 DPI as PNG (for review) and PDF (vector, for final submission).
∙ ∙ ∙

Figure Gallery

The pipeline produces seven figure sets — each with multiple panels covering different metrics and composite layouts.

🌈

Diversity (5 panels)

Richness, Shannon, Simpson, trend over time, and a composite overlay.

Priority (5 panels)

CPI scores, tier classification, survey gaps, threat overlay, and composite.

🔗

Connectivity

Interior / edge / isolated cell classification on the satellite basemap.

🌳

Corridor

Stepping stone priorities overlaid on the resistance surface.

🤝

Co-occurrence

Species pair heatmaps showing shared habitat cells for focal species combinations.

Turnover

Temporal species composition change — which cells gained or lost species over the study period.

H3 Maps

Record density and temporal coverage per cell — the “sampling effort” layer.

∙ ∙ ∙

Publication Styling Tips

Small details separate a good figure from a publishable one. Follow these steps for every map figure.

North Arrow + Scale Bar

Every map figure must include a north arrow and a metric scale bar. Reviewers will reject figures without spatial reference.

Consistent Color Ramps

Use the same yellow → red ramp for richness across all panels. Same blue → purple for connectivity. Consistency lets readers compare figures at a glance.

Label High-Value Cells

Use ax.text() to annotate cells with extreme values (highest richness, critical CPI). Text overlays guide the reader’s eye.

Composite 4-Panel Layout

Use plt.subplots(2, 2) for journal submissions. Four related metrics in one figure saves space and strengthens the narrative.

∙ ∙ ∙
🎨
Every figure uses the same Sentinel-2 basemap (30 September 2024) and the same hex border style — consistent visual framing makes your paper look professional and lets readers focus on the data, not the styling differences.
∙ ∙ ∙

🎓 Course Capstone Question

You need to present biodiversity findings to three audiences: (a) a government official on their phone, (b) a journal peer reviewer, and (c) a field team planning next month’s survey. Which output format do you use for each?

PDF for all three — it is the universal format
(a) Interactive HTML map — zoomable on a phone; (b) 300 DPI PDF/PNG figures — print-ready for the journal; (c) Interactive HTML with priority + corridor layers — clickable field planning tool
Interactive HTML for all three — it is the most feature-rich
(a) PNG screenshot; (b) Interactive HTML; (c) Raw GeoPackage file