Operational Landslide Hazard Mapping in West Java via Per-Event Normalisation and Change-Detection Filtering of a Pretrained U-Net

Firman Hadi

Research notes — methods paper draft for Remote Sensing of Environment

2 phases · 85 PVMBG events · ~8,500 km² mapped · zero region-specific labels
01

Why this problem matters

Landslides in West Java cause repeat fatalities, but operational hazard maps lag the event timeline.


The Operational Challenge

West Java is a landslide hotspot

  • Steep, rain-saturated tropical terrain — PVMBG investigates dozens of new events annually
  • Field investigation is slow; remote-sensing maps usually arrive only after the news cycle
  • Polygon-labelled landslide inventories for Indonesia are scarce; only point-based PVMBG records are public

The methods gap

Problem: Segmentation models pretrained on Landslide4Sense (a global benchmark) collapse when applied directly to West Java Sentinel-2 imagery: the input distribution shifts, and with the training-set normalisation statistics only 1 of 85 events produces any detection.
Approach: Rescue cross-region transfer with per-event z-score normalisation and a ΔNDVI change-detection gate — without any region-specific labels or fine-tuning.
02

Approach in one diagram

Two phases: benchmark first, then deploy on West Java with post-hoc filters.


End-to-End Pipeline

  • Landslide4Sense: 14-channel patches
  • U-Net: ResNet-18 pretrain
  • PVMBG events: field-investigated
  • GEE Sentinel-2: Cloud Score+ composites
  • Per-event z-score
  • ΔNDVI gate
  • Hazard map
Key idea: Don't fine-tune on the target region. Instead, treat each event AOI as its own sub-distribution — renormalise per event, then suppress non-vegetation-loss false positives with a pre/post NDVI difference threshold.
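
The ΔNDVI gate in the key idea above can be sketched in a few lines. This is an illustration only, not the pipeline's actual code. It assumes NDVI is computed from red and near-infrared reflectance (Sentinel-2 B4 and B8), that ΔNDVI means pre-event minus post-event NDVI so that vegetation loss is positive, and that the 0.1 threshold quoted in the filter-stack slide applies. The function names and the sample reflectances are hypothetical.

```python
def ndvi(red, nir):
    """Normalised difference vegetation index for one pixel."""
    total = nir + red
    # Guard against division by zero on no-data pixels.
    return 0.0 if total == 0 else (nir - red) / total


def delta_ndvi_gate(pre, post, threshold=0.1):
    """Return True for each pixel whose vegetation loss meets the threshold.

    pre, post: lists of (red, nir) reflectance pairs, one pair per pixel.
    Assumed sign convention: delta = NDVI_pre - NDVI_post.
    """
    gate = []
    for (red0, nir0), (red1, nir1) in zip(pre, post):
        delta = ndvi(red0, nir0) - ndvi(red1, nir1)
        gate.append(delta >= threshold)
    return gate


# Hypothetical pixels: a fresh scar, unchanged forest, unchanged bare soil.
pre = [(0.04, 0.40), (0.04, 0.40), (0.20, 0.25)]
post = [(0.18, 0.22), (0.04, 0.38), (0.20, 0.25)]
gate = delta_ndvi_gate(pre, post)
print(gate)
```

Only the first pixel passes: the forest pixel barely changes, and the bare-soil pixel has no vegetation to lose, which is why a gate of this kind suppresses old scars and permanently bare surfaces.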
A

Phase 1 — Benchmark

Reproduce the Stumpf & Kerle (2011) RF/OOA pipeline on Landslide4Sense and train a U-Net for the same task. This validates that both models are sound on their native distribution.

B

Phase 2 — West Java

Deploy the pretrained U-Net on Garut + Sukabumi via GEE pulls; apply per-event normalisation + post-hoc filters; validate with PVMBG point inventory.

03

Phase 1 — Benchmark baseline

Stumpf-faithful Random Forest + a U-Net deep baseline on Landslide4Sense.


Phase 1 — Both baselines land in range

Fig. 1. F-scores on Landslide4Sense. The RF baseline reproduces Stumpf & Kerle's per-segment range; at pixel level the U-Net leads the RF by 22 pp.
RF F (per-segment, test): 0.665
RF F (validation, in Stumpf & Kerle range): 0.766
U-Net F (pixel): 0.617
U-Net F + TTA: 0.623
  • RF/OOA pipeline reaches F = 0.766 on validation, inside the 0.73–0.87 range reported by Stumpf & Kerle (test F is 0.665)
  • U-Net (ResNet-18, ImageNet pretrain) beats RF on pixel-level F by 22 pp
  • Both models are validated on the native benchmark before any cross-region deployment
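
For readers who want to reproduce the pixel-level comparison, the sketch below shows one way to compute an F-score from binary masks and to express a gap in percentage points. It assumes "F" here means the balanced F1 score (β = 1), which these notes do not state explicitly. The function names and toy masks are hypothetical.

```python
def f_score(pred, truth):
    """Pixel-level F1 from two equal-length sequences of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # With no positives in either mask F1 is undefined; 0.0 is a convention.
    return 0.0 if denom == 0 else 2 * tp / denom


def gap_pp(f_a, f_b):
    """Difference between two F-scores in percentage points."""
    return (f_a - f_b) * 100


# Toy flattened masks: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
toy_f = f_score(pred, truth)

# Gain from test-time augmentation, using the two U-Net scores listed above.
tta_gain = gap_pp(0.623, 0.617)
print(round(toy_f, 3), round(tta_gain, 1))
```

Applied to the two U-Net scores, the same helper puts the TTA gain at 0.6 pp.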
04

Phase 2 — West Java deployment

85 field-investigated PVMBG events across two regencies, no fine-tuning.


Study Area — Garut + Sukabumi

Fig. 2. West Java with Garut and Sukabumi AOIs and 85 PVMBG-investigated landslide events.
Garut: ~4,300 km², 15 events (2022)
Sukabumi: ~4,237 km², 25 events (2022)
  • PVMBG Portal MBG API yields 2,397 nationwide field-investigation records; filtered to landslide events in 2022
  • No polygon ground truth — only point coordinates of investigated events
  • Validation metric: hit-rate by radius vs. random-baseline expectation
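
The hit-rate-by-radius metric can be sketched as follows. This is a minimal illustration under stated assumptions: event points and flagged-pixel centres are already projected to a metric coordinate system (for example a UTM zone), and an event counts as a hit at radius r if its nearest flagged pixel lies within r metres. The function name and the coordinates are hypothetical.

```python
import math


def hit_rate_by_radius(events, flagged, radii_m=(100, 500, 1000)):
    """Fraction of events with at least one flagged pixel within each radius.

    events, flagged: lists of (x, y) coordinates in metres.
    """
    if not events:
        raise ValueError("at least one event point is required")
    # Distance from each event to its nearest flagged pixel (inf if none).
    nearest = [
        min((math.dist(e, f) for f in flagged), default=math.inf)
        for e in events
    ]
    return {r: sum(d <= r for d in nearest) / len(events) for r in radii_m}


# Hypothetical event points and flagged-pixel centres.
events = [(0, 0), (1000, 0), (5000, 5000)]
flagged = [(50, 0), (1300, 400), (5000, 5900)]
rates = hit_rate_by_radius(events, flagged)
print(rates)
```

The brute-force nearest-neighbour search is fine for a few dozen events. A regency-scale raster would more likely use a spatial index or a distance transform, which this sketch does not attempt.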

The Distribution-Shift Fix

Fig. 3. Normalisation mode ablation: per-event z-score recovers events that pooled stats fail on.

Why globally-pooled stats fail

  • Landslide4Sense (L4S) training statistics absorb within-patch variance that the U-Net learned to use
  • Re-using those statistics on West Java collapses contrast at inference time: only 1 of 85 events lights up
  • Per-event z-score restores within-patch contrast: 55 of 85 events light up after the fix
Take-away: The model is fine. The normalisation pipeline was the bug. A zero-cost statistical fix unlocks the rest of the dataset.
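
The contrast collapse is easy to reproduce on a toy band. The sketch below compares standardising one event's pixels with pooled statistics against standardising them with the event's own mean and standard deviation. It is an illustration, not the pipeline's code: the reflectances and the pooled mean and standard deviation are made-up values, and a real run would apply the same step to each of the 14 channels separately.

```python
from statistics import fmean, pstdev


def normalise(values, mean, std, eps=1e-6):
    """Standardise values with the supplied statistics."""
    return [(v - mean) / (std + eps) for v in values]


def per_event_zscore(values, eps=1e-6):
    """Standardise one band of one event AOI with that AOI's own statistics."""
    mean = fmean(values)
    std = pstdev(values, mean)
    return normalise(values, mean, std, eps)


def spread(values):
    """Range of the normalised values, a crude proxy for contrast."""
    return max(values) - min(values)


# Hypothetical reflectances for one band over a dark, low-contrast event AOI.
event_band = [0.040, 0.050, 0.045, 0.055, 0.060]

# Hypothetical pooled statistics from a brighter source distribution.
pooled = normalise(event_band, mean=0.30, std=0.20)
per_event = per_event_zscore(event_band)

print(round(spread(pooled), 2), round(spread(per_event), 2))
```

With the pooled statistics every pixel lands in a narrow band well below zero, so the network sees almost no within-patch variation. The per-event version spreads the same pixels over several standard deviations around zero.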

Post-Hoc Filter Stack

Fig. 4. Cumulative effect of slope, ESA WorldCover, and ΔNDVI gates on raw U-Net positives across 85 AOIs.

Three cheap, interpretable filters

  • Slope ≥ 15° (ALOS DEM): removes 55.8 pp of raw positives (false positives on flat plains)
  • ESA WorldCover mask: removes a further 6.4 pp (urban, water, and cropland hits)
  • ΔNDVI ≥ 0.1: removes a further 15.1 pp and turns a generic scar detector into an event-specific landslide detector
Net result: 22.7% of raw U-Net positives retained — without losing the validated events.
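
The cascade can be sketched as a per-pixel predicate, followed by a check of the percentage-point arithmetic. Several details are assumptions rather than facts from these notes: the 0.5 probability cut-off for a raw U-Net positive, the choice of ESA WorldCover classes to exclude (40 cropland, 50 built-up, 80 permanent water bodies), and the ΔNDVI sign convention (pre minus post). The function name and sample pixels are hypothetical.

```python
# ESA WorldCover class codes treated as non-landslide (assumed choice):
# 40 = cropland, 50 = built-up, 80 = permanent water bodies.
EXCLUDED_LANDCOVER = {40, 50, 80}


def passes_filters(prob, slope_deg, landcover, delta_ndvi,
                   prob_threshold=0.5, min_slope=15.0, min_delta_ndvi=0.1):
    """Apply the slope, land-cover, and delta-NDVI gates to one pixel."""
    if prob < prob_threshold:
        return False  # not a raw U-Net positive
    if slope_deg < min_slope:
        return False  # slope gate
    if landcover in EXCLUDED_LANDCOVER:
        return False  # land-cover gate
    return delta_ndvi >= min_delta_ndvi  # vegetation-loss gate


# Hypothetical pixels: (probability, slope in degrees, WorldCover class, delta NDVI)
pixels = [
    (0.9, 28.0, 10, 0.35),  # steep forest with vegetation loss: kept
    (0.8, 3.0, 10, 0.30),   # flat terrain: removed by the slope gate
    (0.7, 22.0, 50, 0.25),  # built-up: removed by the land-cover gate
    (0.9, 30.0, 10, 0.02),  # no recent vegetation loss: removed by delta NDVI
]
kept = [passes_filters(*p) for p in pixels]

# Share of raw positives left after the three reported removals, in pp.
retained_pp = round(100.0 - 55.8 - 6.4 - 15.1, 1)
print(kept, retained_pp)
```

The three reported removals sum to 77.3 pp, which leaves the 22.7% quoted in the net result.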

Garut Hazard Map

Fig. 5. Garut regency-scale hazard map: pre/post Sentinel-2 composites, ΔNDVI, and filtered U-Net overlay.
100% of events ≤ 1 km · 93% ≤ 500 m · 40% ≤ 100 m · runtime ~55 min (M-series)

Sukabumi Hazard Map — Independent Generalisation

Fig. 6. Sukabumi regency — same pipeline, different regency, no re-tuning.
100% of events ≤ 1 km · 92% ≤ 500 m · 20% ≤ 100 m · signal ~90× random

Validation — Hit Rate vs. Random Baseline

Fig. 7. Cumulative event hit-rate as a function of buffer radius around PVMBG points, vs. random-area baseline.

Reading the chart

  • At 1 km, both regencies hit 100% — every PVMBG-investigated event has a flagged pixel within 1 km
  • Random baseline at the same radius is ~1% — the signal is roughly two orders of magnitude above chance
  • Useful even without polygon labels: this is the metric you can actually compute when only point inventories exist
Regency           ≤ 100 m   ≤ 500 m   ≤ 1 km
Garut (n=15)      40%       93%       100%
Sukabumi (n=25)   20%       92%       100%
Random baseline   0.05%     0.3%      1.1%
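
These notes do not spell out how the random baseline is computed. One plausible reading, sketched below, is the probability that a uniformly random location in the AOI has a flagged pixel within the radius, which equals the fraction of the AOI covered by the buffered flagged pixels. The grid, the brute-force search, and the function names are illustrative assumptions. Only the 100% and 1.1% figures come from the table.

```python
import math


def random_baseline(flagged, width, height, radius):
    """Fraction of grid cells within `radius` cells of any flagged cell.

    flagged: list of (x, y) cell indices. Brute force, for small grids only.
    """
    covered = 0
    for y in range(height):
        for x in range(width):
            if any(math.hypot(x - fx, y - fy) <= radius for fx, fy in flagged):
                covered += 1
    return covered / (width * height)


def signal_ratio(hit_rate, baseline):
    """How many times the observed hit rate exceeds the chance expectation."""
    return hit_rate / baseline


# Toy grid: one flagged cell, radius of one cell, 100 x 100 AOI.
toy = random_baseline([(50, 50)], width=100, height=100, radius=1)

# Ratio at 1 km using the table values (100% hit rate, 1.1% baseline).
ratio_1km = signal_ratio(1.00, 0.011)
print(toy, round(ratio_1km))
```

At 1 km the table values give a ratio of about 91, in line with the "~90×" quoted for Sukabumi.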

Honest Limitations

L1. No polygon ground truth

Precision, recall, and IoU cannot be reported for West Java. Hit-rate-by-radius is the only metric available, so precision is unconstrained.

L2. Pure transfer, no fine-tuning

This is by design (the contribution), but it caps achievable performance. Fine-tuning when polygon labels become available is the natural next step.

L3. Cloud cover is the throughput ceiling

Humid tropical season limits clean Sentinel-2 composites. Cloud Score+ helps but does not eliminate the constraint.

L4. "100% within 1 km" is operational, not pixel-perfect

The pipeline is decision-support — it narrows the haystack for field crews. It is not a substitute for a field-investigation polygon dataset.

Not claiming: a region-trained model, sub-100 m precision, or that this replaces PVMBG field investigation. The contribution is operational hit-rate without region-specific labels.

What's Next + Resources

Next steps

  • Province-wide West Java scaling (~37,000 km², ~10 h on M-series — methodology already validated)
  • Fine-tune on Indonesian polygon labels when/if PVMBG releases polygon inventories
  • Open-source the GEE + PyTorch pipeline as a reproducible repo

Resources

Stack

Python · PyTorch + segmentation_models_pytorch · Google Earth Engine · Sentinel-2 SR · ALOS DEM · ESA WorldCover

Data

Landslide4Sense (Ghorbanzadeh et al., 2022) · PVMBG Portal MBG API (vsi.esdm.go.id/portalmbg)

Anchor papers

Stumpf & Kerle (2011), RSE · Ghorbanzadeh et al. (2022), Big Earth Data

© 2026 Firman Hadi · firmanhadi.id / research notes
