Operational Landslide Hazard Mapping in West Java via Per-Event Normalisation and Change-Detection Filtering of a Pretrained U-Net

Firman Hadi

Research notes — methods paper draft for Remote Sensing of Environment

2 Phases · 85 PVMBG Events · ~8,500 km² Mapped · Zero Region-Specific Labels


01

Why this problem matters

Landslides in West Java cause recurring fatalities, but operational hazard maps lag behind the events they describe.


The Operational Challenge

West Java is a landslide hotspot

Steep, rain-saturated tropical terrain — PVMBG investigates dozens of new events annually
Field investigation is slow; remote-sensing maps usually arrive only after the news cycle
Polygon-labelled landslide inventories for Indonesia are scarce; only point-based PVMBG records are public

The methods gap

Problem: Segmentation models pretrained on Landslide4Sense (a global benchmark) collapse when applied directly to West Java Sentinel-2 imagery: the image distribution shifts and the model misfires.

Approach: Rescue cross-region transfer with per-event z-score normalisation and a ΔNDVI change-detection gate, without any region-specific labels or fine-tuning.


02

Approach in one diagram

Two phases: benchmark first, then deploy on West Java with post-hoc filters.


End-to-End Pipeline

Landslide4Sense (14-channel patches) → U-Net (ResNet-18 pretrain) → PVMBG events (field-investigated) → GEE Sentinel-2 (Cloud Score+ composites) → Per-event z-score → ΔNDVI gate → Hazard map

Key idea: Don't fine-tune on the target region. Instead, treat each event AOI as its own sub-distribution — renormalise per event, then suppress non-vegetation-loss false positives with a pre/post NDVI difference threshold.
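The ΔNDVI gate in the key idea can be sketched as below. This is a minimal illustration, not the pipeline's actual code: the function names, the toy reflectance values, and the sign convention (pre-event NDVI minus post-event NDVI, so vegetation loss is positive) are assumptions. The 0.1 threshold is the one quoted on the filter-stack slide.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """NDVI = (NIR - Red) / (NIR + Red); eps guards against division by zero."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

def delta_ndvi(pre_nir, pre_red, post_nir, post_red):
    """Vegetation loss as pre-event NDVI minus post-event NDVI (assumed sign)."""
    return ndvi(pre_nir, pre_red) - ndvi(post_nir, post_red)

# Toy 2x2 AOI (hypothetical reflectances): only the top-left pixel loses vegetation.
pre_nir = np.array([[0.50, 0.50], [0.40, 0.30]])
pre_red = np.array([[0.10, 0.10], [0.10, 0.20]])
post_nir = np.array([[0.25, 0.50], [0.40, 0.30]])
post_red = np.array([[0.20, 0.10], [0.10, 0.20]])

d = delta_ndvi(pre_nir, pre_red, post_nir, post_red)
gate = d >= 0.1  # True where the NDVI drop meets the threshold
```

Multiplying the U-Net's binary output by `gate` would keep only detections that coincide with a vegetation loss between the two composites.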

A

Phase 1 — Benchmark

Reproduce the Stumpf & Kerle (2011) RF/OOA workflow on Landslide4Sense and train a U-Net for the same task. This validates that the models are sound on their native distribution.

B

Phase 2 — West Java

Deploy the pretrained U-Net on Garut + Sukabumi via GEE pulls; apply per-event normalisation + post-hoc filters; validate with PVMBG point inventory.


03

Phase 1 — Benchmark baseline

Stumpf-faithful Random Forest + a U-Net deep baseline on Landslide4Sense.


Phase 1 — Both baselines land in range

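Fig. 1. F-scores on Landslide4Sense. The RF baseline reproduces Stumpf & Kerle's per-segment range; the U-Net beats RF on pixel-level F by 22 pp. The RF scores shown are per-segment, so they are not directly comparable with the U-Net pixel-level scores.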

0.665

RF F (per-segment, test)

0.766

RF F (val, in S&K range)

0.617

U-Net F (pixel)

0.623

U-Net F + TTA

The RF/OOA pipeline reaches F = 0.766 on validation, inside the 0.73–0.87 range reported by Stumpf & Kerle (2011)
The U-Net (ResNet-18 encoder, ImageNet-pretrained) beats RF on pixel-level F by 22 pp
Models are validated on the native benchmark before any cross-region deployment
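For reference, the F-score quoted on this slide is the harmonic mean of precision and recall. The sketch below computes it at pixel level from two binary masks; the masks and helper names are hypothetical. A per-segment score uses the same formula but counts image segments instead of pixels, which is why the two kinds of score are not interchangeable.

```python
def pixel_counts(pred, truth):
    """Count TP / FP / FN over two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn

def f_score(tp, fp, fn):
    """F = 2PR / (P + R); defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
tp, fp, fn = pixel_counts(pred, truth)
f = f_score(tp, fp, fn)
```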


04

Phase 2 — West Java deployment

85 field-investigated PVMBG events across two regencies, no fine-tuning.


Study Area — Garut + Sukabumi

Fig. 2. West Java with Garut and Sukabumi AOIs and 85 PVMBG-investigated landslide events.

~4,300

km² Garut

~4,237

km² Sukabumi

15

Garut events (2022)

25

Sukabumi events (2022)

PVMBG Portal MBG API yields 2,397 nationwide field-investigation records; filtered to landslide events in 2022
No polygon ground truth — only point coordinates of investigated events
Validation metric: hit-rate by radius vs. random-baseline expectation


The Distribution-Shift Fix

Fig. 3. Normalisation mode ablation: per-event z-score recovers events that pooled stats fail on.

Why globally-pooled stats fail

Statistics pooled over the Landslide4Sense (L4S) training set absorb the within-patch variance that the U-Net learned to use
Reusing those statistics on West Java collapses contrast at inference time: only 1 of 85 events lit up
Per-event z-score restores within-patch contrast: 55/85 events light up after the fix
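The contrast between the two normalisation modes can be illustrated with a small NumPy sketch. It assumes the z-score is taken per channel over each event's AOI, which the slides do not spell out. The pooled mean and standard deviation, the synthetic event patch, and the function names are all hypothetical values chosen only to show the effect.

```python
import numpy as np

def zscore_pooled(patch, pooled_mean, pooled_std, eps=1e-6):
    """Normalise a (C, H, W) patch with fixed per-channel training-set statistics."""
    patch = np.asarray(patch, dtype=np.float64)
    mean = np.asarray(pooled_mean, dtype=np.float64).reshape(-1, 1, 1)
    std = np.asarray(pooled_std, dtype=np.float64).reshape(-1, 1, 1)
    return (patch - mean) / (std + eps)

def zscore_per_event(patch, eps=1e-6):
    """Normalise a (C, H, W) patch with its own per-channel mean and std."""
    patch = np.asarray(patch, dtype=np.float64)
    mean = patch.mean(axis=(1, 2), keepdims=True)
    std = patch.std(axis=(1, 2), keepdims=True)
    return (patch - mean) / (std + eps)

# Synthetic 14-channel event AOI whose brightness and spread differ from the
# (hypothetical) pooled training statistics.
rng = np.random.default_rng(0)
event = rng.normal(loc=3000.0, scale=50.0, size=(14, 64, 64))
pooled_mean = np.full(14, 1000.0)
pooled_std = np.full(14, 800.0)

pooled = zscore_pooled(event, pooled_mean, pooled_std)
per_event = zscore_per_event(event)

# Pooled stats leave the patch offset from zero with a compressed spread;
# per-event stats recentre it and restore unit spread in every channel.
print(round(float(pooled.std()), 3), round(float(per_event.std()), 3))
```

In this toy case the pooled version hands the network inputs with a large offset and almost no spread, while the per-event version is centred with unit variance per channel, which is the regime the network saw during training.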

Take-away: The model is fine. The normalisation pipeline was the bug. A zero-cost statistical fix unlocks the rest of the dataset.


Post-Hoc Filter Stack

Fig. 4. Cumulative effect of slope, ESA WorldCover, and ΔNDVI gates on raw U-Net positives across 85 AOIs.

Three cheap, interpretable filters

Slope ≥ 15° (ALOS DEM): removes 55.8 pp of raw positives (false positives on flat plains)
ESA WorldCover: removes a further 6.4 pp (urban, water, and cropland hits)
ΔNDVI ≥ 0.1: removes a further 15.1 pp; turns a generic scar detector into an event-specific landslide detector

Net result: 22.7% of raw U-Net positives retained — without losing the validated events.
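The cumulative gating can be sketched as a chain of boolean masks. This is an illustrative sketch only: the function name and the five-pixel toy arrays are hypothetical, and the mapping of "urban / water / cropland" to ESA WorldCover class codes 50, 80, and 40 is an assumption about how the filter is configured. The reported percentages also add up: 100 − (55.8 + 6.4 + 15.1) = 22.7.

```python
import numpy as np

# Assumed ESA WorldCover codes: 40 = cropland, 50 = built-up, 80 = permanent water.
EXCLUDED_WORLDCOVER = (40, 50, 80)

def apply_filters(raw_mask, slope_deg, worldcover, dndvi,
                  min_slope=15.0, min_dndvi=0.1):
    """Apply the slope, land-cover, and ΔNDVI gates in sequence.

    Returns the surviving mask after each stage so the cumulative
    effect of every filter can be reported.
    """
    raw = np.asarray(raw_mask, dtype=bool)
    after_slope = raw & (np.asarray(slope_deg) >= min_slope)
    after_cover = after_slope & ~np.isin(worldcover, EXCLUDED_WORLDCOVER)
    after_ndvi = after_cover & (np.asarray(dndvi) >= min_dndvi)
    return after_slope, after_cover, after_ndvi

# Toy row of five pixels (hypothetical values); the first four are raw positives.
raw = np.array([1, 1, 1, 1, 0])
slope = np.array([5.0, 20.0, 25.0, 30.0, 40.0])   # degrees
cover = np.array([10, 50, 10, 10, 10])            # 10 = tree cover
dndvi = np.array([0.30, 0.30, 0.02, 0.25, 0.50])  # NDVI loss

after_slope, after_cover, after_ndvi = apply_filters(raw, slope, cover, dndvi)
retained = after_ndvi.sum() / raw.sum()           # share of raw positives kept

# Consistency check on the reported figures (percentage points of raw positives).
reported_retained = 100.0 - (55.8 + 6.4 + 15.1)   # about 22.7
```

Returning the intermediate masks, not just the final one, is what makes a per-filter breakdown like the one in Fig. 4 possible.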


Garut Hazard Map

Fig. 5. Garut regency-scale hazard map: pre/post Sentinel-2 composites, ΔNDVI, and filtered U-Net overlay.

100%

events ≤ 1 km

93%

events ≤ 500 m

40%

events ≤ 100 m

~55 min

runtime (M-series)


Sukabumi Hazard Map — Independent Generalisation

Fig. 6. Sukabumi regency — same pipeline, different regency, no re-tuning.

100%

events ≤ 1 km

92%

events ≤ 500 m

20%

events ≤ 100 m

~90×

signal vs. random


Validation — Hit Rate vs. Random Baseline

Fig. 7. Cumulative event hit-rate as a function of buffer radius around PVMBG points, vs. random-area baseline.

Reading the chart

At 1 km, both regencies hit 100% — every PVMBG-investigated event has a flagged pixel within 1 km
Random baseline at the same radius is ~1% — the signal is roughly two orders of magnitude above chance
Useful even without polygon labels: this is the metric you can actually compute when only point inventories exist

Regency	≤ 100 m	≤ 500 m	≤ 1 km
Garut (n=15)	40%	93%	100%
Sukabumi (n=25)	20%	92%	100%
Random baseline	0.05%	0.3%	1.1%
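A hit-rate-by-radius metric of this kind can be computed from point data alone. The sketch below is one plausible formulation, not necessarily the one used for the table: the hit rate is the share of events with a flagged pixel within the radius, and the random baseline is estimated by Monte Carlo as the chance that a uniformly random location in a rectangular AOI lies within the radius of any flagged pixel. All distances, coordinates, and function names are hypothetical.

```python
import math
import random

def hit_rate(event_distances_m, radius_m):
    """Share of events whose nearest flagged pixel lies within radius_m."""
    hits = sum(1 for d in event_distances_m if d <= radius_m)
    return hits / len(event_distances_m)

def random_baseline(flagged_xy, width_m, height_m, radius_m,
                    n_samples=20000, seed=0):
    """Monte Carlo estimate of the hit rate for uniformly random points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = rng.uniform(0.0, width_m)
        y = rng.uniform(0.0, height_m)
        if any(math.hypot(x - fx, y - fy) <= radius_m for fx, fy in flagged_xy):
            hits += 1
    return hits / n_samples

# Hypothetical distances (m) from six event points to their nearest flagged pixel.
distances = [40.0, 80.0, 150.0, 300.0, 450.0, 900.0]
rates = {r: hit_rate(distances, r) for r in (100.0, 500.0, 1000.0)}

# Hypothetical 10 km x 10 km AOI with a single flagged pixel at its centre.
baseline = random_baseline([(5000.0, 5000.0)], 10000.0, 10000.0, 1000.0)
lift = rates[1000.0] / baseline  # signal relative to chance at 1 km
```

With a single flagged pixel the baseline converges to the area fraction of one buffer circle, π r² / (width × height), which gives a quick sanity check on the estimator.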


Honest Limitations

L1

No polygon ground truth

Cannot report precision / recall / IoU on the West Java side. Hit-rate-by-radius is what's available; precision is unconstrained.

L2

Pure transfer, no fine-tuning

This is by design (the contribution), but it caps achievable performance. Fine-tuning when polygon labels become available is the natural next step.

L3

Cloud cover is the throughput ceiling

Humid tropical season limits clean Sentinel-2 composites. Cloud Score+ helps but does not eliminate the constraint.

L4

"100% within 1 km" is operational, not pixel-perfect

The pipeline is decision-support — it narrows the haystack for field crews. It is not a substitute for a field-investigation polygon dataset.

Not claiming: a region-trained model, sub-100 m precision, or that this replaces PVMBG field investigation. The contribution is operational hit-rate without region-specific labels.


What's Next + Resources

Next steps

Province-wide West Java scaling (~37,000 km², ~10 h on M-series — methodology already validated)
Fine-tune on Indonesian polygon labels when/if PVMBG releases polygon inventories
Open-source the GEE + PyTorch pipeline as a reproducible repo

Resources

Stack

Python · PyTorch + segmentation_models_pytorch · Google Earth Engine · Sentinel-2 SR · ALOS DEM · ESA WorldCover

Data

Landslide4Sense (Ghorbanzadeh et al., 2022) · PVMBG Portal MBG API (vsi.esdm.go.id/portalmbg)

Anchor papers

Stumpf & Kerle (2011), RSE · Ghorbanzadeh et al. (2022), Big Earth Data
