How satellite radar, time-series analysis, and a neural network combine to map millions of hectares of paddy fields across Java, Indonesia — even through clouds.
The pipeline: from raw radar signal to a 3-million-hectare paddy map
The problem this system solves — and why optical satellites aren't enough
Java Island, Indonesia, produces a huge share of the country's rice. Knowing exactly where and when rice is growing is critical for food security planning. But there's a problem: Java is in the tropics, and clouds block satellite cameras for most of the growing season.
Optical satellites use visible light, just like a camera. Clouds completely block the view, which makes them useless during the rainy season, when rice is actually growing.
Radar satellites send out radio waves and listen for the echo. Radar penetrates clouds and works day and night, which suits tropical monitoring.
The European Space Agency's Sentinel-1 satellite orbits Earth, revisiting Java every 12 days. It sends SAR (radar) pulses toward the ground and measures the backscatter — the energy that bounces back.
A microwave beam hits the ground. Unlike light, microwaves go right through clouds and rain.
Different surfaces return different amounts of energy. Flooded paddy fields (smooth water) reflect almost everything away. Dense vegetation scatters energy back.
The returned signal strength (backscatter) is stored as a number for each pixel. Low number = smooth surface. High number = rough/vegetated surface.
Sentinel-1 records radar in two polarizations: VH (vertical-send, horizontal-receive) and VV (vertical-send, vertical-receive). This system primarily uses VH because rice paddies create a very distinctive VH signature as they grow.
When rice is young and the field is flooded, radar bounces off the water surface like a mirror — almost no signal returns via the VH channel. As rice grows taller, the stems and leaves scramble the signal, and VH backscatter increases dramatically. This "dip then rise" pattern is unique to paddy fields and is how the system identifies them.
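A minimal sketch of the "dip then rise" idea on a made-up seven-step VH series in dB. The function name `has_dip_then_rise`, the sample series, and both thresholds are illustrative assumptions, not the system's actual detection logic.

```python
import numpy as np

def has_dip_then_rise(vh_db, flood_db=-22.0, min_rise_db=4.0):
    """Return True if the series dips below flood_db and later rises by min_rise_db.

    vh_db is a 1-D sequence of VH backscatter in dB, ordered oldest to newest.
    Both thresholds are illustrative assumptions.
    """
    vh_db = np.asarray(vh_db, dtype=float)
    i_min = int(np.argmin(vh_db))        # time step of the deepest dip
    dipped = vh_db[i_min] < flood_db     # low enough to look like open water
    later_max = vh_db[i_min:].max()      # strongest return at or after the dip
    rose = (later_max - vh_db[i_min]) >= min_rise_db
    return bool(dipped and rose)

# Hypothetical series: flooding followed by canopy growth
paddy_like = [-17.0, -20.5, -23.5, -21.0, -18.0, -15.5, -14.0]
# Hypothetical series: stable vegetation that never floods
forest_like = [-13.0, -13.4, -12.8, -13.1, -13.3, -12.9, -13.0]

print(has_dip_then_rise(paddy_like))
print(has_dip_then_rise(forest_like))
```

A hand-written rule like this is brittle on its own. The system instead feeds the time series to a trained model as features.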
The five main components that turn radar signals into a paddy map
Think of this system like an assembly line in a factory. Raw materials (radar data) come in one end, and a finished product (a map showing where rice grows) comes out the other. Each station on the line has a specialized job.
Preprocessing: cleans and calibrates raw satellite radar data. Removes noise, corrects for terrain, converts to usable units.
Feature engineering: transforms raw backscatter values into 29 meaningful numbers per pixel: temporal patterns, differences, ratios, and phenology indicators.
SMOTE balancing: fixes the training data imbalance by creating synthetic "non-paddy" samples, so the model learns to say "no" as well as "yes."
Neural network (MLP): a 3-layer brain that takes 29 features in and outputs a single probability: "how likely is this pixel to be paddy?"
Consensus filtering: combines predictions from multiple time periods and years. Only marks a pixel as paddy if the evidence is consistent and strong.
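The "converts to usable units" step at the preprocessing station can be sketched as the standard linear-to-decibel conversion, followed by the x100 scaling that the thresholds in this walkthrough use. The function name, the `floor` guard, and the int16 storage type are assumptions for illustration.

```python
import numpy as np

def linear_to_db_x100(sigma0_linear, floor=1e-6):
    """Convert linear backscatter to decibels x100.

    floor and the int16 output type are illustrative assumptions.
    """
    # Clip tiny or zero values so log10 never sees 0
    clipped = np.maximum(np.asarray(sigma0_linear, dtype=float), floor)
    db = 10.0 * np.log10(clipped)
    return np.round(db * 100).astype(np.int16)

sigma0 = np.array([0.1, 0.01, 0.0063])
scaled = linear_to_db_x100(sigma0)
print(scaled)   # first two entries are -1000 and -2000, i.e. -10 dB and -20 dB
```

Storing dB x100 as integers keeps two decimal places of precision while taking less space than floating point.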
Tracing a single pixel from raw radar echo to "paddy" or "not paddy"
Imagine one 50-meter square on the ground in Java. Every 12 days, Sentinel-1 flies overhead and records a radar echo from that spot. Let's trace what happens to that data as it moves through the system.
How 7 raw radar numbers become 29 meaningful features that describe rice growth
A feature is a single number that captures something meaningful about a pixel. The feature engineering step is where domain knowledge meets code: we encode what agricultural scientists know about rice growth into numbers a computer can use.
Temporal values (7): the raw radar signal at each of the 7 time steps. Like 7 snapshots of how "rough" the surface is.
Temporal differences (6): how much the signal changed between consecutive time steps. Captures the "speed" of growth.
Temporal ratios (6): relative change between time steps. A ratio of 1.5 means the signal grew by 50%, regardless of the absolute level.
Phenology indicators (10): rice growth stage detectors for flooding, early growth, late growth, reproductive, ripening, and post-harvest, plus min/max timing.
Here's how the code builds 29 features from 7 time-series values. This is from utils_paddy_vh.py:
# Temporal differences: rate of change
pol_diff = pol_values[:, :-1] - pol_values[:, 1:]
# Temporal ratios: relative change
denominator = np.abs(pol_values[:, 1:])
denominator = np.where(denominator < 1e-10, 1e-10, denominator)
pol_ratio = pol_values[:, :-1] / denominator
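Calculate how much the signal changed between each pair of consecutive time steps by subtracting each column from the column before it. The sign depends on column order: positive means the signal increased if the columns run newest to oldest, and decreased if they run oldest to newest.
Now calculate the relative change: divide each value by the magnitude of the value in the next column. First, make sure we never divide by zero (replace any near-zero denominator with a tiny number).
For positive values, a ratio of 1.2 means the numerator is 20% larger than the denominator, and 0.8 means 20% smaller. Because the denominator is an absolute value, the ratio keeps the numerator's sign, so negative dB values produce negative ratios. Either way, the ratio captures proportional change regardless of absolute signal strength.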
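To see the shapes involved, the difference and ratio lines can be wrapped in a self-contained function and run on toy data. The function name `temporal_features` and the toy values are assumptions. The output covers 19 of the 29 features (7 values, 6 differences, 6 ratios); the phenology indicators make up the rest.

```python
import numpy as np

def temporal_features(pol_values, eps=1e-10):
    """Stack raw values, differences, and ratios column-wise (hypothetical helper)."""
    pol_values = np.asarray(pol_values, dtype=float)
    # Each column minus the next column
    pol_diff = pol_values[:, :-1] - pol_values[:, 1:]
    # Each column divided by the magnitude of the next, guarded against zero
    denominator = np.abs(pol_values[:, 1:])
    denominator = np.where(denominator < eps, eps, denominator)
    pol_ratio = pol_values[:, :-1] / denominator
    return np.hstack([pol_values, pol_diff, pol_ratio])

# Two made-up pixels, 7 time steps each, in dB x100
toy = np.array([
    [-1700.0, -2050.0, -2350.0, -2100.0, -1800.0, -1550.0, -1400.0],
    [-1300.0, -1340.0, -1280.0, -1310.0, -1330.0, -1290.0, -1300.0],
])
features = temporal_features(toy)
print(features.shape)    # (2, 19)
print(features[0, 7])    # first difference of pixel 0: -1700 - (-2050) = 350.0
print(features[0, 13])   # first ratio of pixel 0: -1700 / 2050, a negative number
```

With negative dB inputs every ratio comes out negative, because only the denominator is made positive.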
The phenology indicators are the most clever features. They encode agricultural science directly into the model: "given this radar value, which growth stage is the rice most likely in?"
# Paddy-optimized VH thresholds (backscatter x100)
flooding_thresh = -2200
early_veg_high = -1800
late_veg_high = -1500
reproductive_high = -1200
# 1. Flooding: very low backscatter
flooding_detected = (current_value < flooding_thresh)
# ... stages 2-5 omitted ...
# 6. Post-harvest: low current but high past max
# (harvest_thresh is defined elsewhere in the source; its value is not shown here)
post_harvest = ((current_value < harvest_thresh) &
                (max_values > late_veg_high) &
                (range_values > 400))
Define thresholds for each growth stage. These numbers come from agricultural research on how rice affects radar signals. Values are in decibels x100 (so -2200 means -22 dB).
Flooding detection: If the current signal is very low (below -22 dB), the field is likely flooded — water acts like a mirror, reflecting radar away.
Post-harvest detection: The signal is low NOW, but it was high recently (plants were there), and the range is large (big swing from crop to bare). Three conditions must ALL be true — this prevents false positives.
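The two detectors above can be tried on toy data with the sketch below. Several things here are assumptions made only so the example runs: the value of `HARVEST_THRESH` (not shown in the excerpt), taking the last column as the "current" value, and computing `max_values` and `range_values` over the whole 7-step window.

```python
import numpy as np

FLOODING_THRESH = -2200   # from the excerpt above (dB x100)
LATE_VEG_HIGH = -1500     # from the excerpt above (dB x100)
HARVEST_THRESH = -1900    # ASSUMPTION: placeholder, the real value is not shown

def flooding_and_post_harvest(pol_values):
    """Return (flooding, post_harvest) boolean arrays, one entry per pixel."""
    pol_values = np.asarray(pol_values, dtype=float)
    current_value = pol_values[:, -1]                    # ASSUMPTION: last column is newest
    max_values = pol_values.max(axis=1)                  # ASSUMPTION: max over the window
    range_values = max_values - pol_values.min(axis=1)   # ASSUMPTION: max minus min
    flooding = current_value < FLOODING_THRESH
    # All three conditions must hold at once
    post_harvest = ((current_value < HARVEST_THRESH)
                    & (max_values > LATE_VEG_HIGH)
                    & (range_values > 400))
    return flooding, post_harvest

series = np.array([
    [-1700, -1500, -1300, -1250, -1400, -1800, -2000],  # grew, then harvested
    [-1700, -1800, -1900, -2000, -2100, -2250, -2300],  # currently flooded
    [-1300, -1340, -1280, -1310, -1330, -1290, -1300],  # stable vegetation
])
flooding, post_harvest = flooding_and_post_harvest(series)
print(flooding.tolist())       # [False, True, False]
print(post_harvest.tolist())   # [True, False, False]
```

The flooded pixel is low now but never had a high past maximum, so the post-harvest rule correctly ignores it.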
Watch how the radar signal changes across different land cover types. Notice the distinctive "V-shaped dip" for paddy fields during flooding.
Why 92% paddy data created a broken model — and how synthetic samples fixed it
Imagine you're studying for a test where 92 out of 100 questions have the answer "A." You'd learn to just guess "A" every time and still get 92% right — but you'd be useless at actually telling answers apart. That's exactly what happened with the original training data.
9,757 paddy samples vs 838 non-paddy. Model learned to say "paddy" to everything. Detected only about 2 hectares.
9,757 real paddy + 9,757 synthetic non-paddy. Model learned the actual difference. Detected 196,335 hectares.
The SMOTE-balanced model detected 93,894 times more paddy area than the imbalanced model. The imbalanced model had "high accuracy" (92%) but was completely useless — it just predicted everything as paddy. This is one of the most common traps in machine learning.
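The accuracy trap is easy to reproduce with the class counts above. This sketch scores a stand-in "model" that always answers paddy; the label convention (1 = paddy, 0 = non-paddy) is an assumption.

```python
import numpy as np

n_paddy, n_non_paddy = 9757, 838
# ASSUMPTION: 1 = paddy, 0 = non-paddy
y_true = np.concatenate([np.ones(n_paddy, dtype=int),
                         np.zeros(n_non_paddy, dtype=int)])
y_pred = np.ones_like(y_true)   # always predicts "paddy"

accuracy = float(np.mean(y_pred == y_true))
paddy_recall = float(np.mean(y_pred[y_true == 1] == 1))
non_paddy_recall = float(np.mean(y_pred[y_true == 0] == 0))
balanced_accuracy = 0.5 * (paddy_recall + non_paddy_recall)

print(round(accuracy, 4))     # 0.9209
print(non_paddy_recall)       # 0.0
print(balanced_accuracy)      # 0.5
```

Plain accuracy looks strong at about 92%, while balanced accuracy sits at 0.5, which is what random guessing would score. Per-class recall or balanced accuracy exposes the problem that plain accuracy hides.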
SMOTE works like an art forger who studies real paintings to create convincing new ones. It looks at the existing non-paddy samples, finds ones that are similar to each other, and creates new synthetic samples along the line between them.
1. Choose one real non-paddy sample from the 838 available.
2. Look at the 5 most similar non-paddy samples (measured across all 29 features).
3. Pick a random point on the line between the original and one neighbor. This new point is realistic because it's "between" two real examples.
4. Keep generating synthetic samples until you have equal numbers of paddy and non-paddy (9,757 each = 19,514 total).
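The steps above can be sketched by hand in NumPy for a single synthetic sample. This is an illustration of the interpolation idea, not the library implementation. The function name `smote_one_sample` and the random stand-in data are assumptions.

```python
import numpy as np

def smote_one_sample(minority, k=5, rng=None):
    """Create one synthetic sample by interpolating toward a near neighbor.

    Returns the new sample plus the indices of the two real samples used.
    """
    rng = np.random.default_rng() if rng is None else rng
    minority = np.asarray(minority, dtype=float)
    i = int(rng.integers(len(minority)))            # pick one real sample
    dists = np.linalg.norm(minority - minority[i], axis=1)
    dists[i] = np.inf                               # never pick the sample itself
    k = min(k, len(minority) - 1)
    neighbors = np.argsort(dists)[:k]               # the k most similar samples
    j = int(rng.choice(neighbors))                  # choose one neighbor at random
    gap = rng.random()                              # position along the segment, in [0, 1)
    synthetic = minority[i] + gap * (minority[j] - minority[i])
    return synthetic, i, j

rng = np.random.default_rng(42)
non_paddy = rng.normal(size=(838, 29))   # random stand-in for the 838 real samples
synthetic, i, j = smote_one_sample(non_paddy, rng=rng)
print(synthetic.shape)   # (29,)
```

Because the new point lies on the segment between two real samples, every one of its 29 feature values falls between the corresponding values of its two parents.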
Here's the actual SMOTE application from train_paddy_vh_smote.py:
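from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import RobustScaler
# n_non_paddy: number of real non-paddy samples (838 here)
k_neighbors = min(5, n_non_paddy - 1)
smote = SMOTE(random_state=42, k_neighbors=k_neighbors)
X_resampled, y_resampled = smote.fit_resample(X, y)
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X_resampled)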
Use 5 neighbors (or fewer if we don't have enough samples) when generating synthetic data. This controls how "diverse" the fake samples are.
Create the SMOTE generator (random_state=42 means we get the same result every time — reproducibility). Then balance the dataset: X gets the features, y gets the labels (paddy/non-paddy).
Scale all features using RobustScaler, which handles outliers better than standard scaling. This ensures no single feature dominates just because its numbers are bigger.
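To show why robust scaling tolerates outliers, here is a NumPy sketch of the same idea as scikit-learn's RobustScaler defaults: center each column on its median and divide by its interquartile range. The function name `robust_scale` and the toy data are assumptions.

```python
import numpy as np

def robust_scale(X):
    """Center each column on its median and scale by its interquartile range."""
    X = np.asarray(X, dtype=float)
    median = np.median(X, axis=0)
    q25, q75 = np.percentile(X, [25, 75], axis=0)
    iqr = q75 - q25
    iqr = np.where(iqr == 0, 1.0, iqr)   # leave constant columns unscaled
    return (X - median) / iqr

# Second column contains one extreme outlier (50000)
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0],
              [5.0, 50000.0]])
X_scaled = robust_scale(X)
print(X_scaled[:, 0])    # [-1.  -0.5  0.   0.5  1. ]
print(X_scaled[:4, 1])   # [-1.  -0.5  0.   0.5]
```

The outlier stays large after scaling, but it does not distort the other four rows, which land on the same scale as the first column. Mean and standard deviation scaling would squash those four rows together.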
How a 3-layer network turns 29 features into a paddy/non-paddy decision
Think of the MLP like a funnel: 29 numbers go in at the wide end, get progressively compressed through 128, 64, and 32 neurons, and a single probability comes out at the narrow end.
From utils_paddy_vh.py — each layer serves a specific purpose:
import tensorflow as tf
from tensorflow.keras.regularizers import l2

# TemperatureLayer is not a built-in Keras layer; its definition is not shown here
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(29,)),
    tf.keras.layers.Dense(128, kernel_regularizer=l2(0.001)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LeakyReLU(negative_slope=0.1),
    tf.keras.layers.Dropout(0.3),
    # ... layers 2 and 3 similar ...
    tf.keras.layers.Dense(1),
    TemperatureLayer(temperature=0.5),
    tf.keras.layers.Activation('sigmoid')
])
Create a sequential (layer-by-layer) model that accepts 29 features.
Dense(128): 128 neurons, each connecting to all 29 inputs. L2 regularization penalizes extreme weights — like telling the model "don't rely too heavily on any single feature."
BatchNorm: Normalize values between layers. SAR data has huge dynamic range; this keeps training stable.
LeakyReLU: The "activation function" decides which signals pass through. "Leaky" means negative inputs are not zeroed out but passed through at 10% strength, preventing dead neurons.
Dropout(0.3): Randomly silence 30% of neurons during training. Forces the network to be redundant — no single neuron becomes a point of failure.
Final output: Compress to 1 number, divide by 0.5 (this temperature scaling doubles the raw score, pushing predictions toward 0 or 1), then squeeze through sigmoid to get a 0-1 probability.
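The two less familiar pieces, LeakyReLU and the temperature-scaled sigmoid, can be checked numerically in plain NumPy. This assumes the custom temperature layer simply divides its input by the temperature, as the description above states; the function names are illustrative.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    """Pass positive inputs unchanged and negative inputs at reduced strength."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, negative_slope * x)

def temperature_sigmoid(logit, temperature=0.5):
    """ASSUMPTION: the temperature layer divides the logit by `temperature`."""
    z = np.asarray(logit, dtype=float) / temperature
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-1.0, 0.0, 1.0])
plain = temperature_sigmoid(logits, temperature=1.0)      # ordinary sigmoid
sharpened = temperature_sigmoid(logits, temperature=0.5)  # logits doubled first

print(leaky_relu([-2.0, 3.0]))   # [-0.2  3. ]
print(np.round(plain, 3))        # [0.269 0.5   0.731]
print(np.round(sharpened, 3))    # [0.119 0.5   0.881]
```

A temperature below 1 moves every probability away from 0.5 without changing which side of 0.5 it falls on, so the predicted class stays the same while more pixels clear a fixed confidence threshold.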
The input data is a table of 29 numbers per pixel — not an image. CNNs are designed for images where spatial neighbors matter. Here, each pixel is independent — its 29 features already capture all the temporal context needed. An MLP is simpler, faster to train, and works just as well for tabular data.
How multi-year consensus filtering produces a high-confidence 3-million-hectare paddy map
Think of each prediction period as a juror. A single juror might be wrong, but if at least 5 jurors vote "paddy" in each of 2 separate trials (14 jurors in one, 11 in the other), you can be pretty confident. That's the multi-year consensus approach.
min_detections: 5
Pixel must be detected as paddy in at least 5 time periods per year
min_confidence: 0.7
Each detection must have at least 70% model confidence
min_years: 2
Must be detected as paddy in BOTH 2023 and 2024
min_mean_confidence: 0.6
Average confidence across all detections must exceed 60%
From create_multiyear_composite_paddy.py:
# Only count high-confidence detections
high_conf = (paddy_stack == 1) & (conf_stack >= min_confidence)
# How many periods detected paddy?
frequency = np.sum(high_conf, axis=0)
# Paddy if enough detections
year_paddy = (frequency >= min_detections)
# Multi-year consensus
num_years = np.sum(year_stack == 1, axis=0)
consensus = (num_years >= min_years)
For each pixel and each time period: only count it as "paddy detected" if BOTH the prediction says paddy AND the model is at least 70% confident.
Count how many time periods (out of 14 for 2023 and 11 for 2024) had high-confidence paddy detections.
For each year: mark the pixel as "paddy this year" only if it was detected in 5 or more periods. This filters out one-off false positives.
Final check: count how many years agree. Only mark as paddy on the final map if BOTH years say "yes." This catches fields that are consistently rice, not just one-time detections.
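The filtering steps above can be run end to end on a tiny 1x3 raster. The helper names and the toy stacks are assumptions, and the `min_mean_confidence` check is left out because the excerpt does not show how it is applied.

```python
import numpy as np

def yearly_paddy(paddy_stack, conf_stack, min_detections=5, min_confidence=0.7):
    """Per-year mask: enough high-confidence paddy detections across periods."""
    high_conf = (paddy_stack == 1) & (conf_stack >= min_confidence)
    frequency = np.sum(high_conf, axis=0)
    return frequency >= min_detections

def multiyear_consensus(year_masks, min_years=2):
    """Final mask: pixel is paddy in at least min_years of the yearly masks."""
    year_stack = np.stack(year_masks).astype(np.uint8)
    num_years = np.sum(year_stack == 1, axis=0)
    return num_years >= min_years

def make_year(n_periods, detected, confidence):
    """Toy (n_periods, 1, 3) stacks where every period repeats the same values."""
    paddy = np.tile(np.array(detected, dtype=np.uint8), (n_periods, 1, 1))
    conf = np.tile(np.array(confidence, dtype=float), (n_periods, 1, 1))
    return paddy, conf

# Pixel A: confident paddy in both years
# Pixel B: confident paddy in 2023 only
# Pixel C: predicted paddy in both years, but at only 60% confidence
paddy_2023, conf_2023 = make_year(14, [1, 1, 1], [0.9, 0.9, 0.6])
paddy_2024, conf_2024 = make_year(11, [1, 0, 1], [0.9, 0.9, 0.6])

mask_2023 = yearly_paddy(paddy_2023, conf_2023)
mask_2024 = yearly_paddy(paddy_2024, conf_2024)
consensus = multiyear_consensus([mask_2023, mask_2024])
print(consensus.tolist())   # [[True, False, False]]
```

Only pixel A survives: B fails the both-years rule and C never clears the 70% confidence bar.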
This "multiple independent checks must agree" pattern appears everywhere in software: spam filters use it (multiple signals must all say "spam"), fraud detection uses it (multiple rules must trigger), and even self-driving cars use it (multiple sensors must agree on an obstacle). It's called consensus filtering — and it's one of the most powerful ways to reduce false positives.
Connecting all the pieces — and where this system could go next
Input data: 54 bands across 2023-2024, stored as a multi-year GeoTIFF stack at 50m resolution
Preprocessing: orbit correction, noise removal, terrain correction, speckle filtering, dB conversion
Feature engineering: temporal values + differences + ratios + paddy phenology stages. Vectorized: 1.2M pixels at once
SMOTE balancing: 92%/8% imbalance fixed to 50%/50% using synthetic minority samples
MLP classifier: 128→64→32→1 architecture with temperature-scaled sigmoid output. 80% accuracy, 89.6% AUC
Consensus filtering: 25 periods across 2 years, confidence filtering, terrain + water masking. Final: 3.02M hectares
The phenology features — encoding what scientists know about rice growth — are more powerful than adding more raw data.
A 92/8 split created a useless model. SMOTE turned it into one that finds 3 million hectares. Always check class distribution first.
One prediction can be wrong. Multiple independent predictions that agree are much more reliable. Use consensus filtering for critical decisions.
Processing 1.2 million pixels one-by-one would take hours. NumPy vectorization does it in seconds. Always batch your computations.
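The vectorization point is easy to try at small scale. This sketch computes the same temporal differences with a Python loop and with one NumPy expression, on random stand-in data of 100,000 pixels. The function names and data are assumptions, and timings will vary by machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
pol_values = rng.uniform(-2500, -1000, size=(100_000, 7))   # random stand-in data

def diffs_loop(values):
    """Compute differences one element at a time."""
    out = np.empty((values.shape[0], values.shape[1] - 1))
    for r in range(values.shape[0]):
        for c in range(values.shape[1] - 1):
            out[r, c] = values[r, c] - values[r, c + 1]
    return out

def diffs_vectorized(values):
    """Compute all differences in one array operation."""
    return values[:, :-1] - values[:, 1:]

t0 = time.perf_counter()
slow = diffs_loop(pol_values)
t1 = time.perf_counter()
fast = diffs_vectorized(pol_values)
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.3f}s, vectorized: {t2 - t1:.4f}s")
print(np.array_equal(slow, fast))   # True: same numbers either way
```

Both versions produce identical arrays; only the time taken differs.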
You now understand how satellite radar, feature engineering, SMOTE balancing, neural networks, and consensus filtering combine to map 3 million hectares of rice fields across Java, Indonesia.