Interactive Course

Mapping Rice Growth from Space

How satellite radar, time series analysis, and neural networks predict the growth stage of every paddy field across Java Island.

Start Learning
3.5M
Hectares mapped
6
Growth stages
31
Periods per year
29
Features per pixel
01

What Satellites See in a Rice Field

Radar signals bounce off rice paddies differently depending on the growth stage. That physical reality is what makes this entire system possible.

The Challenge: Monitoring 3.5 Million Hectares

Indonesia is the world's third-largest rice producer. Java Island alone has millions of hectares of paddy fields, and the government needs to know: how much rice is planted, how much is growing, and when will it be harvested?

Sending people to every field is impossible. Optical satellites (like regular cameras) are blocked by clouds — and Java is cloudy most of the year. The solution? SAR radar from the Sentinel-1 satellite.

🌌
Works through clouds
Radar penetrates cloud cover, unlike optical cameras. Critical for tropical Indonesia.
🌑
Works at night
The satellite provides its own illumination (microwave pulses), so it doesn't need sunlight.
📡
Senses structure
Radar echoes change based on surface texture — flooded fields vs. tall plants vs. bare soil all look different.
📅
Revisits every 12 days
Sentinel-1 captures the same area regularly, creating a time series we can analyze.

How Radar "Sees" Rice Growth

The satellite transmits a vertical (V) microwave pulse and records the horizontal (H) echo — called VH backscatter. The strength of this echo (measured in decibels) changes as rice grows:

VH Backscatter Through a Rice Growing Cycle
-10 dB -13 dB -16 dB -19 dB -22 dB 1. Flooding 2. Early Veg 3. Late Veg 4. Early Gen 5. Late Gen 6. Harvest water = weak echo dense plants = strong echo cut stalks = weak echo
💡
The Core Insight
Rice growth follows a predictable radar signature: the signal starts weak (water surface reflects radar away), gets stronger as plants grow taller and denser (vegetation scatters radar back), then drops again after harvest. This U-shaped-then-inverted curve is the physical basis for the entire classification system.

The 6 Growth Stages

The system classifies every pixel into one of these stages. Think of them as chapters in a rice plant's biography:

💧
Stage 1: Flooding
Fields are deliberately flooded before planting. VH < -20 dB (very weak echo — smooth water surface).
🌱
Stage 2: Early Vegetative
Young seedlings emerge from water. VH -20 to -17 dB. Signal rising rapidly.
🌿
Stage 3: Late Vegetative
Plants grow tall and dense. VH -17 to -14 dB. Approaching peak biomass.
🌾
Stage 4: Early Generative
Rice heads form and fill with grain. VH -14 to -11 dB. Peak radar signal.
🌽
Stage 5: Late Generative
Grain matures, plants start drying. VH -15 to -13 dB. Signal starting to decline.
🍆
Stage 6: Post-Harvest
Rice is cut, field returns to bare soil or stubble. VH < -17 dB. Sharp signal drop.

Check Your Understanding

A farmer reports their field was just flooded for the new planting season. What would you expect the radar signal to look like?

Why does this system use SAR radar instead of optical satellite imagery (regular photos)?

You're adding a new crop type to this system (corn instead of rice). The corn lifecycle doesn't include a flooding stage. How would this change the radar signature?

02

The 12-Day Time Machine

How the system slices a full year into 31 periods and looks backward through time to understand each moment.

Slicing a Year into 31 Snapshots

Imagine a time-lapse camera that photographs every rice field on Java once every 12 days. That's essentially what Sentinel-1 does. The system divides each year into 31 periods of 12 days each:

12-Day Period Calendar (2024 Example)
P1
Jan 1-12
P2
Jan 13-24
P3
Jan 25-Feb 5
P4
Feb 6-17
P5
Feb 18-29
P6
Mar 1-12
P7
Mar 13-24
P8
Mar 25-Apr 5
...
 
P15
Jun 5-16
...
 
P23
Sep 7-18
...
 
P30
Dec 15-26
P31
Dec 27-31
 
📌
Why Period 7 is the First Valid Prediction
The model needs to look at 7 consecutive periods (current + 6 previous). Period 7 is the earliest that has 7 periods before it (P1 through P7). Trying to predict Period 3, for example, would only have 3 periods of history — not enough to see the growth pattern.

Looking Backward Through Time

When the system predicts the growth stage for Period 15, it doesn't just look at one snapshot. Like a doctor reviewing your medical history before making a diagnosis, it examines the current period plus the 6 preceding periods — 7 snapshots total.

P9
6 back
P10
5 back
P11
4 back
P12
3 back
P13
2 back
P14
1 back
P15
Current
Click "Next Step" to see how the backward window works
Step 0 / 4

How the Code Gets the Right Bands

The GeoTIFF file has 31 bands (one per period). The code uses 0-based indexing internally, meaning Band 1 = Index 0, Band 15 = Index 14.

CODE
for i in range(n_periods):
    band_idx = current_period - 1 - i

    if band_idx < 0:
        return None

    value = float(s1_data[band_idx, row_idx, col_idx])
    if np.isnan(value) or np.isinf(value):
        return None

    vh_values.append(value)
PLAIN ENGLISH

For each of the 7 time steps (current + 6 previous)...

Calculate which band to read: Period 15 on step 0 = Band 14, step 1 = Band 13, and so on backward.

If we'd go before the start of the data, this pixel can't be predicted — bail out.

Read the VH backscatter value at this specific pixel location and time step.

If the value is missing or corrupted (NaN/Inf), this pixel is unusable — bail out.

Add the valid value to our growing list of 7 time-series measurements.

Source: utils.py lines 199-212

Check Your Understanding

You want to predict Period 23. Which bands from the GeoTIFF stack does the model read?

The system currently uses 7 periods (84 days). If you wanted to track two full rice cycles instead of one, you'd need about 14 periods. What's the trade-off?

03

Turning Raw Signals into Features

The model doesn't just look at 7 raw numbers. It engineers 29 features per pixel — like a detective building a case from clues.

The 29 Features: 4 Categories

Think of feature engineering as giving the model different lenses to examine the same data. Each category reveals something different about the rice growth pattern:

📈
VH Time Series 7 features
The raw backscatter values at each of the 7 time steps. Like reading a thermometer at 7 different times — gives you the absolute "temperature" of the field.
Temporal Differences 6 features
The change between consecutive periods. Is the signal going up (growing), down (harvesting), or flat (mature)? Like measuring how much taller a child grew each month.
÷
Temporal Ratios 6 features
The proportional change between periods. A jump from -20 to -17 dB means something different than -14 to -11 dB, even though both are +3 dB.
🌱
Phenology + Extrema 10 features
Domain-specific detectors: has flooding occurred? Where's the min/max? These encode expert knowledge about rice growth patterns directly.

Building the Differences and Ratios

After extracting the 7 raw VH values, the code computes how the signal changes between each pair of consecutive periods:

CODE
# Temporal differences
for i in range(len(vh_values)-1):
    diff = vh_values[i] - vh_values[i+1]
    features.append(diff)

# Temporal ratios with safe division
for i in range(len(vh_values)-1):
    denominator = abs(vh_values[i+1])
    if denominator < 1e-10:
        ratio = 0.0
    else:
        ratio = vh_values[i] / denominator
    features.append(ratio)
PLAIN ENGLISH

Differences — for each consecutive pair of time steps...

Subtract the older value from the newer one. Positive = signal getting stronger (growing). Negative = signal getting weaker (harvesting).

Add this "speed of change" number to our feature list.

Ratios — for the same pairs...

Get the absolute value of the older measurement. If it's basically zero (which would crash the division), use a safe default of 0.

Otherwise, divide current by previous to get the proportional change. A ratio > 1 means growth, < 1 means decline.

Source: utils.py lines 220-236

Phenology Detection: Expert Knowledge in Code

The most interesting features come from phenology detection — where domain expertise about rice growth is encoded as rules. The code looks at the VH values and asks: "Does this pattern look like flooding? Like peak biomass? Like a harvest?"

CODE
# Flooding: very low VH signal
if vh_smooth[vh_min_idx] < -2000:
    phenology['flooding_detected'] = True

# Early Vegetative: rising after flood
if phenology['flooding_detected'] \
   and np.mean(recent_slopes) > 200:
    if -2000 <= current_vh <= -1700:
        phenology['early_vegetative'] = True

# Post-harvest: sharp decline
if slopes[-1] < -200:
    phenology['post_harvest'] = True
PLAIN ENGLISH

Flooding check: If the lowest signal in the window is below -20 dB (scaled ×100 = -2000), mark "flooding detected." Smooth water = extremely weak radar echo.

Early vegetative check: If we already detected flooding AND the signal is rising steeply (slope > 200) AND the current value is in the -20 to -17 dB range — young rice is emerging from water.

Post-harvest check: If the most recent slope is sharply negative (dropping fast), the rice was likely just cut down.

Source: utils.py lines 316-353

💡
Domain Knowledge is a Feature
These threshold-based detectors encode decades of agronomic research into simple yes/no signals. The neural network doesn't need to "rediscover" that flooding looks a certain way — we tell it directly. This is a powerful pattern in ML: combining data-driven learning with expert knowledge gives better results than either approach alone.

Research Finding: Which Features Matter Most?

An experiment tested whether absolute VH values or differential features (differences/ratios) contribute more to accuracy. The results were surprising:

ApproachFeaturesAccuracyVerdict
Baseline (all 29)2959.8%Baseline
Z-score normalized2953.5%Worse
Differential only2261.6%Best
Hybrid (absolute + z-score)3661.3%Good
⚠️
Gap: Cross-Regional Generalization
Absolute VH values vary between regions due to soil type, topography, and rice varieties. Paddy fields in Sukabumi look different from those in Jogja even at the same growth stage. Differential features (how the signal changes) are more portable across regions. This means the current 29-feature model may underperform when applied to regions not in the training data.

Check Your Understanding

You're deploying this model to Sulawesi (a different island) where you have no training data. Based on the feature experiment results, which approach would you recommend?

The phenology detector uses fixed thresholds (e.g., flooding = below -2000). What's a potential problem with this?

04

The Neural Network Brain

Three model architectures compete to classify rice stages — from a simple multi-layer perceptron to a hybrid CNN-LSTM that processes time like a story.

Architecture 1: The Multi-Layer Perceptron

The base model is an MLP — a stack of dense layers that progressively compress the 29 input features into a 6-class prediction. Imagine a funnel: wide at the top (512 neurons), narrow at the bottom (6 output classes).

MLP Architecture: 29 Features → 6 Growth Stages
Input
29
Dense
512
+BN +LeakyReLU
Dropout 0.3
Dense
256
+BN +LeakyReLU
Dropout 0.2
Dense
128
+BN +LeakyReLU
Dropout 0.1
Dense
64
+BN +LeakyReLU
Temp
÷0.5
Softmax
6
💡
What's the Temperature Layer?
Temperature scaling (T=0.5) sharpens the model's confidence. Instead of saying "40% Stage 2, 35% Stage 3, 25% Stage 4," it pushes toward more decisive answers: "75% Stage 2, 15% Stage 3, 10% Stage 4." This makes the final map more definitive.

The Model in Code

CODE
model = Sequential([
    Input(shape=(input_shape,)),

    Dense(512, kernel_initializer='he_normal'),
    BatchNormalization(),
    LeakyReLU(negative_slope=0.1),
    Dropout(0.3),

    Dense(num_classes),
    TemperatureLayer(temperature=0.5),
    Activation('softmax')
])
PLAIN ENGLISH

Build a neural network where data flows through layers in sequence.

Accept 29 numbers as input (our features).

First layer: 512 neurons, initialized using a method optimized for ReLU activations.

Batch Norm stabilizes learning. LeakyReLU adds non-linearity. Dropout prevents overfitting by randomly disabling 30% of neurons during training.

Final layer: 6 neurons (one per growth stage). Temperature scaling sharpens the probabilities. Softmax converts the result into probabilities.

Source: utils.py lines 395-429

Architecture 3: CNN-LSTM — Reading Time Like a Story

The most advanced model treats the time series like a sentence. A CNN first scans for local patterns (like detecting a sudden jump or dip), then an LSTM reads the whole sequence to understand the narrative arc.

1
Split features into 3 branches
VH values, differences, and ratios each go through their own CNN-LSTM pathway. Each branch becomes a specialist.
2
CNN scans for local patterns
1D convolutions with kernel size 3 slide along the time axis, detecting patterns like "sharp rise" or "gradual decline" across 3 consecutive periods.
3
Bidirectional LSTM reads the story
The LSTM reads the sequence both forward and backward, capturing that "the signal rose after flooding" and "the signal was falling before harvest" simultaneously.
4
Merge and classify
All three branch outputs plus the phenology features are concatenated and passed through dense layers to produce the final 6-class prediction.

Check Your Understanding

The MLP uses progressively decreasing dropout rates: 0.3 → 0.2 → 0.1 → 0. Why might the first layer have the highest dropout?

Why does the CNN-LSTM use separate branches for VH values, differences, and ratios instead of feeding all 29 features together?

05

Training: Teaching the Model to See

How labeled field data becomes a trained classifier — including the thorny problem of unbalanced classes.

From CSV to Trained Model

Training starts with a CSV file containing field observations: for each point, the date it was visited, its GPS coordinates, and the growth stage a human observer recorded (1-6).

Training Pipeline Chat
0 / 6 messages

5-Fold Cross-Validation: Trust but Verify

Instead of training once and hoping for the best, the system uses 5-fold cross-validation. It's like giving the model 5 different exams to ensure it genuinely learned, not just memorized.

5-Fold Stratified Cross-Validation
Fold 1
TEST
TRAIN
Fold 2
TEST
Fold 3
TEST
Fold 4
TEST
Fold 5
TRAIN
TEST
"Stratified" means each fold maintains the same proportion of growth stages. The best fold's model is saved.

The Imbalance Problem

Not all growth stages are equally represented in the training data. Stage 2 (Early Vegetative) has 6.3× more samples than Stage 5 (Late Generative). This is like teaching someone to identify 6 bird species using 500 photos of pigeons but only 80 photos of eagles.

S1: 13.5%
S2: 37.6%
S3: 8.6%
S4: 16.9%
S5: 6.0%
S6: 17.4%

The model tends to "cheat" by defaulting to Stage 2 for ambiguous cases — it's right 37.6% of the time just by guessing! Three strategies combat this:

Class Weights
Penalize mistakes on rare stages more heavily. Getting Stage 5 wrong costs 6× more than Stage 2.
🔬
SMOTE
SMOTE generates synthetic samples for underrepresented stages by interpolating between existing examples.
🎯
Focal Loss
Focal loss makes the model focus on the examples it finds hardest, instead of optimizing for easy-to-classify ones.

Check Your Understanding

You notice the model gets 84% accuracy on Stage 2 but only 10% on Stage 3. What's the most likely root cause?

A colleague suggests training for 500 epochs instead of 150 to improve accuracy. What would likely happen?

06

Prediction: From Pixels to Maps

How the trained model processes millions of pixels efficiently and smooths the results into coherent growth stage maps.

Processing Millions of Pixels in Chunks

Java Island at 10-meter resolution has hundreds of millions of pixels. Loading all of them at once would overwhelm any computer's memory. The solution is chunked processing — the system processes 20,000 pixels at a time.

🗺
GeoTIFF
31 bands
Chunk
20K pixels
Feature
Extractor
🧠
Neural
Network
🗻
Spatial
Smoother
Click "Next Step" to trace the prediction pipeline
Step 0 / 6

Spatial Smoothing: Cleaning Up Noise

Raw pixel-wise predictions can be "noisy" — a single pixel might be classified as Stage 1 (Flooding) while everything around it is Stage 3 (Late Vegetative). That's probably an error. Median filtering fixes this by looking at each pixel's 3×3 neighborhood.

CODE
def apply_spatial_smoothing(predictions, kernel_size=3):
    valid_mask = (predictions != -32768)
    smoothed = predictions.copy()

    classes = np.unique(predictions[valid_mask])
    for class_val in classes:
        class_mask = (predictions == class_val)
        smoothed_mask = ndimage.median_filter(
            class_mask, size=kernel_size)
        smoothed[smoothed_mask & valid_mask] = class_val

    smoothed[~valid_mask] = -32768
    return smoothed
PLAIN ENGLISH

Take the raw prediction map. Identify which pixels are valid data (not ocean or empty).

Make a copy so we don't modify the original.

For each growth stage class (1 through 6):

Create a binary mask (1 = this class, 0 = other). Apply a 3×3 median filter to this mask. The filter removes isolated pixels that don't match their neighbors.

Update the smoothed map where the filtered mask says this class belongs.

Restore nodata values (-32768) for invalid pixels. Return the cleaned map.

Source: utils.py lines 503-515

The 10-50x Speed Optimization

The original code extracts features one pixel at a time using a Python loop. The optimized version uses vectorized NumPy operations to process all pixels simultaneously.

# Loop over each pixel one at a time
for idx in range(n_pixels):
    row, col = valid_indices[idx]
    vh_values = []
    for i in range(7):
        vh_values.append(s1_data[band-i, row, col])
    features[idx] = extract_features(vh_values)
# Time: ~500ms per million pixels
# Process ALL pixels simultaneously
vh_stack = s1_data[band_indices]
vh_values = vh_stack[:, rows, cols].T

vh_diff = vh_values[:, :-1] - vh_values[:, 1:]

denominator = np.abs(vh_values[:, 1:])
vh_ratio = vh_values[:, :-1] / denominator
# Time: ~10-50ms per million pixels
💡
Why the Speedup is So Dramatic
Python loops are slow because each iteration involves interpreter overhead. NumPy operations delegate to optimized C code that processes entire arrays using CPU SIMD instructions. The same mathematical operations happen, but instead of doing them one at a time in Python, they happen in bulk in compiled C — 10 to 50 times faster.

Temporal Filtering: Consistency Across Time

After spatial smoothing, an optional step enforces temporal consistency. If a pixel was classified as Stage 4 in both the previous two periods but suddenly jumped to Stage 1, that's likely an error. Temporal filtering uses confidence-weighted voting across 3 consecutive periods.

🔎
Gap: Temporal Filtering Performance
The current temporal filtering uses pixel-by-pixel Python loops (the slow approach). Unlike the feature extraction which was vectorized, this code still iterates over every pixel in a nested loop. Vectorizing it could yield another significant speedup for the post-processing step.

Check Your Understanding

The prediction pipeline processes 20,000 pixels per chunk. If you increase this to 200,000 pixels per chunk, what happens?

You're building a real-time monitoring dashboard that updates every 12 days. Which optimization would have the biggest impact?

07

Gaps, Bottlenecks & What's Next

The honest assessment: where the system struggles, why, and what improvements could close the gap.

The Accuracy Landscape

Overall accuracy sits around 64.6% — but that number hides dramatic per-class variation. Some stages are identified reliably; others are barely detected at all.

StageSamplesAccuracyStatus
1 — Flooding1,277 (13.5%)36-64%Variable
2 — Early Vegetative3,562 (37.6%)84%Strong
3 — Late Vegetative812 (8.6%)10-12%Critical
4 — Early Generative1,602 (16.9%)58-83%Variable
5 — Late Generative566 (6.0%)7-33%Critical
6 — Post-Harvest1,648 (17.4%)45-82%Variable

Root Cause: The Three-Stage Overlap

Stages 3, 4, and 5 have extremely similar VH values. The model can't distinguish between "plants growing quickly" (Stage 3), "plants at peak height" (Stage 4), and "plants starting to dry" (Stage 5) because the absolute backscatter values are nearly identical.

VH Values for Stages 3, 4, 5 — The Overlap Zone
Stage 3
-1814 to -1653
Stage 4
-1627 to -1533
Stage 5
-1621 to -1542
Massive overlap — nearly indistinguishable by absolute VH value alone
💡
The Key Difference is Rate of Change
While absolute values overlap, the slope differs: Stage 3 has a steep positive slope (rapid growth), Stage 4 has a near-zero slope (plateau at peak), and Stage 5 has a negative slope (declining after peak). Adding slope and curvature features could help the model distinguish these stages.

Optimization Roadmap

Based on the accuracy analysis, here are the highest-impact improvements ranked by effort and expected gain:

P1
Add Slope & Curvature Features High Impact, Low Effort
Compute the overall slope, second derivative (curvature), and acceleration of the VH curve. These directly address the Stage 3/4/5 overlap. Could add 3-6 new features to the existing 29.
P1
Focal Loss + Class Weights Combo Medium Impact, Low Effort
Switch loss function from standard cross-entropy to focal loss with per-class weighting. Forces model attention to minority stages without needing more training data.
P2
Hierarchical Classification High Impact, Medium Effort
Instead of predicting 6 classes at once, use a two-stage approach: first classify "vegetative vs generative vs other," then sub-classify within each group. Reduces the 6-way confusion problem to simpler 2-3 way decisions.
P3
Temporal Constraints Medium Impact, Medium Effort
Enforce biological rules: a pixel can't jump from Stage 1 to Stage 5 in one period. Add transition probabilities that penalize impossible growth stage sequences.
P3
Vectorize Temporal Filtering Speed, Low Effort
The temporal filtering post-processing still uses pixel-by-pixel Python loops. Vectorizing it (like the feature extraction) would yield another 10-50x speedup for this step.

Final Assessment

You have budget for one improvement. The government needs the map to correctly distinguish between "rice that's still growing" (Stages 2-4) and "rice ready for harvest" (Stage 5). Which optimization addresses this most directly?

A colleague proposes merging Stages 3, 4, and 5 into a single "Growth" stage, reducing to 4 classes. What's the trade-off?

Looking at the full system, what is the single biggest bottleneck limiting prediction quality right now?

You Made It!

You now understand how satellite radar, time series feature engineering, and neural networks combine to map rice growth stages across 3.5 million hectares of Java Island. From raw microwave echoes to 6-class growth stage maps — and the gaps that still need closing.

📚
Key Files to Explore
utils.py — Feature extraction
train.py — Training pipeline
predict_optimized.py — Fast prediction
config.py — Configuration
🚀
Quick Wins
1. Add slope/curvature features
2. Enable focal loss training
3. Vectorize temporal filtering
4. Try hierarchical classification