What Satellites See in a Rice Field

Radar signals bounce off rice paddies differently depending on the growth stage. That physical reality is what makes this entire system possible.

The Challenge: Monitoring 3.5 Million Hectares

Indonesia is the world's third-largest rice producer. Java Island alone has millions of hectares of paddy fields, and the government needs to know: how much rice is planted, how much is growing, and when will it be harvested?

Sending people to every field is impossible. Optical satellites (like regular cameras) are blocked by clouds — and Java is cloudy most of the year. The solution? SAR radar from the Sentinel-1 satellite.

🌌

Works through clouds

Radar penetrates cloud cover, unlike optical cameras. Critical for tropical Indonesia.

🌑

Works at night

The satellite provides its own illumination (microwave pulses), so it doesn't need sunlight.

📡

Senses structure

Radar echoes change based on surface texture — flooded fields vs. tall plants vs. bare soil all look different.

📅

Revisits every 12 days

Sentinel-1 captures the same area regularly, creating a time series we can analyze.

How Radar "Sees" Rice Growth

The satellite transmits a vertical (V) microwave pulse and records the horizontal (H) echo — called VH backscatter. The strength of this echo (measured in decibels) changes as rice grows:

VH Backscatter Through a Rice Growing Cycle

💡

The Core Insight

Rice growth follows a predictable radar signature: the signal starts weak (water surface reflects radar away), gets stronger as plants grow taller and denser (vegetation scatters radar back), then drops again after harvest. This U-shaped-then-inverted curve is the physical basis for the entire classification system.

The 6 Growth Stages

The system classifies every pixel into one of these stages. Think of them as chapters in a rice plant's biography:

💧

Stage 1: Flooding

Fields are deliberately flooded before planting. VH < -20 dB (very weak echo — smooth water surface).

🌱

Stage 2: Early Vegetative

Young seedlings emerge from water. VH -20 to -17 dB. Signal rising rapidly.

🌿

Stage 3: Late Vegetative

Plants grow tall and dense. VH -17 to -14 dB. Approaching peak biomass.

🌾

Stage 4: Early Generative

Rice heads form and fill with grain. VH -14 to -11 dB. Peak radar signal.

🌽

Stage 5: Late Generative

Grain matures, plants start drying. VH -15 to -13 dB. Signal starting to decline.

🍆

Stage 6: Post-Harvest

Rice is cut, field returns to bare soil or stubble. VH < -17 dB. Sharp signal drop.

Check Your Understanding

A farmer reports their field was just flooded for the new planting season. What would you expect the radar signal to look like?

Why does this system use SAR radar instead of optical satellite imagery (regular photos)?

You're adding a new crop type to this system (corn instead of rice). The corn lifecycle doesn't include a flooding stage. How would this change the radar signature?

02

The 12-Day Time Machine

How the system slices a full year into 31 periods and looks backward through time to understand each moment.

Slicing a Year into 31 Snapshots

Imagine a time-lapse camera that photographs every rice field on Java once every 12 days. That's essentially what Sentinel-1 does. The system divides each year into 31 periods of 12 days each:

12-Day Period Calendar (2024 Example)

P1
Jan 1-12

P2
Jan 13-24

P3
Jan 25-Feb 5

P4
Feb 6-17

P5
Feb 18-29

P6
Mar 1-12

P7
Mar 13-24

P8
Mar 25-Apr 5

...

P15
Jun 5-16

...

P23
Sep 7-18

...

P30
Dec 15-26

P31
Dec 27-31

📌

Why Period 7 is the First Valid Prediction

The model needs to look at 7 consecutive periods (current + 6 previous). Period 7 is the earliest that has 7 periods before it (P1 through P7). Trying to predict Period 3, for example, would only have 3 periods of history — not enough to see the growth pattern.

Looking Backward Through Time

When the system predicts the growth stage for Period 15, it doesn't just look at one snapshot. Like a doctor reviewing your medical history before making a diagnosis, it examines the current period plus the 6 preceding periods — 7 snapshots total.

P9

6 back

P10

5 back

P11

4 back

P12

3 back

P13

2 back

P14

1 back

P15

Current

Click "Next Step" to see how the backward window works

Step 0 / 4

How the Code Gets the Right Bands

The GeoTIFF file has 31 bands (one per period). The code uses 0-based indexing internally, meaning Band 1 = Index 0, Band 15 = Index 14.

CODE

for i in range(n_periods):
    band_idx = current_period - 1 - i

    if band_idx < 0:
        return None

    value = float(s1_data[band_idx, row_idx, col_idx])
    if np.isnan(value) or np.isinf(value):
        return None

    vh_values.append(value)

PLAIN ENGLISH

For each of the 7 time steps (current + 6 previous)...

Calculate which band to read: Period 15 on step 0 = Band 14, step 1 = Band 13, and so on backward.

If we'd go before the start of the data, this pixel can't be predicted — bail out.

Read the VH backscatter value at this specific pixel location and time step.

If the value is missing or corrupted (NaN/Inf), this pixel is unusable — bail out.

Add the valid value to our growing list of 7 time-series measurements.

Source: utils.py lines 199-212

Check Your Understanding

You want to predict Period 23. Which bands from the GeoTIFF stack does the model read?

The system currently uses 7 periods (84 days). If you wanted to track two full rice cycles instead of one, you'd need about 14 periods. What's the trade-off?

03

Turning Raw Signals into Features

The model doesn't just look at 7 raw numbers. It engineers 29 features per pixel — like a detective building a case from clues.

The 29 Features: 4 Categories

Think of feature engineering as giving the model different lenses to examine the same data. Each category reveals something different about the rice growth pattern:

📈

VH Time Series 7 features

The raw backscatter values at each of the 7 time steps. Like reading a thermometer at 7 different times — gives you the absolute "temperature" of the field.

∆

Temporal Differences 6 features

The change between consecutive periods. Is the signal going up (growing), down (harvesting), or flat (mature)? Like measuring how much taller a child grew each month.

÷

Temporal Ratios 6 features

The proportional change between periods. A jump from -20 to -17 dB means something different than -14 to -11 dB, even though both are +3 dB.

🌱

Phenology + Extrema 10 features

Domain-specific detectors: has flooding occurred? Where's the min/max? These encode expert knowledge about rice growth patterns directly.

Building the Differences and Ratios

After extracting the 7 raw VH values, the code computes how the signal changes between each pair of consecutive periods:

CODE

# Temporal differences
for i in range(len(vh_values)-1):
    diff = vh_values[i] - vh_values[i+1]
    features.append(diff)

# Temporal ratios with safe division
for i in range(len(vh_values)-1):
    denominator = abs(vh_values[i+1])
    if denominator < 1e-10:
        ratio = 0.0
    else:
        ratio = vh_values[i] / denominator
    features.append(ratio)

PLAIN ENGLISH

Differences — for each consecutive pair of time steps...

Subtract the older value from the newer one. Positive = signal getting stronger (growing). Negative = signal getting weaker (harvesting).

Add this "speed of change" number to our feature list.

Ratios — for the same pairs...

Get the absolute value of the older measurement. If it's basically zero (which would crash the division), use a safe default of 0.

Otherwise, divide current by previous to get the proportional change. A ratio > 1 means growth, < 1 means decline.

Source: utils.py lines 220-236

Phenology Detection: Expert Knowledge in Code

The most interesting features come from phenology detection — where domain expertise about rice growth is encoded as rules. The code looks at the VH values and asks: "Does this pattern look like flooding? Like peak biomass? Like a harvest?"

CODE

# Flooding: very low VH signal
if vh_smooth[vh_min_idx] < -2000:
    phenology['flooding_detected'] = True

# Early Vegetative: rising after flood
if phenology['flooding_detected'] \
   and np.mean(recent_slopes) > 200:
    if -2000 <= current_vh <= -1700:
        phenology['early_vegetative'] = True

# Post-harvest: sharp decline
if slopes[-1] < -200:
    phenology['post_harvest'] = True

PLAIN ENGLISH

Flooding check: If the lowest signal in the window is below -20 dB (scaled ×100 = -2000), mark "flooding detected." Smooth water = extremely weak radar echo.

Early vegetative check: If we already detected flooding AND the signal is rising steeply (slope > 200) AND the current value is in the -20 to -17 dB range — young rice is emerging from water.

Post-harvest check: If the most recent slope is sharply negative (dropping fast), the rice was likely just cut down.

Source: utils.py lines 316-353

💡

Domain Knowledge is a Feature

These threshold-based detectors encode decades of agronomic research into simple yes/no signals. The neural network doesn't need to "rediscover" that flooding looks a certain way — we tell it directly. This is a powerful pattern in ML: combining data-driven learning with expert knowledge gives better results than either approach alone.

Research Finding: Which Features Matter Most?

An experiment tested whether absolute VH values or differential features (differences/ratios) contribute more to accuracy. The results were surprising:

Approach	Features	Accuracy	Verdict
Baseline (all 29)	29	59.8%	Baseline
Z-score normalized	29	53.5%	Worse
Differential only	22	61.6%	Best
Hybrid (absolute + z-score)	36	61.3%	Good

⚠️

Gap: Cross-Regional Generalization

Absolute VH values vary between regions due to soil type, topography, and rice varieties. Paddy fields in Sukabumi look different from those in Jogja even at the same growth stage. Differential features (how the signal changes) are more portable across regions. This means the current 29-feature model may underperform when applied to regions not in the training data.

Check Your Understanding

You're deploying this model to Sulawesi (a different island) where you have no training data. Based on the feature experiment results, which approach would you recommend?

The phenology detector uses fixed thresholds (e.g., flooding = below -2000). What's a potential problem with this?

04

The Neural Network Brain

Three model architectures compete to classify rice stages — from a simple multi-layer perceptron to a hybrid CNN-LSTM that processes time like a story.

Architecture 1: The Multi-Layer Perceptron

The base model is an MLP — a stack of dense layers that progressively compress the 29 input features into a 6-class prediction. Imagine a funnel: wide at the top (512 neurons), narrow at the bottom (6 output classes).

MLP Architecture: 29 Features → 6 Growth Stages

Input
29

→

Dense
512

+BN +LeakyReLU
Dropout 0.3

→

Dense
256

+BN +LeakyReLU
Dropout 0.2

→

Dense
128

+BN +LeakyReLU
Dropout 0.1

→

Dense
64

+BN +LeakyReLU

→

Temp
÷0.5

→

Softmax
6

💡

What's the Temperature Layer?

Temperature scaling (T=0.5) sharpens the model's confidence. Instead of saying "40% Stage 2, 35% Stage 3, 25% Stage 4," it pushes toward more decisive answers: "75% Stage 2, 15% Stage 3, 10% Stage 4." This makes the final map more definitive.

The Model in Code

CODE

model = Sequential([
    Input(shape=(input_shape,)),

    Dense(512, kernel_initializer='he_normal'),
    BatchNormalization(),
    LeakyReLU(negative_slope=0.1),
    Dropout(0.3),

    Dense(num_classes),
    TemperatureLayer(temperature=0.5),
    Activation('softmax')
])

PLAIN ENGLISH

Build a neural network where data flows through layers in sequence.

Accept 29 numbers as input (our features).

First layer: 512 neurons, initialized using a method optimized for ReLU activations.

Batch Norm stabilizes learning. LeakyReLU adds non-linearity. Dropout prevents overfitting by randomly disabling 30% of neurons during training.

Final layer: 6 neurons (one per growth stage). Temperature scaling sharpens the probabilities. Softmax converts the result into probabilities.

Source: utils.py lines 395-429

Architecture 3: CNN-LSTM — Reading Time Like a Story

The most advanced model treats the time series like a sentence. A CNN first scans for local patterns (like detecting a sudden jump or dip), then an LSTM reads the whole sequence to understand the narrative arc.

1

Split features into 3 branches

VH values, differences, and ratios each go through their own CNN-LSTM pathway. Each branch becomes a specialist.

2

CNN scans for local patterns

1D convolutions with kernel size 3 slide along the time axis, detecting patterns like "sharp rise" or "gradual decline" across 3 consecutive periods.

3

Bidirectional LSTM reads the story

The LSTM reads the sequence both forward and backward, capturing that "the signal rose after flooding" and "the signal was falling before harvest" simultaneously.

4

Merge and classify

All three branch outputs plus the phenology features are concatenated and passed through dense layers to produce the final 6-class prediction.

Check Your Understanding

The MLP uses progressively decreasing dropout rates: 0.3 → 0.2 → 0.1 → 0. Why might the first layer have the highest dropout?

Why does the CNN-LSTM use separate branches for VH values, differences, and ratios instead of feeding all 29 features together?

05

Training: Teaching the Model to See

How labeled field data becomes a trained classifier — including the thorny problem of unbalanced classes.

From CSV to Trained Model

Training starts with a CSV file containing field observations: for each point, the date it was visited, its GPS coordinates, and the growth stage a human observer recorded (1-6).

Training Pipeline Chat

0 / 6 messages

5-Fold Cross-Validation: Trust but Verify

Instead of training once and hoping for the best, the system uses 5-fold cross-validation. It's like giving the model 5 different exams to ensure it genuinely learned, not just memorized.

5-Fold Stratified Cross-Validation

Fold 1

TEST

TRAIN

Fold 2

TEST

Fold 3

TEST

Fold 4

TEST

Fold 5

TRAIN

TEST

"Stratified" means each fold maintains the same proportion of growth stages. The best fold's model is saved.

The Imbalance Problem

Not all growth stages are equally represented in the training data. Stage 2 (Early Vegetative) has 6.3× more samples than Stage 5 (Late Generative). This is like teaching someone to identify 6 bird species using 500 photos of pigeons but only 80 photos of eagles.

S1: 13.5%

S2: 37.6%

S3: 8.6%

S4: 16.9%

S5: 6.0%

S6: 17.4%

The model tends to "cheat" by defaulting to Stage 2 for ambiguous cases — it's right 37.6% of the time just by guessing! Three strategies combat this:

⚖

Class Weights

Penalize mistakes on rare stages more heavily. Getting Stage 5 wrong costs 6× more than Stage 2.

🔬

SMOTE

SMOTE generates synthetic samples for underrepresented stages by interpolating between existing examples.

🎯

Focal Loss

Focal loss makes the model focus on the examples it finds hardest, instead of optimizing for easy-to-classify ones.

Check Your Understanding

You notice the model gets 84% accuracy on Stage 2 but only 10% on Stage 3. What's the most likely root cause?

A colleague suggests training for 500 epochs instead of 150 to improve accuracy. What would likely happen?

06

Prediction: From Pixels to Maps

How the trained model processes millions of pixels efficiently and smooths the results into coherent growth stage maps.

Processing Millions of Pixels in Chunks

Java Island at 10-meter resolution has hundreds of millions of pixels. Loading all of them at once would overwhelm any computer's memory. The solution is chunked processing — the system processes 20,000 pixels at a time.

🗺

GeoTIFF
31 bands

✂

Chunk
20K pixels

⚙

Feature
Extractor

🧠

Neural
Network

🗻

Spatial
Smoother

Click "Next Step" to trace the prediction pipeline

Step 0 / 6

Spatial Smoothing: Cleaning Up Noise

Raw pixel-wise predictions can be "noisy" — a single pixel might be classified as Stage 1 (Flooding) while everything around it is Stage 3 (Late Vegetative). That's probably an error. Median filtering fixes this by looking at each pixel's 3×3 neighborhood.

CODE

def apply_spatial_smoothing(predictions, kernel_size=3):
    valid_mask = (predictions != -32768)
    smoothed = predictions.copy()

    classes = np.unique(predictions[valid_mask])
    for class_val in classes:
        class_mask = (predictions == class_val)
        smoothed_mask = ndimage.median_filter(
            class_mask, size=kernel_size)
        smoothed[smoothed_mask & valid_mask] = class_val

    smoothed[~valid_mask] = -32768
    return smoothed

PLAIN ENGLISH

Take the raw prediction map. Identify which pixels are valid data (not ocean or empty).

Make a copy so we don't modify the original.

For each growth stage class (1 through 6):

Create a binary mask (1 = this class, 0 = other). Apply a 3×3 median filter to this mask. The filter removes isolated pixels that don't match their neighbors.

Update the smoothed map where the filtered mask says this class belongs.

Restore nodata values (-32768) for invalid pixels. Return the cleaned map.

Source: utils.py lines 503-515

The 10-50x Speed Optimization

The original code extracts features one pixel at a time using a Python loop. The optimized version uses vectorized NumPy operations to process all pixels simultaneously.

# Loop over each pixel one at a time
for idx in range(n_pixels):
    row, col = valid_indices[idx]
    vh_values = []
    for i in range(7):
        vh_values.append(s1_data[band-i, row, col])
    features[idx] = extract_features(vh_values)
# Time: ~500ms per million pixels

# Process ALL pixels simultaneously
vh_stack = s1_data[band_indices]
vh_values = vh_stack[:, rows, cols].T

vh_diff = vh_values[:, :-1] - vh_values[:, 1:]

denominator = np.abs(vh_values[:, 1:])
vh_ratio = vh_values[:, :-1] / denominator
# Time: ~10-50ms per million pixels

💡

Why the Speedup is So Dramatic

Python loops are slow because each iteration involves interpreter overhead. NumPy operations delegate to optimized C code that processes entire arrays using CPU SIMD instructions. The same mathematical operations happen, but instead of doing them one at a time in Python, they happen in bulk in compiled C — 10 to 50 times faster.

Temporal Filtering: Consistency Across Time

After spatial smoothing, an optional step enforces temporal consistency. If a pixel was classified as Stage 4 in both the previous two periods but suddenly jumped to Stage 1, that's likely an error. Temporal filtering uses confidence-weighted voting across 3 consecutive periods.

🔎

Gap: Temporal Filtering Performance

The current temporal filtering uses pixel-by-pixel Python loops (the slow approach). Unlike the feature extraction which was vectorized, this code still iterates over every pixel in a nested loop. Vectorizing it could yield another significant speedup for the post-processing step.

Check Your Understanding

The prediction pipeline processes 20,000 pixels per chunk. If you increase this to 200,000 pixels per chunk, what happens?

You're building a real-time monitoring dashboard that updates every 12 days. Which optimization would have the biggest impact?

07

Gaps, Bottlenecks & What's Next

The honest assessment: where the system struggles, why, and what improvements could close the gap.

The Accuracy Landscape

Overall accuracy sits around 64.6% — but that number hides dramatic per-class variation. Some stages are identified reliably; others are barely detected at all.

Stage	Samples	Accuracy	Status
1 — Flooding	1,277 (13.5%)	36-64%	Variable
2 — Early Vegetative	3,562 (37.6%)	84%	Strong
3 — Late Vegetative	812 (8.6%)	10-12%	Critical
4 — Early Generative	1,602 (16.9%)	58-83%	Variable
5 — Late Generative	566 (6.0%)	7-33%	Critical
6 — Post-Harvest	1,648 (17.4%)	45-82%	Variable

Root Cause: The Three-Stage Overlap

Stages 3, 4, and 5 have extremely similar VH values. The model can't distinguish between "plants growing quickly" (Stage 3), "plants at peak height" (Stage 4), and "plants starting to dry" (Stage 5) because the absolute backscatter values are nearly identical.

VH Values for Stages 3, 4, 5 — The Overlap Zone

Stage 3

-1814 to -1653

Stage 4

-1627 to -1533

Stage 5

-1621 to -1542

Massive overlap — nearly indistinguishable by absolute VH value alone

💡

The Key Difference is Rate of Change

While absolute values overlap, the slope differs: Stage 3 has a steep positive slope (rapid growth), Stage 4 has a near-zero slope (plateau at peak), and Stage 5 has a negative slope (declining after peak). Adding slope and curvature features could help the model distinguish these stages.

Optimization Roadmap

Based on the accuracy analysis, here are the highest-impact improvements ranked by effort and expected gain:

P1

Add Slope & Curvature Features High Impact, Low Effort

Compute the overall slope, second derivative (curvature), and acceleration of the VH curve. These directly address the Stage 3/4/5 overlap. Could add 3-6 new features to the existing 29.

P1

Focal Loss + Class Weights Combo Medium Impact, Low Effort

Switch loss function from standard cross-entropy to focal loss with per-class weighting. Forces model attention to minority stages without needing more training data.

P2

Hierarchical Classification High Impact, Medium Effort

Instead of predicting 6 classes at once, use a two-stage approach: first classify "vegetative vs generative vs other," then sub-classify within each group. Reduces the 6-way confusion problem to simpler 2-3 way decisions.

P3

Temporal Constraints Medium Impact, Medium Effort

Enforce biological rules: a pixel can't jump from Stage 1 to Stage 5 in one period. Add transition probabilities that penalize impossible growth stage sequences.

P3

Vectorize Temporal Filtering Speed, Low Effort

The temporal filtering post-processing still uses pixel-by-pixel Python loops. Vectorizing it (like the feature extraction) would yield another 10-50x speedup for this step.

Final Assessment

You have budget for one improvement. The government needs the map to correctly distinguish between "rice that's still growing" (Stages 2-4) and "rice ready for harvest" (Stage 5). Which optimization addresses this most directly?

A colleague proposes merging Stages 3, 4, and 5 into a single "Growth" stage, reducing to 4 classes. What's the trade-off?

Looking at the full system, what is the single biggest bottleneck limiting prediction quality right now?

You Made It!

You now understand how satellite radar, time series feature engineering, and neural networks combine to map rice growth stages across 3.5 million hectares of Java Island. From raw microwave echoes to 6-class growth stage maps — and the gaps that still need closing.

📚

Key Files to Explore

utils.py — Feature extraction
train.py — Training pipeline
predict_optimized.py — Fast prediction
config.py — Configuration

🚀

Quick Wins

1. Add slope/curvature features
2. Enable focal loss training
3. Vectorize temporal filtering
4. Try hierarchical classification

Mapping Rice Growth from Space

The Challenge: Monitoring 3.5 Million Hectares

How Radar "Sees" Rice Growth

The 6 Growth Stages

Check Your Understanding

A farmer reports their field was just flooded for the new planting season. What would you expect the radar signal to look like?

Why does this system use SAR radar instead of optical satellite imagery (regular photos)?

You're adding a new crop type to this system (corn instead of rice). The corn lifecycle doesn't include a flooding stage. How would this change the radar signature?

Slicing a Year into 31 Snapshots

Looking Backward Through Time

How the Code Gets the Right Bands

Check Your Understanding

You want to predict Period 23. Which bands from the GeoTIFF stack does the model read?

The system currently uses 7 periods (84 days). If you wanted to track two full rice cycles instead of one, you'd need about 14 periods. What's the trade-off?

The 29 Features: 4 Categories

Building the Differences and Ratios

Phenology Detection: Expert Knowledge in Code

Research Finding: Which Features Matter Most?

Check Your Understanding

You're deploying this model to Sulawesi (a different island) where you have no training data. Based on the feature experiment results, which approach would you recommend?

The phenology detector uses fixed thresholds (e.g., flooding = below -2000). What's a potential problem with this?

Architecture 1: The Multi-Layer Perceptron

The Model in Code

Architecture 3: CNN-LSTM — Reading Time Like a Story

Check Your Understanding

The MLP uses progressively decreasing dropout rates: 0.3 → 0.2 → 0.1 → 0. Why might the first layer have the highest dropout?

Why does the CNN-LSTM use separate branches for VH values, differences, and ratios instead of feeding all 29 features together?

From CSV to Trained Model

5-Fold Cross-Validation: Trust but Verify

The Imbalance Problem

Check Your Understanding

You notice the model gets 84% accuracy on Stage 2 but only 10% on Stage 3. What's the most likely root cause?

A colleague suggests training for 500 epochs instead of 150 to improve accuracy. What would likely happen?

Processing Millions of Pixels in Chunks

Spatial Smoothing: Cleaning Up Noise

The 10-50x Speed Optimization

Temporal Filtering: Consistency Across Time

Check Your Understanding

The prediction pipeline processes 20,000 pixels per chunk. If you increase this to 200,000 pixels per chunk, what happens?

You're building a real-time monitoring dashboard that updates every 12 days. Which optimization would have the biggest impact?

The Accuracy Landscape

Root Cause: The Three-Stage Overlap

Optimization Roadmap

Final Assessment

You have budget for one improvement. The government needs the map to correctly distinguish between "rice that's still growing" (Stages 2-4) and "rice ready for harvest" (Stage 5). Which optimization addresses this most directly?

A colleague proposes merging Stages 3, 4, and 5 into a single "Growth" stage, reducing to 4 classes. What's the trade-off?

Looking at the full system, what is the single biggest bottleneck limiting prediction quality right now?

You Made It!