How satellite radar, time series analysis, and neural networks predict the growth stage of every paddy field across Java Island.
Start LearningRadar signals bounce off rice paddies differently depending on the growth stage. That physical reality is what makes this entire system possible.
Indonesia is the world's third-largest rice producer. Java Island alone has millions of hectares of paddy fields, and the government needs to know: how much rice is planted, how much is growing, and when will it be harvested?
Sending people to every field is impossible. Optical satellites (like regular cameras) are blocked by clouds — and Java is cloudy most of the year. The solution? SAR radar from the Sentinel-1 satellite.
The satellite transmits a vertical (V) microwave pulse and records the horizontal (H) echo — called VH backscatter. The strength of this echo (measured in decibels) changes as rice grows:
The system classifies every pixel into one of these stages. Think of them as chapters in a rice plant's biography:
How the system slices a full year into 31 periods and looks backward through time to understand each moment.
Imagine a time-lapse camera that photographs every rice field on Java once every 12 days. That's essentially what Sentinel-1 does. The system divides each year into 31 periods of 12 days each:
When the system predicts the growth stage for Period 15, it doesn't just look at one snapshot. Like a doctor reviewing your medical history before making a diagnosis, it examines the current period plus the 6 preceding periods — 7 snapshots total.
The GeoTIFF file has 31 bands (one per period). The code uses 0-based indexing internally, meaning Band 1 = Index 0, Band 15 = Index 14.
for i in range(n_periods):
band_idx = current_period - 1 - i
if band_idx < 0:
return None
value = float(s1_data[band_idx, row_idx, col_idx])
if np.isnan(value) or np.isinf(value):
return None
vh_values.append(value)
For each of the 7 time steps (current + 6 previous)...
Calculate which band to read: Period 15 on step 0 = Band 14, step 1 = Band 13, and so on backward.
If we'd go before the start of the data, this pixel can't be predicted — bail out.
Read the VH backscatter value at this specific pixel location and time step.
If the value is missing or corrupted (NaN/Inf), this pixel is unusable — bail out.
Add the valid value to our growing list of 7 time-series measurements.
Source: utils.py lines 199-212
The model doesn't just look at 7 raw numbers. It engineers 29 features per pixel — like a detective building a case from clues.
Think of feature engineering as giving the model different lenses to examine the same data. Each category reveals something different about the rice growth pattern:
After extracting the 7 raw VH values, the code computes how the signal changes between each pair of consecutive periods:
# Temporal differences
for i in range(len(vh_values)-1):
diff = vh_values[i] - vh_values[i+1]
features.append(diff)
# Temporal ratios with safe division
for i in range(len(vh_values)-1):
denominator = abs(vh_values[i+1])
if denominator < 1e-10:
ratio = 0.0
else:
ratio = vh_values[i] / denominator
features.append(ratio)
Differences — for each consecutive pair of time steps...
Subtract the older value from the newer one. Positive = signal getting stronger (growing). Negative = signal getting weaker (harvesting).
Add this "speed of change" number to our feature list.
Ratios — for the same pairs...
Get the absolute value of the older measurement. If it's basically zero (which would crash the division), use a safe default of 0.
Otherwise, divide current by previous to get the proportional change. A ratio > 1 means growth, < 1 means decline.
Source: utils.py lines 220-236
The most interesting features come from phenology detection — where domain expertise about rice growth is encoded as rules. The code looks at the VH values and asks: "Does this pattern look like flooding? Like peak biomass? Like a harvest?"
# Flooding: very low VH signal
if vh_smooth[vh_min_idx] < -2000:
phenology['flooding_detected'] = True
# Early Vegetative: rising after flood
if phenology['flooding_detected'] \
and np.mean(recent_slopes) > 200:
if -2000 <= current_vh <= -1700:
phenology['early_vegetative'] = True
# Post-harvest: sharp decline
if slopes[-1] < -200:
phenology['post_harvest'] = True
Flooding check: If the lowest signal in the window is below -20 dB (scaled ×100 = -2000), mark "flooding detected." Smooth water = extremely weak radar echo.
Early vegetative check: If we already detected flooding AND the signal is rising steeply (slope > 200) AND the current value is in the -20 to -17 dB range — young rice is emerging from water.
Post-harvest check: If the most recent slope is sharply negative (dropping fast), the rice was likely just cut down.
Source: utils.py lines 316-353
An experiment tested whether absolute VH values or differential features (differences/ratios) contribute more to accuracy. The results were surprising:
| Approach | Features | Accuracy | Verdict |
|---|---|---|---|
| Baseline (all 29) | 29 | 59.8% | Baseline |
| Z-score normalized | 29 | 53.5% | Worse |
| Differential only | 22 | 61.6% | Best |
| Hybrid (absolute + z-score) | 36 | 61.3% | Good |
Three model architectures compete to classify rice stages — from a simple multi-layer perceptron to a hybrid CNN-LSTM that processes time like a story.
The base model is an MLP — a stack of dense layers that progressively compress the 29 input features into a 6-class prediction. Imagine a funnel: wide at the top (512 neurons), narrow at the bottom (6 output classes).
model = Sequential([
Input(shape=(input_shape,)),
Dense(512, kernel_initializer='he_normal'),
BatchNormalization(),
LeakyReLU(negative_slope=0.1),
Dropout(0.3),
Dense(num_classes),
TemperatureLayer(temperature=0.5),
Activation('softmax')
])
Build a neural network where data flows through layers in sequence.
Accept 29 numbers as input (our features).
First layer: 512 neurons, initialized using a method optimized for ReLU activations.
Batch Norm stabilizes learning. LeakyReLU adds non-linearity. Dropout prevents overfitting by randomly disabling 30% of neurons during training.
Final layer: 6 neurons (one per growth stage). Temperature scaling sharpens the probabilities. Softmax converts the result into probabilities.
Source: utils.py lines 395-429
The most advanced model treats the time series like a sentence. A CNN first scans for local patterns (like detecting a sudden jump or dip), then an LSTM reads the whole sequence to understand the narrative arc.
How labeled field data becomes a trained classifier — including the thorny problem of unbalanced classes.
Training starts with a CSV file containing field observations: for each point, the date it was visited, its GPS coordinates, and the growth stage a human observer recorded (1-6).
Instead of training once and hoping for the best, the system uses 5-fold cross-validation. It's like giving the model 5 different exams to ensure it genuinely learned, not just memorized.
Not all growth stages are equally represented in the training data. Stage 2 (Early Vegetative) has 6.3× more samples than Stage 5 (Late Generative). This is like teaching someone to identify 6 bird species using 500 photos of pigeons but only 80 photos of eagles.
The model tends to "cheat" by defaulting to Stage 2 for ambiguous cases — it's right 37.6% of the time just by guessing! Three strategies combat this:
How the trained model processes millions of pixels efficiently and smooths the results into coherent growth stage maps.
Java Island at 10-meter resolution has hundreds of millions of pixels. Loading all of them at once would overwhelm any computer's memory. The solution is chunked processing — the system processes 20,000 pixels at a time.
Raw pixel-wise predictions can be "noisy" — a single pixel might be classified as Stage 1 (Flooding) while everything around it is Stage 3 (Late Vegetative). That's probably an error. Median filtering fixes this by looking at each pixel's 3×3 neighborhood.
def apply_spatial_smoothing(predictions, kernel_size=3):
valid_mask = (predictions != -32768)
smoothed = predictions.copy()
classes = np.unique(predictions[valid_mask])
for class_val in classes:
class_mask = (predictions == class_val)
smoothed_mask = ndimage.median_filter(
class_mask, size=kernel_size)
smoothed[smoothed_mask & valid_mask] = class_val
smoothed[~valid_mask] = -32768
return smoothed
Take the raw prediction map. Identify which pixels are valid data (not ocean or empty).
Make a copy so we don't modify the original.
For each growth stage class (1 through 6):
Create a binary mask (1 = this class, 0 = other). Apply a 3×3 median filter to this mask. The filter removes isolated pixels that don't match their neighbors.
Update the smoothed map where the filtered mask says this class belongs.
Restore nodata values (-32768) for invalid pixels. Return the cleaned map.
Source: utils.py lines 503-515
The original code extracts features one pixel at a time using a Python loop. The optimized version uses vectorized NumPy operations to process all pixels simultaneously.
After spatial smoothing, an optional step enforces temporal consistency. If a pixel was classified as Stage 4 in both the previous two periods but suddenly jumped to Stage 1, that's likely an error. Temporal filtering uses confidence-weighted voting across 3 consecutive periods.
The honest assessment: where the system struggles, why, and what improvements could close the gap.
Overall accuracy sits around 64.6% — but that number hides dramatic per-class variation. Some stages are identified reliably; others are barely detected at all.
| Stage | Samples | Accuracy | Status |
|---|---|---|---|
| 1 — Flooding | 1,277 (13.5%) | 36-64% | Variable |
| 2 — Early Vegetative | 3,562 (37.6%) | 84% | Strong |
| 3 — Late Vegetative | 812 (8.6%) | 10-12% | Critical |
| 4 — Early Generative | 1,602 (16.9%) | 58-83% | Variable |
| 5 — Late Generative | 566 (6.0%) | 7-33% | Critical |
| 6 — Post-Harvest | 1,648 (17.4%) | 45-82% | Variable |
Stages 3, 4, and 5 have extremely similar VH values. The model can't distinguish between "plants growing quickly" (Stage 3), "plants at peak height" (Stage 4), and "plants starting to dry" (Stage 5) because the absolute backscatter values are nearly identical.
Based on the accuracy analysis, here are the highest-impact improvements ranked by effort and expected gain:
You now understand how satellite radar, time series feature engineering, and neural networks combine to map rice growth stages across 3.5 million hectares of Java Island. From raw microwave echoes to 6-class growth stage maps — and the gaps that still need closing.