Technical Framework

Computational Methodology & Pipeline Architecture

Integrating multi-temporal satellite imagery, terrain data, and proximity-based features into a unified 1 km analytical grid for spatially explicit forest-loss risk prediction.

01. Multi-Source Data Integration

Our pipeline uses Google Earth Engine (GEE) to harmonize disparate geospatial datasets. The core challenge is aligning inputs with heterogeneous spatial and temporal resolutions, from 10 m Sentinel-2 bands to 30 m SRTM terrain data, into a standardized analytical master table at 1 km resolution.

Feature Engineering Matrix

Spectral: NDVI, NBR, bands B2–B12
Disturbance: GFW Integrated Alerts (count and density)
Anthropogenic: Euclidean distance to road networks
Environmental: slope, elevation, mean rainfall
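The spectral and anthropogenic features above can be sketched as follows. This is a minimal illustration assuming Sentinel-2 reflectance arrays already loaded as NumPy arrays and a road network already rasterized to a boolean mask; the function names are hypothetical, not the pipeline's own code. NDVI uses NIR (B8) and red (B4); NBR uses NIR (B8) and shortwave infrared (B12).

```python
import numpy as np


def normalized_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(a - b) / (a + b), returning NaN where the denominator is zero."""
    a = a.astype(float)
    b = b.astype(float)
    denom = a + b
    out = np.full(a.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(b8: np.ndarray, b4: np.ndarray) -> np.ndarray:
    # Sentinel-2: B8 = near-infrared, B4 = red
    return normalized_difference(b8, b4)


def nbr(b8: np.ndarray, b12: np.ndarray) -> np.ndarray:
    # Sentinel-2: B8 = near-infrared, B12 = shortwave infrared
    return normalized_difference(b8, b12)


def distance_to_roads(road_mask: np.ndarray, cell_size_m: float) -> np.ndarray:
    """Euclidean distance (metres) from each cell centre to the nearest road cell.

    Brute force over all road cells; fine for small illustrative grids.
    """
    rows, cols = np.nonzero(road_mask)
    if rows.size == 0:
        return np.full(road_mask.shape, np.inf)
    rr, cc = np.indices(road_mask.shape)
    # Broadcast every cell against every road cell: shape (H, W, n_road_cells)
    d2 = (rr[..., None] - rows) ** 2 + (cc[..., None] - cols) ** 2
    return np.sqrt(d2.min(axis=-1)) * cell_size_m
```

For real rasters, `scipy.ndimage.distance_transform_edt` (applied to the inverted road mask, with `sampling` set to the pixel size) computes the same Euclidean distance far more efficiently than this brute-force version.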

The 1 km Analytical Grid

We resample all input rasters to a common 1 km grid (cells of 1 km²). Aggregating to this cell size acts as a spatial filter: it dampens sensor noise and localized cloud contamination in individual pixels while retaining enough spatial detail for district-level prioritization.
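Conceptually, the aggregation step averages all fine pixels that fall inside each 1 km cell. The sketch below shows this with NumPy on a raster whose dimensions are an exact multiple of the aggregation factor (e.g. 100 for 10 m to 1 km), skipping NaN pixels as a stand-in for cloud masking. It is an illustration under those assumptions, not the pipeline's GEE code; in Earth Engine the comparable operation is usually written with `ee.Image.reduceResolution()` using `ee.Reducer.mean()`, followed by `reproject()`.

```python
import numpy as np


def block_mean(raster: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a fine raster to a coarser grid by averaging non-overlapping
    factor x factor blocks, ignoring NaN (e.g. cloud-masked) pixels."""
    h, w = raster.shape
    if h % factor or w % factor:
        raise ValueError("raster dimensions must be multiples of factor")
    # Element [i, a, j, b] is raster[i * factor + a, j * factor + b]
    blocks = raster.astype(float).reshape(h // factor, factor, w // factor, factor)
    valid = ~np.isnan(blocks)
    counts = valid.sum(axis=(1, 3))
    sums = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    # Cells with no valid pixels stay NaN instead of becoming 0
    out = np.full(counts.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out
```

Keeping fully masked cells as NaN, rather than filling them with zero, lets downstream steps decide explicitly how to treat cells with no clear observations.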

Pipeline stages: GEE pre-processing → spatial join
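One part of the spatial join is assigning point data, such as GFW alert locations, to grid cells to produce the count and density features. The sketch below is a minimal version assuming alert coordinates are already in a projected system measured in metres and that the grid origin is known; the per-cell forest area used for density is a hypothetical input, since the source does not state what density is normalized by.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CELL_SIZE_M = 1000.0  # 1 km grid
CellId = Tuple[int, int]


def cell_id(x: float, y: float, origin: Tuple[float, float] = (0.0, 0.0),
            cell_size: float = CELL_SIZE_M) -> CellId:
    """Map a projected coordinate (metres) to the (column, row) index of its cell."""
    x0, y0 = origin
    # Floor division keeps points left of / below the origin in negative indices
    return (int((x - x0) // cell_size), int((y - y0) // cell_size))


def alert_counts(alerts: Iterable[Tuple[float, float]],
                 origin: Tuple[float, float] = (0.0, 0.0)) -> Counter:
    """Point-in-cell join: number of alerts falling in each 1 km cell."""
    return Counter(cell_id(x, y, origin) for x, y in alerts)


def alert_density(counts: Counter, forest_km2: Dict[CellId, float]) -> Dict[CellId, float]:
    """Alerts per km² of forest (hypothetical normalization); skips cells without forest area."""
    return {cid: n / forest_km2[cid]
            for cid, n in counts.items() if forest_km2.get(cid, 0.0) > 0.0}
```

Cells that receive no alerts simply do not appear in the counter; when building the master table they would be filled with a count of zero.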

02. Statistical Modeling & Interpretability

Algorithm Performance Hierarchy

Random Forest (RF): best precision

Captures non-linear interactions between topography and road-access pressure. Achieved an AUC of 0.89 in validation.

Logistic Regression (LogReg): interpretable

Used for coefficient analysis to estimate the relative contribution of individual risk drivers.

Modeling is framed as binary classification. The target label is derived from the "lossyear" band of the Hansen Global Forest Change dataset, which records the year in which forest loss was detected at each roughly 30 m pixel. We train the models to predict the probability that forest loss occurs within each grid cell, given that cell's environmental and accessibility profile.
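A cell-level label can be derived from the pixel-level "lossyear" values, where 0 means no loss and a value k means loss detected in the year 2000 + k. The sketch below assumes the lossyear raster has already been resampled to a grid that nests evenly inside the 1 km cells, and marks a cell as positive when the fraction of its pixels lost within a chosen year window exceeds `min_fraction`. The window and the threshold rule are illustrative assumptions; the source does not specify how pixels are aggregated into the cell label.

```python
import numpy as np

HANSEN_BASE_YEAR = 2000  # lossyear value k means loss detected in 2000 + k; 0 means no loss


def loss_label(lossyear: np.ndarray, factor: int, first_year: int, last_year: int,
               min_fraction: float = 0.0) -> np.ndarray:
    """Binary cell label: 1 if the share of pixels lost in [first_year, last_year]
    exceeds min_fraction (the default 0.0 means 'any loss in the window')."""
    lossyear = np.asarray(lossyear)
    h, w = lossyear.shape
    if h % factor or w % factor:
        raise ValueError("raster dimensions must be multiples of factor")
    lo = first_year - HANSEN_BASE_YEAR
    hi = last_year - HANSEN_BASE_YEAR
    lost = (lossyear > 0) & (lossyear >= lo) & (lossyear <= hi)
    # Fraction of lost pixels in each factor x factor block
    frac = lost.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return (frac > min_fraction).astype(np.uint8)
```

Raising `min_fraction` trades sensitivity for specificity: small, scattered clearings stop counting as positive cells.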

AUC-ROC Optimization

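Models are tuned to maximize the area under the receiver operating characteristic curve (AUC-ROC), which measures how reliably a model ranks cells that experienced forest loss above cells that did not, independent of any single classification threshold.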
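AUC-ROC equals the probability that a randomly chosen loss cell receives a higher score than a randomly chosen non-loss cell, which is the normalized Mann-Whitney U statistic. The sketch below computes it from ranks and uses it to pick the best of several candidate configurations; the candidate names and scores are hypothetical. In practice `sklearn.metrics.roc_auc_score` gives the same value.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) formula; tied scores get average ranks."""
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative cell")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores order[i..j]
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def pick_best(candidate_scores: Dict[str, Sequence[float]],
              labels: Sequence[int]) -> Tuple[str, float]:
    """Return the (configuration, AUC) pair with the highest validation AUC."""
    return max(((cfg, roc_auc(labels, s)) for cfg, s in candidate_scores.items()),
               key=lambda t: t[1])
```

Because AUC depends only on the ordering of scores, it rewards a model for ranking risky cells first, which matches how the predictions are later used to draw warning zones.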

Driver Significance

Gini importance (RF) and normalized coefficients (LogReg) are compared to check that the drivers each model relies on follow plausible biophysical and access-related logic.
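A sketch of that comparison using scikit-learn is shown below. It assumes "normalized coefficients" means coefficients fitted on z-scored features, so each one reads as the change in log-odds per standard deviation of the driver; the hyperparameters and the synthetic toy data are illustrative only. Gini importance, exposed as `feature_importances_`, is unsigned, while the coefficients carry direction, so the two are best compared on ranking and plausibility rather than magnitude. The scikit-learn documentation also notes that impurity-based importances can favour high-cardinality features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def driver_table(X, y, feature_names, seed=0):
    """Return {feature: (RF Gini importance, standardized LogReg coefficient)}."""
    # Illustrative hyperparameters, not the study's tuned values
    rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=5, random_state=seed)
    rf.fit(X, y)
    # Z-scoring puts every coefficient on a "per standard deviation" scale
    logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    logreg.fit(X, y)
    coefs = logreg.named_steps["logisticregression"].coef_[0]
    return {name: (float(g), float(c))
            for name, g, c in zip(feature_names, rf.feature_importances_, coefs)}


# Synthetic toy data: risk falls with road distance, weakly with slope; third column is noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
logit = -1.5 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
table = driver_table(X, y, ["road_dist", "slope", "noise"])
for name, (gini, coef) in table.items():
    print(f"{name:10s} gini={gini:.3f} coef={coef:+.3f}")
```

Agreement between the two views (road distance ranked high by RF and carrying a negative coefficient in LogReg) is the kind of consistency check the comparison is meant to provide.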

03. Validation Protocol

Historical Hold-out

The data are split temporally into training and testing periods. Models trained on earlier forest-loss labels are tested against more recent labels to evaluate real-world predictive validity.
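A minimal sketch of the temporal split, assuming each cell's loss history is available as a set of years. The year windows in the usage example are hypothetical; the code only builds labels and enforces that training precedes testing. The features themselves would also need to be computed from data available before each window to avoid leakage, which this sketch does not handle.

```python
from typing import Dict, Set, Tuple

Window = Tuple[int, int]  # inclusive (first_year, last_year)


def window_labels(loss_years: Dict[str, Set[int]], window: Window) -> Dict[str, int]:
    """1 if the cell recorded any forest loss inside the window, else 0."""
    start, end = window
    return {cid: int(any(start <= yr <= end for yr in years))
            for cid, years in loss_years.items()}


def temporal_split(loss_years: Dict[str, Set[int]], train_window: Window,
                   test_window: Window) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Labels for a training period and a strictly later testing period."""
    if train_window[1] >= test_window[0]:
        raise ValueError("training window must end before the test window starts")
    return window_labels(loss_years, train_window), window_labels(loss_years, test_window)


# Hypothetical cells and windows
history = {"a": {2016}, "b": {2021, 2022}, "c": set()}
train_labels, test_labels = temporal_split(history, (2015, 2018), (2019, 2022))
```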

Spatial Transferability

To test regional robustness, we conduct "cross-district" validation, where models trained in K’Bang are evaluated on the Mang Yang landscape (and vice versa).
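The cross-district loop can be sketched generically: fit one model per district, then score it on every other district. Below, `mean_difference_fit` is a deliberately simple, hypothetical stand-in for RF or LogReg so the example stays dependency-free, and the two datasets are toy values, not study data.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Dataset = Tuple[List[Sequence[float]], List[int]]  # (feature rows, 0/1 labels)
Scorer = Callable[[Sequence[float]], float]


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative cell")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_district(datasets: Dict[str, Dataset],
                   fit: Callable[[Dataset], Scorer]) -> Dict[Tuple[str, str], float]:
    """AUC for every (train district, test district) pair with train != test."""
    models = {name: fit(data) for name, data in datasets.items()}
    results = {}
    for train_name, test_name in permutations(datasets, 2):
        X_test, y_test = datasets[test_name]
        scores = [models[train_name](x) for x in X_test]
        results[(train_name, test_name)] = pairwise_auc(y_test, scores)
    return results


def mean_difference_fit(data: Dataset) -> Scorer:
    """Toy model: weight each feature by (mean over loss cells - mean over non-loss cells)."""
    X, y = data
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    w = [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
         for j in range(len(X[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


toy = {
    "K'Bang": ([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1]),
    "Mang Yang": ([[0.3], [0.7], [0.4], [0.6]], [0, 1, 0, 1]),
}
results = cross_district(toy, mean_difference_fit)
```

Reporting both directions matters: a model can transfer well from one landscape to the other but not back, for example when one district spans a wider range of slope or road distance.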

Capture at 10%

We calculate the "Capture Rate", the share of observed forest-loss cells that fall within the 10% of cells with the highest predicted risk, to assess how efficiently the resulting warning zones concentrate monitoring effort.
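A minimal sketch of the capture-rate calculation, assuming one predicted risk score and one observed 0/1 loss label per grid cell. Ties at the cutoff are broken by original cell order, which is an arbitrary choice made here for determinism. As a reference point, a random ranking would capture about 10% of loss cells in expectation, so values well above that indicate useful prioritization.

```python
import math
from typing import Sequence


def capture_rate(labels: Sequence[int], scores: Sequence[float],
                 top_fraction: float = 0.10) -> float:
    """Share of observed loss cells that fall in the top `top_fraction`
    of cells ranked by predicted risk."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_loss = sum(labels)
    if total_loss == 0:
        raise ValueError("no observed loss cells to capture")
    n = len(scores)
    # Warning-zone size in cells; the epsilon stops float error
    # (0.1 * n landing just above an integer) from adding an extra cell
    k = max(1, math.ceil(top_fraction * n - 1e-9))
    # Highest predicted risk first; sorted() is stable, so ties keep cell order
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / total_loss
```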